Post-call Transcription & Summary in JavaScript

Introduction

Post-call transcription and summary is a VideoSDK feature that generates detailed transcriptions and summaries of recorded meetings after they have concluded. It is particularly useful for capturing and documenting what was discussed, so that there is a comprehensive record of the conversation.

How Post-Call Transcription Works

Post-call transcription involves processing the recorded audio or video content of a meeting to produce a textual representation of the conversation. Here’s a step-by-step breakdown of how it works:

Recording the Meeting: During the meeting, the audio and video are recorded. This can include everything that was said and any shared content, such as presentations or screen shares.
Uploading the Recording: Once the meeting is over, the recorded file is uploaded to the VideoSDK platform. This can be done automatically or manually, depending on the configuration.
Transcription Processing: The uploaded recording is then processed by VideoSDK’s transcription engine. This engine uses advanced speech recognition technology to convert spoken words into written text.
Retrieving the Transcription: After the transcription process is complete, the textual representation of the meeting is made available. This text can be accessed via the VideoSDK API and used in various applications.

Benefits of Post-Call Transcription

Accurate Documentation: Provides a precise record of what was discussed, which is invaluable for meeting minutes, legal documentation, and reference.
Enhanced Accessibility: Makes content accessible to those who may have missed the meeting or have hearing impairments.
Easy Review and Analysis: Enables quick review of key points and decisions made during the meeting without having to re-watch the entire recording.

Let's Get Started

VideoSDK lets you integrate video calling into your JavaScript application within minutes.

In this quickstart, you'll explore the group calling feature of VideoSDK. Follow the step-by-step guide to integrate it within your application.

Prerequisites

Before proceeding, ensure that your development environment meets the following requirements:

VideoSDK Developer Account (if you do not have one, sign up through the VideoSDK Dashboard)
Basic understanding of JavaScript
JavaScript VideoSDK
Node and NPM installed on your device
A token generated from the VideoSDK dashboard

Getting Started with the Code!

Follow the steps to create the environment necessary to add video calls to your app. You can also find the code sample for quickstart here.

First, create one empty project using mkdir folder_name on your preferred location.

Install Video SDK

Import VideoSDK using the <script> tag, as shown below. If you use a bundler instead, the SDK is also published on npm as @videosdk.live/js-sdk; make sure you are in your app directory before you install it.

<html>
  <head>
    <!--.....-->
  </head>
  <body>
    <!--.....-->
    <script src="https://sdk.videosdk.live/js-sdk/0.0.89/videosdk.js"></script>
  </body>
</html>

Structure of the project

Your project structure should look like this.

  root
   ├── index.html
   ├── config.js
   ├── index.js

You will be working on the following files:

index.html: Responsible for creating a basic UI.
config.js: Responsible for storing the token.
index.js: Responsible for rendering the meeting view and the join meeting functionality.

Step 1: Design the user interface (UI)

Create an HTML file containing two screens: join-screen and grid-screen.

<!DOCTYPE html>
<html>

<head> </head>

<body>
    <div id="join-screen">
        <!-- Create new Meeting Button -->
        <button id="createMeetingBtn">New Meeting</button>
        OR
        <!-- Join existing Meeting -->
        <input type="text" id="meetingIdTxt" placeholder="Enter Meeting id" />
        <button id="joinBtn">Join Meeting</button>
        <select id="microphone-list">
        </select>
        <select id="speaker-list"></select>
        <select id="camera-list"></select>
    </div>

    <!-- for Managing meeting status -->
    <div id="textDiv"></div>

    <div id="grid-screen" style="display: none">
        <!-- To Display MeetingId -->
        <h3 id="meetingIdHeading"></h3>
        <!-- <br> -->

        <p id="micStatus" style="margin: 5px;">MIC | ON</p>
        <p id="cameraStatus" style="margin: 5px;">CAMERA | ON</p>
        <p id="recordingStatus" style="margin: 5px;">RECORDING | STOPPED</p>

        <!-- Controllers -->
        <button id="leaveBtn">Leave</button>
        <button id="toggleMicBtn">Toggle Mic</button>
        <button id="toggleWebCamBtn">Toggle WebCam</button>
        <button id="startRecording">Start Recording</button>
        <button id="stopRecording">Stop Recording</button>

        <!-- render Video -->
        <div class="row" id="videoContainer"></div>
    </div>
    <script src="https://sdk.videosdk.live/js-sdk/0.0.88/videosdk.js"></script>
    <script src="config.js"></script>
    <script src="index.js"></script>
</body>

</html>

index.html

Output

Step 2: Implement Join Screen

Configure the token, which you can obtain from the VideoSDK Dashboard, in the config.js file.

// Auth token will be used to generate a meeting and connect to it
const TOKEN = "Your_Token_Here";

config.js

Next, retrieve all the elements from the DOM and declare the following variables in the index.js file. Then, add an event listener to the join and create meeting buttons.

// Getting Elements from DOM
const joinButton = document.getElementById("joinBtn");
const leaveButton = document.getElementById("leaveBtn");
const toggleMicButton = document.getElementById("toggleMicBtn");
const toggleWebCamButton = document.getElementById("toggleWebCamBtn");
const createButton = document.getElementById("createMeetingBtn");
const videoContainer = document.getElementById("videoContainer");
const textDiv = document.getElementById("textDiv");

// Declare Variables
let meeting = null;
let meetingId = "";
let isMicOn = false;
let isWebCamOn = false;

function initializeMeeting() {}

function createLocalParticipant() {}

function createVideoElement() {}

function createAudioElement() {}

function setTrack() {}

// Join Meeting Button Event Listener
joinButton.addEventListener("click", async () => {
  document.getElementById("join-screen").style.display = "none";
  textDiv.textContent = "Joining the meeting...";

  // Read the meeting ID typed by the user
  meetingId = document.getElementById("meetingIdTxt").value;

  initializeMeeting();
});

// Create Meeting Button Event Listener
createButton.addEventListener("click", async () => {
  document.getElementById("join-screen").style.display = "none";
  textDiv.textContent = "Please wait, we are joining the meeting";

  // API call to create meeting
  const url = `https://api.videosdk.live/v2/rooms`;
  const options = {
    method: "POST",
    headers: { Authorization: TOKEN, "Content-Type": "application/json" },
  };

  try {
    const response = await fetch(url, options);
    const { roomId } = await response.json();
    meetingId = roomId;
  } catch (error) {
    // Creating the room failed, so show the join screen again and stop here
    alert(`Could not create the meeting: ${error}`);
    textDiv.textContent = "";
    document.getElementById("join-screen").style.display = "block";
    return;
  }

  initializeMeeting();
});

index.js
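
The join listener above passes whatever is in the text box straight to the meeting. A minimal sketch of a guard follows, assuming a hypothetical helper named normalizeMeetingId (it is not part of VideoSDK) that trims the input and rejects an empty value before initializeMeeting() is called.

```javascript
// Hypothetical helper (not part of the VideoSDK API): cleans up the meeting ID
// typed by the user and returns null when nothing usable was entered.
function normalizeMeetingId(rawValue) {
  if (typeof rawValue !== "string") return null;
  const trimmed = rawValue.trim();
  return trimmed.length > 0 ? trimmed : null;
}

// Possible use inside the join button listener:
// const id = normalizeMeetingId(document.getElementById("meetingIdTxt").value);
// if (id === null) { alert("Please enter a meeting ID"); return; }
// meetingId = id;
```

The helper deliberately does not check the ID against a pattern, since this guide does not specify the ID format.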

Step 3: Initialize the meeting

Following that, initialize the meeting using the initMeeting() function and proceed to join the meeting.

// Initialize meeting
function initializeMeeting() {
  window.VideoSDK.config(TOKEN);

  meeting = window.VideoSDK.initMeeting({
    meetingId: meetingId, // required
    name: "Thomas Edison", // required
    micEnabled: true, // optional, default: true
    webcamEnabled: true, // optional, default: true
  });

  meeting.join();

  // Creating local participant
  createLocalParticipant();

  // Setting local participant stream
  meeting.localParticipant.on("stream-enabled", (stream) => {
    setTrack(stream, null, meeting.localParticipant, true);
  });

  // meeting joined event
  meeting.on("meeting-joined", () => {
    textDiv.style.display = "none";
    document.getElementById("grid-screen").style.display = "block";
    document.getElementById(
      "meetingIdHeading"
    ).textContent = `Meeting Id: ${meetingId}`;
  });

  // meeting left event
  meeting.on("meeting-left", () => {
    videoContainer.innerHTML = "";
  });

  // Remote participants Event
  // participant joined
  meeting.on("participant-joined", (participant) => {
    //  ...
  });

  // participant left
  meeting.on("participant-left", (participant) => {
    //  ...
  });
}

index.js

Output

Step 4: Create the Media Elements

In this step, create functions that generate the audio and video elements for displaying both local and remote participants. Then set the corresponding media track based on whether the stream is video or audio.

// creating video element
function createVideoElement(pId, name) {
  let videoFrame = document.createElement("div");
  videoFrame.setAttribute("id", `f-${pId}`);

  //create video
  let videoElement = document.createElement("video");
  videoElement.classList.add("video-frame");
  videoElement.setAttribute("id", `v-${pId}`);
  videoElement.setAttribute("playsinline", true);
  videoElement.setAttribute("width", "300");
  videoFrame.appendChild(videoElement);

  let displayName = document.createElement("div");
  // textContent avoids interpreting a participant's name as HTML
  displayName.textContent = `Name : ${name}`;

  videoFrame.appendChild(displayName);
  return videoFrame;
}

// creating audio element
function createAudioElement(pId) {
  let audioElement = document.createElement("audio");
  // autoplay and controls are boolean attributes: setting them to the string
  // "false" would still turn them on, so set the properties instead
  audioElement.autoplay = false;
  audioElement.controls = false;
  audioElement.setAttribute("playsinline", "true");
  audioElement.setAttribute("id", `a-${pId}`);
  audioElement.style.display = "none";
  return audioElement;
}

// creating local participant
function createLocalParticipant() {
  let localParticipant = createVideoElement(
    meeting.localParticipant.id,
    meeting.localParticipant.displayName
  );
  videoContainer.appendChild(localParticipant);
}

// setting media track
function setTrack(stream, audioElement, participant, isLocal) {
  if (stream.kind == "video") {
    isWebCamOn = true;
    const mediaStream = new MediaStream();
    mediaStream.addTrack(stream.track);
    let videoElm = document.getElementById(`v-${participant.id}`);
    videoElm.srcObject = mediaStream;
    videoElm
      .play()
      .catch((error) =>
        console.error("videoElm.play() failed", error)
      );
  }
  if (stream.kind == "audio") {
    if (isLocal) {
      isMicOn = true;
    } else {
      const mediaStream = new MediaStream();
      mediaStream.addTrack(stream.track);
      audioElement.srcObject = mediaStream;
      audioElement
        .play()
        .catch((error) => console.error("audioElem.play() failed", error));
    }
  }
}

index.js

Step 5: Handle participant events

Next, implement the events related to the participants and their streams.

The following events are handled in this step:

participant-joined: When a remote participant joins, this event will trigger. In the event callback, create video and audio elements previously defined for rendering their video and audio streams.
participant-left: When a remote participant leaves, this event will trigger. In the event callback, remove the corresponding video and audio elements.
stream-enabled: This event manages the media track of a specific participant by associating it with the appropriate video or audio element.

// Initialize meeting
function initializeMeeting() {
  // ...

  // participant joined
  meeting.on("participant-joined", (participant) => {
    let videoElement = createVideoElement(
      participant.id,
      participant.displayName
    );
    let audioElement = createAudioElement(participant.id);
    // stream-enabled
    participant.on("stream-enabled", (stream) => {
      setTrack(stream, audioElement, participant, false);
    });
    videoContainer.appendChild(videoElement);
    videoContainer.appendChild(audioElement);
  });

  // participant left
  meeting.on("participant-left", (participant) => {
    // remove() takes no arguments; optional chaining guards against a missing element
    document.getElementById(`f-${participant.id}`)?.remove();
    document.getElementById(`a-${participant.id}`)?.remove();
  });
}

index.js

Output

Step 6: Implement Controls

Next, implement the meeting controls: toggling the mic, toggling the webcam, and leaving the meeting.

// leave Meeting Button Event Listener
leaveButton.addEventListener("click", async () => {
  meeting?.leave();
  document.getElementById("grid-screen").style.display = "none";
  document.getElementById("join-screen").style.display = "block";
});

// Toggle Mic Button Event Listener
toggleMicButton.addEventListener("click", async () => {
  if (isMicOn) {
    // Disable Mic in Meeting
    meeting?.muteMic();
  } else {
    // Enable Mic in Meeting
    meeting?.unmuteMic();
  }
  isMicOn = !isMicOn;
});

// Toggle Web Cam Button Event Listener
toggleWebCamButton.addEventListener("click", async () => {
  if (isWebCamOn) {
    // Disable Webcam in Meeting
    meeting?.disableWebcam();

    let vElement = document.getElementById(`f-${meeting.localParticipant.id}`);
    vElement.style.display = "none";
  } else {
    // Enable Webcam in Meeting
    meeting?.enableWebcam();

    let vElement = document.getElementById(`f-${meeting.localParticipant.id}`);
    vElement.style.display = "inline";
  }
  isWebCamOn = !isWebCamOn;
});

index.js
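
The micStatus and cameraStatus paragraphs defined in index.html are not updated by the listeners above. One way to keep them in sync is sketched below, using two hypothetical helpers, formatStatus and renderDeviceStatus (assumptions for illustration, not SDK functions).

```javascript
// Hypothetical helper: builds the label text used by the status paragraphs
// in index.html, e.g. "MIC | ON" or "CAMERA | OFF".
function formatStatus(deviceName, isOn) {
  return `${deviceName} | ${isOn ? "ON" : "OFF"}`;
}

// Writes the current mic and camera state into the page. The element lookup is
// passed in so the function can also be exercised outside a browser.
function renderDeviceStatus(getElementById, micOn, webcamOn) {
  getElementById("micStatus").textContent = formatStatus("MIC", micOn);
  getElementById("cameraStatus").textContent = formatStatus("CAMERA", webcamOn);
}

// Possible use at the end of each toggle listener:
// renderDeviceStatus((id) => document.getElementById(id), isMicOn, isWebCamOn);
```

Calling it after isMicOn or isWebCamOn is flipped keeps the labels consistent with the flags the listeners already maintain.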

Step 7: Configuring Transcription

In this step, we configure post-call transcription and summary generation. We define the webhook URL at which webhook notifications will be received, and a transcription object that enables transcription and the summary.
In the startRecording function, we pass the webhook URL and the transcription object; starting the recording with this configuration initiates the post-call transcription process.
Finally, calling the stopRecording function stops both the recording and the post-call transcription.

// Getting the recording elements from the DOM (defined in index.html)
const startRecordingButton = document.getElementById("startRecording");
const stopRecordingButton = document.getElementById("stopRecording");
const recordingStatus = document.getElementById("recordingStatus");

// Placeholder: replace with the URL of your own webhook endpoint
const webhookurl = "example.site";

const transcription = {
  enabled: true, // Enables post transcription
  summary: {
    enabled: true, // Enables summary generation

    // Guides summary generation
    prompt:
      "Write summary in sections like Title, Agenda, Speakers, Action Items, Outlines, Notes and Summary",
  },
};

// Start Recording with Post Transcription
startRecordingButton.addEventListener("click", () => {
  recordingStatus.textContent = "RECORDING | STARTING...";
  meeting.startRecording(webhookurl, null, null, transcription);
});

// Stop Recording with Post Transcription
stopRecordingButton.addEventListener("click", () => {
  recordingStatus.textContent = "RECORDING | STOPPING...";
  meeting.stopRecording();
});

index.js
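
The listeners above leave the recordingStatus paragraph at STARTING... or STOPPING... and never move it to a final state. The sketch below maps a recording state to the label shown in the UI. It assumes the SDK emits a "recording-state-changed" event on the meeting object whose payload carries a status string such as RECORDING_STARTED; the event name, the status values, and the helper name recordingLabel are assumptions here and should be checked against the VideoSDK reference for your SDK version.

```javascript
// Assumed status strings; verify them against the SDK reference before use.
const RECORDING_LABELS = {
  RECORDING_STARTING: "RECORDING | STARTING...",
  RECORDING_STARTED: "RECORDING | STARTED",
  RECORDING_STOPPING: "RECORDING | STOPPING...",
  RECORDING_STOPPED: "RECORDING | STOPPED",
};

// Returns the label for a status, or null for a status this sketch does not know.
function recordingLabel(status) {
  return Object.prototype.hasOwnProperty.call(RECORDING_LABELS, status)
    ? RECORDING_LABELS[status]
    : null;
}

// Possible wiring inside initializeMeeting(), assuming the event exists as described:
// meeting.on("recording-state-changed", ({ status }) => {
//   const label = recordingLabel(status);
//   if (label !== null) {
//     document.getElementById("recordingStatus").textContent = label;
//   }
// });
```

Unknown statuses are ignored rather than displayed, so a status value that this sketch does not list will not overwrite the current label.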

Run your code

Once you have completed all the steps mentioned above, run your application with the command below. If live-server is not installed yet, install it first with npm install -g live-server.

live-server --port=8000

Final Output

You have completed the implementation of a customized video calling app in JavaScript using VideoSDK. To explore more, go through the Basic and Advanced features.

Fetching the Transcription from the Dashboard

Once the transcription is ready, you can fetch it from the VideoSDK dashboard. The dashboard provides an interface where you can view, download, and manage your transcriptions and summaries.
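
To learn when processing has finished without checking the dashboard repeatedly, the webhook URL from Step 7 can point to a small server of your own. The sketch below uses only Node's built-in http module. The /webhook path, port 3000, and the function names are assumptions, and the payload is treated as opaque JSON because its fields are not described in this guide.

```javascript
const http = require("http");

// Parses a raw request body as JSON; returns null when the body is empty,
// invalid, or not a JSON object or array.
function parseWebhookBody(rawBody) {
  try {
    const parsed = JSON.parse(rawBody);
    return parsed !== null && typeof parsed === "object" ? parsed : null;
  } catch (error) {
    return null;
  }
}

// Creates (but does not start) a server that accepts POST requests on /webhook.
// onEvent is called with the parsed payload of every valid request.
function createWebhookServer(onEvent) {
  return http.createServer((req, res) => {
    if (req.method !== "POST" || req.url !== "/webhook") {
      res.statusCode = 404;
      res.end();
      return;
    }
    let rawBody = "";
    req.on("data", (chunk) => {
      rawBody += chunk;
    });
    req.on("end", () => {
      const payload = parseWebhookBody(rawBody);
      if (payload === null) {
        res.statusCode = 400;
        res.end("invalid JSON");
        return;
      }
      onEvent(payload);
      res.statusCode = 200;
      res.end("ok");
    });
  });
}

// To run it locally:
// createWebhookServer((payload) => console.log("webhook received", payload)).listen(3000);
```

The value of webhookurl would then be the public URL of this endpoint. A server running on localhost is reachable from outside only through a deployment or a tunnel.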

Conclusion

Integrating post-call transcription and summary features into your JavaScript application using VideoSDK provides significant advantages for capturing and documenting meeting content. This guide has covered the steps required to set up and implement these features, so that the conversations in your recorded meetings are transcribed and easily accessible for future reference.


Video SDK Team
