Introduction
Post-call transcription and summary is a powerful feature provided by VideoSDK that allows users to generate detailed transcriptions and summaries of recorded meetings after they have concluded. This feature is particularly beneficial for capturing and documenting important information discussed during meetings, ensuring that nothing is missed and that there is a comprehensive record of the conversation.
How Post-Call Transcription Works
Post-call transcription involves processing the recorded audio or video content of a meeting to produce a textual representation of the conversation. Here’s a step-by-step breakdown of how it works:
- Recording the Meeting: During the meeting, the audio and video are recorded. This can include everything that was said and any shared content, such as presentations or screen shares.
- Uploading the Recording: Once the meeting is over, the recorded file is uploaded to the VideoSDK platform. This can be done automatically or manually, depending on the configuration.
- Transcription Processing: The uploaded recording is then processed by VideoSDK’s transcription engine. This engine uses advanced speech recognition technology to convert spoken words into written text.
- Retrieving the Transcription: After the transcription process is complete, the textual representation of the meeting is made available. This text can be accessed via the VideoSDK API and used in various applications.
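The retrieval step above can be sketched as a small polling helper. Note that the endpoint path and response fields below are illustrative assumptions for this sketch, not the exact VideoSDK API; consult the VideoSDK API reference for the real routes.

```javascript
// Sketch: poll until a post-call transcription is ready, then return its text.
// ASSUMPTION: the endpoint path and the { status, text } response shape are
// placeholders -- check the VideoSDK API reference for the actual contract.
async function fetchTranscription(
  sessionId,
  token,
  { fetchFn = fetch, retries = 5, delayMs = 3000 } = {}
) {
  for (let attempt = 0; attempt < retries; attempt++) {
    const res = await fetchFn(
      `https://api.videosdk.live/v2/transcriptions?sessionId=${sessionId}`,
      { headers: { authorization: token } }
    );
    const body = await res.json();
    if (body.status === "completed") return body.text;
    // Transcription is still processing; wait before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("Transcription not ready after polling");
}
```

Injecting `fetchFn` keeps the helper easy to unit-test and lets you swap in a custom HTTP client.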
Benefits of Post-Call Transcription
- Accurate Documentation: Provides a precise record of what was discussed, which is invaluable for meeting minutes, legal documentation, and reference.
- Enhanced Accessibility: Makes content accessible to those who may have missed the meeting or have hearing impairments.
- Easy Review and Analysis: Enables quick review of key points and decisions made during the meeting without having to re-watch the entire recording.
Let's Get Started
VideoSDK empowers you to seamlessly integrate the video calling feature into your React application within minutes.
In this quickstart, you'll explore the group calling feature of VideoSDK. Follow the step-by-step guide to integrate it within your application.
Prerequisites
Before proceeding, ensure that your development environment meets the following requirements:
- VideoSDK Developer Account (if you don't have one, create it from the VideoSDK Dashboard)
- Basic understanding of React
- VideoSDK React SDK
- Have Node and NPM installed on your device.
- Basic understanding of Hooks (useState, useRef, useEffect)
- React Context API (optional)
- Generate a token from VideoSDK dashboard
Getting Started with the Code!
Follow the steps to create the environment necessary to add video calls to your app. You can also find the code sample for this quickstart here.
Create new React App
Create a new React App using the below command.
$ npx create-react-app videosdk-rtc-react-app
Install VideoSDK
Install the VideoSDK package using the npm command below. Make sure you are in your React app directory before you run this command.
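The install command is shown below. `@videosdk.live/react-sdk` is the package name used in VideoSDK's React quickstart; `react-player` is an additional dependency some of the examples use for rendering video streams.

```shell
npm install @videosdk.live/react-sdk

# react-player is used in some VideoSDK examples to render webcam streams
npm install react-player
```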
Structure of the project
Your project structure should look like this.
root
├── node_modules
├── public
├── src
│ ├── API.js
│ ├── App.js
│ ├── index.js
...
You are going to use functional components to leverage React's reusable component architecture. There will be components for users, videos, and controls (mic, camera, leave) over the video.
App Architecture
The App will contain a MeetingView component, which includes a ParticipantView component that renders the participant's name, video, audio, etc. It will also have a Controls component that allows the user to perform operations such as leave and toggle media.
You will be working on the following files:
- API.js: Responsible for handling API calls such as generating unique meetingId and token
- App.js: Responsible for rendering MeetingView and joining the meeting.
Step 1: Get started with API.js
Prior to moving on, you must create an API request to generate a unique meetingId. You will need an authentication token, which you can create either through the videosdk-rtc-api-server-examples or directly from the VideoSDK Dashboard for developers.
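A minimal API.js might look like the sketch below. `POST /v2/rooms` is VideoSDK's documented route for creating a room; the environment-variable name and the file layout are assumptions for this example.

```javascript
// API.js -- minimal sketch.
// ASSUMPTION: the auth token generated from the VideoSDK dashboard (or your
// token server) is stored in the REACT_APP_VIDEOSDK_TOKEN env variable.
const API_BASE_URL = "https://api.videosdk.live/v2";

const authToken = process.env.REACT_APP_VIDEOSDK_TOKEN || "YOUR_TOKEN_HERE";

// Creates a new VideoSDK room; its roomId is used as the meetingId.
async function createMeeting({ token }) {
  const res = await fetch(`${API_BASE_URL}/rooms`, {
    method: "POST",
    headers: {
      authorization: token,
      "Content-Type": "application/json",
    },
  });
  const { roomId } = await res.json();
  return roomId;
}

// In your app, export these for use in App.js:
// export { authToken, createMeeting };
```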
Step 2: Wireframe App.js with all the components
To build up the wireframe of App.js, you need to use VideoSDK hooks and Context Providers. VideoSDK provides the MeetingProvider, MeetingConsumer, useMeeting, and useParticipant hooks.
First, you need to understand the Context Provider and Consumer. Context is primarily used when some data needs to be accessible by many components at different nesting levels.
- MeetingProvider: This is the Context Provider. It accepts config and token as props. The Provider component accepts a value prop to be passed to consuming components that are descendants of this Provider. One Provider can be connected to many consumers. Providers can be nested to override values deeper within the tree.
- MeetingConsumer: This is the Context Consumer. All consumers that are descendants of a Provider will re-render whenever the Provider's value prop changes.
- useMeeting: This is the meeting hook API. It includes all the information related to meeting such as join, leave, enable/disable mic or webcam etc.
- useParticipant: This is the participant hook API. It is responsible for handling all the events and props related to one particular participant such as name, webcamStream, micStream etc.
The Meeting Context provides a way to listen for any changes that occur when a participant joins the meeting or makes modifications to their microphone, camera, and other settings.
Begin by making a few changes to the code in the App.js file.
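Assuming an API.js that exports authToken and createMeeting (as sketched in Step 1), the App.js wireframe might look like the following. The MeetingView and JoinScreen components referenced here are filled in during the later steps.

```jsx
import React, { useState } from "react";
import { MeetingProvider, MeetingConsumer } from "@videosdk.live/react-sdk";
import { authToken, createMeeting } from "./API";

function App() {
  const [meetingId, setMeetingId] = useState(null);

  // Creates a new meeting, or joins the one typed into the join screen.
  const getMeetingAndToken = async (id) => {
    const newMeetingId =
      id == null ? await createMeeting({ token: authToken }) : id;
    setMeetingId(newMeetingId);
  };

  return authToken && meetingId ? (
    <MeetingProvider
      config={{
        meetingId,
        micEnabled: true,
        webcamEnabled: true,
        name: "Your Name", // displayed to other participants
      }}
      token={authToken}
    >
      <MeetingConsumer>
        {() => <MeetingView meetingId={meetingId} />}
      </MeetingConsumer>
    </MeetingProvider>
  ) : (
    <JoinScreen getMeetingAndToken={getMeetingAndToken} />
  );
}

export default App;
```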
Step 3: Implement Join Screen
The join screen will serve as a medium to either schedule a new meeting or join an existing one.
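A minimal join screen can be sketched as below: one input for an existing meeting id, plus buttons to join it or create a fresh meeting. The getMeetingAndToken prop comes from the App.js wireframe.

```jsx
import { useState } from "react";

// Sketch: join an existing meeting by id, or create a new one.
function JoinScreen({ getMeetingAndToken }) {
  const [meetingId, setMeetingId] = useState(null);

  return (
    <div>
      <input
        type="text"
        placeholder="Enter Meeting Id"
        onChange={(e) => setMeetingId(e.target.value)}
      />
      <button onClick={() => getMeetingAndToken(meetingId)}>Join</button>
      <button onClick={() => getMeetingAndToken(null)}>Create Meeting</button>
    </div>
  );
}
```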
Output
Step 4: Implement MeetingView and Controls
The next step is to create MeetingView and Controls components to manage features such as join, leave, mute, and unmute.
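A sketch of both components is shown below, using the join, leave, toggleMic, toggleWebcam, and participants members of the useMeeting hook. The ParticipantView component it maps over is implemented in Step 6.

```jsx
import { useState } from "react";
import { useMeeting } from "@videosdk.live/react-sdk";

function MeetingView({ meetingId }) {
  const [joined, setJoined] = useState(null);
  // participants is a Map of participantId -> participant object.
  const { join, participants } = useMeeting({
    onMeetingJoined: () => setJoined("JOINED"),
  });

  return (
    <div>
      <h3>Meeting Id: {meetingId}</h3>
      {joined === "JOINED" ? (
        <div>
          <Controls />
          {[...participants.keys()].map((participantId) => (
            <ParticipantView key={participantId} participantId={participantId} />
          ))}
        </div>
      ) : (
        <button onClick={() => { setJoined("JOINING"); join(); }}>Join</button>
      )}
    </div>
  );
}

function Controls() {
  const { leave, toggleMic, toggleWebcam } = useMeeting();
  return (
    <div>
      <button onClick={() => leave()}>Leave</button>
      <button onClick={() => toggleMic()}>toggleMic</button>
      <button onClick={() => toggleWebcam()}>toggleWebcam</button>
    </div>
  );
}
```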
Step 5: Configuring Transcription
In this step, you set up the configuration for post-call transcription and summary generation, and define the webhook URL where the transcription webhooks will be received.
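The configuration can be sketched as below. The field names follow VideoSDK's documented transcription config, but the webhook URL and summary prompt are placeholders, and the exact argument order of startRecording should be verified against the current API reference.

```javascript
// Sketch: configuration for post-call transcription and summary generation.
// ASSUMPTION: the webhook URL and prompt text are placeholders for this example.
const webhookUrl = "https://www.example.com/videosdk-webhook"; // receives transcription events

const transcription = {
  enabled: true, // generate a post-call transcription of the recording
  summary: {
    enabled: true, // also generate a summary from the transcription
    prompt:
      "Write summary in sections like Title, Agenda, Speakers, Action Items and Outlines",
  },
};

// Later, inside a component with access to useMeeting():
//   const { startRecording } = useMeeting();
//   startRecording(webhookUrl, null, null, transcription);
```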
Output of Controls Component
Step 6: Implement Participant View
Before implementing the participant view, you need to understand a couple of concepts.
1. Forwarding Ref for mic and camera
The useRef hook is responsible for referencing the audio and video elements. It will be used to play and stop the audio and video of the participant.
2. useParticipant Hook
The useParticipant hook is responsible for handling all the properties and events of one particular participant joined in the meeting. It takes participantId as an argument.
3. MediaStream API
The MediaStream API is beneficial for adding a MediaTrack to the audio/video tag, enabling the playback of audio or video.
4. Implement ParticipantView
Now you can use both hooks and the MediaStream API to create the ParticipantView component.
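The three concepts above come together in the sketch below: useParticipant supplies the streams, useRef forwards the audio element, and the MediaStream API wraps each track for playback. The react-player dependency is an assumption carried over from the install step.

```jsx
import { useEffect, useMemo, useRef } from "react";
import { useParticipant } from "@videosdk.live/react-sdk";
import ReactPlayer from "react-player";

// Sketch: renders one participant's name, webcam stream, and mic audio.
function ParticipantView({ participantId }) {
  const micRef = useRef(null);
  const { webcamStream, micStream, webcamOn, micOn, isLocal, displayName } =
    useParticipant(participantId);

  // Wrap the webcam track in a MediaStream for the video player.
  const videoStream = useMemo(() => {
    if (webcamOn && webcamStream) {
      const mediaStream = new MediaStream();
      mediaStream.addTrack(webcamStream.track);
      return mediaStream;
    }
  }, [webcamStream, webcamOn]);

  // Attach (or detach) the mic track to the <audio> element via the ref.
  useEffect(() => {
    if (micRef.current) {
      if (micOn && micStream) {
        const mediaStream = new MediaStream();
        mediaStream.addTrack(micStream.track);
        micRef.current.srcObject = mediaStream;
        micRef.current
          .play()
          .catch((err) => console.error("audio play failed", err));
      } else {
        micRef.current.srcObject = null;
      }
    }
  }, [micStream, micOn]);

  return (
    <div>
      <p>
        {displayName} | Webcam: {webcamOn ? "ON" : "OFF"} | Mic:{" "}
        {micOn ? "ON" : "OFF"}
      </p>
      {/* Mute your own audio locally to avoid echo. */}
      <audio ref={micRef} autoPlay playsInline muted={isLocal} />
      {webcamOn && (
        <ReactPlayer
          playing
          muted
          playsinline
          url={videoStream}
          height="200px"
          width="300px"
        />
      )}
    </div>
  );
}
```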
Final Output
You have completed the implementation of a customized video calling app in React.js using VideoSDK. To explore more features, go through Basic and Advanced features.
Fetching the Transcription from the Dashboard
Once the transcription is ready, you can fetch it from the VideoSDK dashboard. The dashboard provides a user-friendly interface where you can view, download, and manage your transcriptions.
Conclusion
Integrating post-call transcription and summary features into your React application using VideoSDK provides significant advantages for capturing and documenting meeting content. This guide has meticulously detailed the steps required to set up and implement these features, ensuring that every conversation during a meeting is accurately transcribed and easily accessible for future reference.