
Video transcription

Last updated: Aug-09-2024

Video transcription automatically generates a text transcript of the audio in a video file. You can use the resulting file to display a full transcript alongside your video, add it as a text track for standard subtitles, or use it for paced subtitles with the Cloudinary Video Player. Transcript generation identifies the language spoken in the audio and produces the transcript in that language.

Use the Cloudinary Video Transcription service to generate transcripts during upload or to trigger generation from the Video Player Studio, and use the transcript editor to edit them.

Alternatively, you can use an add-on, either Google AI Video Transcription or Microsoft Azure Video Indexer.

Requesting transcription

To request transcription, set the boolean auto_transcription parameter to true in your upload request.
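
As a rough illustration of the request described above, the following Python sketch builds the form fields for a signed Upload API call with auto_transcription set to true. The helper names, the placeholder credentials, and sending the boolean as the string "true" are assumptions made for illustration. The signing step follows Cloudinary's general signed-upload scheme (parameters sorted by name, joined, suffixed with the API secret, and hashed with SHA-1) and should be checked against the Upload API reference before use.

```python
import hashlib
import time


def sign_params(params, api_secret):
    """Sign parameters following the general signed-upload scheme:
    sort by name, join as key=value pairs with '&', append the secret, SHA-1."""
    to_sign = "&".join(f"{key}={value}" for key, value in sorted(params.items()))
    return hashlib.sha1((to_sign + api_secret).encode("utf-8")).hexdigest()


def build_upload_fields(api_key, api_secret, notification_url=None, timestamp=None):
    """Return the form fields (excluding the file itself) for a video upload
    that requests automatic transcription."""
    params = {
        "auto_transcription": "true",  # assumption: boolean sent as the string "true"
        "timestamp": str(int(time.time()) if timestamp is None else timestamp),
    }
    if notification_url:
        params["notification_url"] = notification_url
    fields = dict(params)
    fields["signature"] = sign_params(params, api_secret)
    fields["api_key"] = api_key  # sent with the request, not part of the signature
    return fields


# Placeholder credentials. POST these fields together with the video file to
# https://api.cloudinary.com/v1_1/<cloud_name>/video/upload
fields = build_upload_fields("my_api_key", "my_api_secret", timestamp=1700000000)
print(fields["auto_transcription"], len(fields["signature"]))
```

The sketch stops short of sending the request so that it can be run without credentials; any HTTP client that can post multipart form data can send the resulting fields along with the file.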

Auto transcription is performed asynchronously after your original method call completes, so the response to that call shows a pending status.

When the request is complete (this may take from several seconds to several minutes, depending on the length of the video), a new raw file is created in your product environment. The file has the same public ID as your video or audio file, with the .transcript file extension.
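
Because the transcript shares the public ID of the source asset, its delivery URL can be derived rather than looked up. The Python sketch below assumes the default res.cloudinary.com delivery hostname, the standard raw/upload delivery path, and a JSON transcript body; the function names, the example cloud name, and the polling interval are illustrative only.

```python
import json
import time
import urllib.error
import urllib.request
from urllib.parse import quote


def transcript_url(cloud_name, public_id):
    """Build the assumed delivery URL of the raw .transcript file for an asset."""
    # Raw assets keep their extension as part of the path.
    path = quote(f"{public_id}.transcript", safe="/")
    return f"https://res.cloudinary.com/{cloud_name}/raw/upload/{path}"


def wait_for_transcript(url, attempts=10, delay_seconds=30):
    """Poll until the transcript exists; return the parsed JSON, or None on timeout."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url) as response:
                return json.load(response)
        except urllib.error.HTTPError as error:
            if error.code != 404:  # 404 is treated as "not generated yet"
                raise
        time.sleep(delay_seconds)
    return None


print(transcript_url("demo", "docs/lincoln"))
```

Polling is the simplest approach for a script; for production use, a notification_url avoids repeated requests while the transcript is still being generated.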

If you also provided a notification_url in your method call, the specified URL receives a notification when the process completes.

Important

The auto_transcription parameter is not yet supported by our SDKs. To use it with your SDK Upload API calls, define it as part of an upload preset and specify that preset alongside your other upload parameters. To create a new upload preset, open the upload presets page and create or edit a preset, making sure that you set the auto transcription parameter.

To apply the preset to an upload, specify it in your upload call.
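
A minimal sketch of such a call with the Cloudinary Python SDK is shown here. The preset name video_transcription_preset is hypothetical and stands for a preset that already has auto transcription enabled; the SDK is assumed to be installed (pip install cloudinary) and configured with your credentials.

```python
def preset_upload_options(preset_name):
    """Options for an SDK upload that delegates auto transcription to a preset."""
    return {"resource_type": "video", "upload_preset": preset_name}


def upload_with_preset(video_path, preset_name):
    """Upload a video through the SDK, applying the given upload preset."""
    # Imported here so the helper above can be used without the SDK installed.
    import cloudinary.uploader

    return cloudinary.uploader.upload(video_path, **preset_upload_options(preset_name))


# Hypothetical preset name; replace it with a preset from your product environment.
print(preset_upload_options("video_transcription_preset"))
```

Calling upload_with_preset("my_video.mp4", "video_transcription_preset") would then perform the upload, with the preset supplying the auto transcription setting that the SDK cannot yet pass directly.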

Cloudinary transcript files

The created .transcript file contains the details of the audio transcription.

Each excerpt of text has a confidence value, and is followed by a breakdown of individual words and their specific start and end times.
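
To make the structure described above concrete, the following Python sketch parses a small transcript and summarizes each excerpt. The sample content is made up, and the field names (transcript, confidence, words, word, start_time, end_time) are assumptions about the file layout; compare them with a real .transcript file from your product environment before relying on them.

```python
import json

# Made-up sample using the assumed field names.
SAMPLE = """
[
  {
    "transcript": "Hello and welcome",
    "confidence": 0.92,
    "words": [
      {"word": "Hello", "start_time": 0.0, "end_time": 0.4},
      {"word": "and", "start_time": 0.5, "end_time": 0.6},
      {"word": "welcome", "start_time": 0.7, "end_time": 1.2}
    ]
  }
]
"""


def summarize(transcript_json, min_confidence=0.0):
    """Return text, confidence, and time span for each sufficiently confident excerpt."""
    summary = []
    for excerpt in json.loads(transcript_json):
        words = excerpt.get("words", [])
        if excerpt["confidence"] < min_confidence or not words:
            continue  # skip low-confidence excerpts and excerpts without timings
        summary.append({
            "text": excerpt["transcript"],
            "confidence": excerpt["confidence"],
            "start": words[0]["start_time"],
            "end": words[-1]["end_time"],
        })
    return summary


for item in summarize(SAMPLE):
    print(item["start"], item["end"], item["confidence"], item["text"])
```

Filtering on the confidence value is one way to flag excerpts that may need a manual check before the transcript is published.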

Displaying transcripts with the Cloudinary Video Player

You can display your generated transcripts as a text track for subtitles or captions using the Cloudinary Video Player. You can also use the word-level timing in the transcript to add paced subtitles or word highlighting. To add your transcript, set the textTracks parameter with the relevant configuration.

For transcripts, no URL is required because the player assumes that the transcript has the same public ID as the video. To control the number of words shown on each line of the transcript, use the maxWords parameter.
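
The effect of limiting the words per line can be illustrated outside the player. The Python sketch below groups word timings into cues of at most max_words words and renders them as WebVTT. It is not the Video Player's implementation, only an approximation of the idea, and the word field names (word, start_time, end_time) are assumed.

```python
def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def paced_cues(words, max_words):
    """Split word timings into (start, end, text) cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        cues.append((chunk[0]["start_time"], chunk[-1]["end_time"], text))
    return cues


def to_webvtt(words, max_words=4):
    """Render the paced cues as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for start, end, text in paced_cues(words, max_words):
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)


# Made-up word timings for illustration.
words = [
    {"word": "Hello", "start_time": 0.0, "end_time": 0.4},
    {"word": "and", "start_time": 0.5, "end_time": 0.6},
    {"word": "welcome", "start_time": 0.7, "end_time": 1.2},
]
print(to_webvtt(words, max_words=2))
```

A smaller max_words value produces shorter cues that change more often, which is the trade-off to weigh when choosing a value for paced subtitles.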

Note
Only a single transcript text track can be added to any video.

Transcript editor

The transcript editor enables you to trigger transcript generation, using the transcription service, for videos in your Media Library. You can then edit the generated transcript to ensure that it matches the audio exactly.

The editor supports adding and editing lines, as well as the individual words within each line.

To open the editor, navigate to the Video Player Studio. Make sure that you enter your video's public ID in the Video Details section before selecting the Transcript Editor.

