Whisper API for Transcribing Long-Form Content
How to Bypass the Whisper API Limit and Transcribe Long-Format Content
By Antonio Blago, September 19, 2024
Converting video files to audio files is useful for many reasons, for example when you want to extract the audio track from a lecture, tutorial, or podcast to listen to it on the go. The OGG format is well suited to this because it supports efficient, lossy compression.
In addition to file conversion, the Whisper API is a helpful way to transcribe audio files. However, the API has limitations, especially if you want to transcribe long recordings: uploads are limited to 25 MB per file. In this post, you'll learn how to create a simple Bash script to convert videos to OGG files while also applying strategies to bypass the Whisper API limit.
If you need support, just send me an email or schedule an appointment with me.
Why Should You Transcribe Long-Format Content?
Transcribing long-format audio files (such as podcasts, interviews, lectures, webinars, or trainings) offers many advantages. Here are some reasons why it makes sense to convert longer audio content into text:
- Better Discoverability and SEO Benefits
Search engine optimization (SEO) benefits significantly from transcribed content. Search engines cannot directly analyze the content of audio files, but text can be searched and indexed.
A transcription ensures that the content of podcasts or webinars appears in search engines, increasing reach and contributing to better rankings.
- Increase Accessibility
Not everyone can or wants to listen to audio files—for example, people with hearing impairments. By transcribing content, you make it accessible to a larger audience and promote inclusion.
There are also people who are in environments where they cannot or do not want to listen to audio (e.g., on the train or at work).
- Quick Scanning and Quoting
Long audio formats can be difficult to search through. With a text document, users can quickly read through the content, search for keywords, and directly access specific sections.
Transcriptions also make it easy to extract quotes or important information for use in reports, articles, or presentations.
- Reusing Content
Transcriptions can easily be converted into other formats, such as blog posts, social media posts, or newsletters. This maximizes the reach of the original content and increases the value of your content.
Additionally, you can use shorter text sections from a transcription for various content strategies, such as teasers or summaries.
- Added Value for Your Listeners/Readers
Some people prefer to consume information in text form rather than listening to long audio content. A transcription gives your users the option to consume the content in the way that suits them best.
This improves the user experience and makes it more likely that your content will be shared or recommended.
- Expanding International Reach
A transcription makes it easier to translate the content into different languages and thus reach an international audience.
Machine translation tools (e.g., Google Translate) work much better with text than with audio.
- Education and Research
For studies, research, or training, it is often necessary to have content from lectures or interviews in text form in order to analyze, comment on, or reference it.
Students or researchers can easily read through transcriptions and highlight relevant information for their work.
- Time Savings
Listening through a long audio file can take a lot of time. With a transcription, the content can be captured and understood much more quickly. This is especially helpful in professional contexts where it is important to process information efficiently.
- Documentation
A transcription can serve as a permanent record for meetings, interviews, or discussions. This makes it easier to refer back to important information later or to understand decisions.
For legal or formal purposes, a written version of conversations or interviews may be necessary.
Transcribing long-format audio files offers significant advantages in terms of accessibility, discoverability, and content reuse. It allows you to unlock the full potential of your audio content and make it accessible to a broader audience. Whether it's about SEO, accessibility, or content strategies—transcriptions play a key role in maximizing the value of your content.
What is the OGG Format?
OGG is a free and open container format developed for the efficient storage and transmission of audio data. The compression itself is done by the codec stored inside the container, such as Vorbis or Opus; the script in this post uses Opus. With a suitable bitrate, this combination offers a good balance between file size and sound quality.
What is the Whisper API?
The Whisper API from OpenAI converts audio files into text. The problem many users encounter is its limits: each uploaded file may be at most 25 MB, and the number of requests per unit of time is capped. This is especially frustrating when you need to transcribe long recordings or large numbers of audio files, so a common concern is finding ways to bypass these limits and process more content.
How do you convert videos to OGG files?
Prerequisites
ffmpeg: A powerful command-line tool for editing multimedia files.
WSL (Windows Subsystem for Linux) or a Linux/Mac terminal environment.
Whisper API (optional, for subsequent transcription).
1. Installing ffmpeg
If ffmpeg is not installed, you can install it on a Debian- or Ubuntu-based system (including the default Ubuntu distribution in WSL) with the following commands. Other distributions and macOS use their own package managers.
sudo apt update
sudo apt install ffmpeg
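Before moving on, it can help to confirm that the installation worked and that the ffmpeg build includes the Opus encoder used later in this post. The following is a minimal sketch; the function names `have_cmd`, `has_libopus`, and `check_prerequisites` are made up for this illustration.

```bash
#!/bin/bash

# Return success if the given command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Return success if this ffmpeg build lists the libopus encoder.
has_libopus() {
    ffmpeg -hide_banner -encoders 2>/dev/null | grep -q 'libopus'
}

# Print a short status message and return non-zero if something is missing.
check_prerequisites() {
    if ! have_cmd ffmpeg; then
        echo "ffmpeg not found; install it first." >&2
        return 1
    fi
    if ! has_libopus; then
        echo "This ffmpeg build has no libopus encoder." >&2
        return 1
    fi
    echo "ffmpeg with libopus is available."
}
```

Calling `check_prerequisites` at the top of a conversion script lets it stop early with a readable message instead of failing in the middle of a batch.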
2. Creating the Bash script for conversion
Here is an example script that converts all .mp4 files in a directory to the .ogg format:
#!/bin/bash

# Directories for the input .mp4 files and the output .ogg files
INPUT_DIR="/mnt/c/Users/user/Downloads"
OUTPUT_DIR="/mnt/c/Users/user/Downloads"

# Audio settings
AUDIO_CODEC="libopus"   # Opus encoder, stored in an OGG container
BITRATE="12k"           # A low bitrate keeps the files small
CHANNELS="2"
APPLICATION="audio"

# Loop through all .mp4 files in the input directory
for INPUT_FILE in "$INPUT_DIR"/*.mp4; do
    # If the pattern matched nothing, it stays literal and no such file exists
    if [ ! -e "$INPUT_FILE" ]; then
        echo "No MP4 files found in directory $INPUT_DIR."
        exit 0
    fi

    # Filename without extension
    BASENAME=$(basename "$INPUT_FILE" .mp4)
    OUTPUT_FILE="$OUTPUT_DIR/${BASENAME}.ogg"

    echo "Converting $INPUT_FILE to $OUTPUT_FILE..."

    # Conversion: drop the video stream (-vn) and metadata, encode the audio
    if ffmpeg -i "$INPUT_FILE" \
        -vn \
        -map_metadata -1 \
        -ac "$CHANNELS" \
        -c:a "$AUDIO_CODEC" \
        -b:a "$BITRATE" \
        -application "$APPLICATION" \
        "$OUTPUT_FILE"; then
        echo "Successfully converted: $INPUT_FILE to $OUTPUT_FILE"
    else
        echo "Error converting $INPUT_FILE"
    fi
done

echo "All conversions completed!"
3. Make the script executable and run it
After you have created the script and saved it as, for example, convert_to_ogg.sh, make it executable with the following command:
chmod +x convert_to_ogg.sh
Then run it with:
./convert_to_ogg.sh
The script converts all .mp4 files in the specified directory to the .ogg format and saves the audio files in the same folder.
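A rough calculation shows how much audio fits into a single upload after conversion. The sketch below assumes the 25 MB limit is counted as 25 × 1024 × 1024 bytes and treats the bitrate as a constant average, ignoring container overhead, so the result is an estimate only. The names `LIMIT_BYTES`, `max_minutes`, `fits_limit`, and `split_audio` are made up for this illustration, and the one-hour default segment length is an arbitrary choice.

```bash
#!/bin/bash

# Upload limit in bytes (assumption: 25 MB = 25 * 1024 * 1024 bytes).
LIMIT_BYTES=$((25 * 1024 * 1024))

# Print roughly how many whole minutes of audio fit under the limit
# at the given average bitrate in kbit/s.
max_minutes() {
    local kbps=$1
    echo $(( LIMIT_BYTES * 8 / (kbps * 1000) / 60 ))
}

# Return success if the file is small enough to upload in one request.
fits_limit() {
    local size
    size=$(wc -c < "$1") || return 1
    size=$((size))   # normalize whitespace that some wc versions print
    [ "$size" -le "$LIMIT_BYTES" ]
}

# Split an audio file into pieces of the given length (in seconds)
# without re-encoding, using ffmpeg's segment muxer.
split_audio() {
    local input=$1
    local seconds=${2:-3600}
    ffmpeg -nostdin -i "$input" \
        -f segment -segment_time "$seconds" \
        -c copy "${input%.*}_part%03d.ogg"
}
```

With these assumptions, `max_minutes 12` prints 291, so at the 12k bitrate from the script a file of almost five hours stays under the limit. For anything longer, a call such as `fits_limit lecture.ogg || split_audio lecture.ogg 3600` would produce one-hour pieces; note that a cut can land in the middle of a sentence, which may affect the transcript at the boundaries.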
Whisper API and API Limits – Solutions and Approaches
When using the Whisper API, you may encounter issues if you want to transcribe many audio files, as the number of requests per time unit is limited. Here are some strategies for bypassing the Whisper API limit:
- Batch Processing with Bash
Instead of processing all files at once, you can create a script that processes files in small batches. This reduces the number of API requests per time unit and helps avoid exceeding the limit.
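#!/bin/bash

# Directory containing the converted .ogg files
# (the same path as OUTPUT_DIR in the conversion script)
OUTPUT_DIR="/mnt/c/Users/user/Downloads"

# Number of files per batch
BATCH_SIZE=5
COUNTER=0

# Process audio files
for FILE in "$OUTPUT_DIR"/*.ogg; do
    # Skip the literal pattern if no .ogg files exist
    [ -e "$FILE" ] || continue

    if [ "$COUNTER" -ge "$BATCH_SIZE" ]; then
        echo "Waiting to bypass API limits..."
        sleep 60 # Wait one minute before starting the next batch
        COUNTER=0
    fi

    # Here the Whisper API call for transcription could take place.
    # whisper-cli stands in for whatever transcription command you use.
    whisper-cli "$FILE"

    COUNTER=$((COUNTER + 1))
done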
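Fixed pauses between batches can be combined with retries for individual requests that still fail. The sketch below is one possible approach, not part of the original script: it reruns a failed command with a doubling delay. The function name `retry_with_backoff` and the defaults for `MAX_ATTEMPTS` and `BASE_DELAY` are assumptions chosen for illustration.

```bash
#!/bin/bash

# Illustrative defaults; override them in the environment if needed.
MAX_ATTEMPTS=${MAX_ATTEMPTS:-5}   # total tries before giving up
BASE_DELAY=${BASE_DELAY:-2}       # seconds to wait after the first failure

# Run the given command; on failure, wait and retry with a doubling delay.
retry_with_backoff() {
    local attempt=1
    local delay=$BASE_DELAY
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
            echo "Giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "Attempt $attempt failed; retrying in ${delay}s..." >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}
```

Inside the batch loop, the transcription line could then become `retry_with_backoff whisper-cli "$FILE"`. This only helps if the wrapped command returns a non-zero exit status when the API rejects a request.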
- Batch Processing with Make
Make.com is a powerful no-code/low-code automation platform (formerly known as Integromat) that allows users to automate workflows by connecting various apps, services, and tools.
Make.com offers a visual interface that lets you create complex automations without programming knowledge. It is an alternative to tools like Zapier and is suitable for automating repetitive tasks in a variety of application areas, from marketing and sales to IT processes.
The screenshot shows the Make.com automation used to transcribe and summarize an audio file. The process consists of several steps that run sequentially. Here is an explanation of each step:
- Google Drive – Watch Files in a Folder
The process begins with Make.com monitoring a specific directory on Google Drive.
As soon as a new file (in this case, an audio file) is uploaded to the folder, this triggers the process.
- Google Drive – Download a File
After the file is uploaded, it is downloaded from the monitored folder to be processed in the next step.
- OpenAI Whisper – Create a Transcription (Whisper)
The downloaded audio file is then transcribed using the Whisper API from OpenAI.
Whisper is an advanced speech recognition system that converts audio files into text. This step thus creates a text transcription of the audio file.
- OpenAI ChatGPT – Create a Completion (Prompt)
The transcribed text is then passed to ChatGPT to create a summary or further analysis of the content.
The prompt typically asks ChatGPT to summarize the transcribed text or extract specific information from it.
- Google Drive – Create a File from Text (Transcription)
The generated text (i.e., the transcription) is saved in a new text file and placed in a specified directory on Google Drive.
- Google Drive – Create a File from Text (Summary)
Likewise, the summary created by ChatGPT is saved as a separate file on Google Drive.
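For readers who prefer the command line, the same sequence of steps can be approximated in Bash. This is a hedged sketch rather than a reproduction of the Make.com scenario: it works on local files instead of Google Drive, assumes that `curl` and `jq` (1.6 or newer) are installed and that `OPENAI_API_KEY` is set, and uses `gpt-4o-mini` only as an example model. All function names and the output file naming are made up for this illustration.

```bash
#!/bin/bash

API_BASE="https://api.openai.com/v1"
SUMMARY_MODEL="gpt-4o-mini"   # assumption: any chat model you have access to

# Derive output names next to the audio file,
# e.g. talk.ogg -> talk.transcript.txt and talk.summary.txt
transcript_path() { printf '%s.transcript.txt\n' "${1%.*}"; }
summary_path()    { printf '%s.summary.txt\n' "${1%.*}"; }

# Step 1: send the audio file to the transcription endpoint
# and save the plain-text result.
transcribe() {
    curl -sS -f "$API_BASE/audio/transcriptions" \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -F "file=@$1" \
        -F "model=whisper-1" \
        -F "response_format=text" \
        -o "$(transcript_path "$1")"
}

# Step 2: build a chat request from the transcript with jq,
# send it, and save the model's answer as the summary.
summarize() {
    jq -n --arg model "$SUMMARY_MODEL" \
        --rawfile text "$(transcript_path "$1")" \
        '{model: $model, messages: [
            {role: "system", content: "Summarize this transcript in a few bullet points."},
            {role: "user", content: $text}]}' |
    curl -sS -f "$API_BASE/chat/completions" \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -H "Content-Type: application/json" \
        --data-binary @- |
    jq -r '.choices[0].message.content' > "$(summary_path "$1")"
}

# Run both steps for one audio file.
process_audio() {
    transcribe "$1" && summarize "$1"
}
```

A call such as `process_audio talk.ogg` would leave two text files next to the audio file, mirroring the two files the scenario writes to Google Drive. Very long transcripts may exceed what the chat model accepts in one request, in which case the text would need to be summarized in parts.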
Conclusion
This process is useful for automatically transcribing audio files into text and summarizing their contents. All files are centrally stored in Google Drive at the end, making it easier to access and manage the created documents.
If you have questions about further details or want to know how you can adapt this workflow, let me know!