📽️ Transcribe video lessons
Starting from mp4 videos, extract the audio and transcribe it to text files using Google APIs
Intro
In this tutorial, we will see how to get the audio transcriptions (text files) from a batch of mp4 videos.
The aim is to help students get transcripts of their teachers' online courses, using one of the best black-box ML tools available: the Google Speech-to-Text API.
Note: a better speech-to-text solution may exist for English, but the example here uses lessons in Italian.
Extract audio from the mp4
We assume that in the video only the teacher speaks, so we will extract a mono channel.
# Get a wav file for each mp4 file found in the current directory:
~/Downloads/video/wav
❯ for FILE in *.mp4; do ffmpeg -i "$FILE" -acodec pcm_s16le -ac 1 -ar 16000 "${FILE%.*}.wav"; done
- Tip: normalize the file names with
~/Downloads/video/wav ❯ detox .
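The one-line loop re-encodes every video each time it runs. A slightly more defensive variant, sketched below, skips videos that already have a wav next to them and does nothing in a directory without mp4 files. The function names `wav_name_for` and `convert_all` are made up for this example; the ffmpeg audio flags are the ones used in this section, and `-nostdin` only keeps ffmpeg from reading the terminal.

```shell
#!/usr/bin/env bash
# Sketch only: wav_name_for and convert_all are hypothetical helper names.

# Map "lesson.mp4" to "lesson.wav" by stripping the last extension.
wav_name_for() {
  printf '%s.wav\n' "${1%.*}"
}

# Convert every mp4 in the current directory to 16 kHz mono 16-bit PCM,
# skipping files that were already converted.
convert_all() {
  local file out
  for file in *.mp4; do
    [ -e "$file" ] || continue   # no mp4 found: the glob stayed literal
    out=$(wav_name_for "$file")
    if [ -e "$out" ]; then
      echo "Skip $file ($out already exists)"
      continue
    fi
    ffmpeg -nostdin -i "$file" -acodec pcm_s16le -ac 1 -ar 16000 "$out"
  done
}
```

Call `convert_all` from the directory that holds the videos.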
Initialize Google cloud platform (GCP)
- Get the $300 credit from the free tier link
- Create a bucket (here named “example-sbobinate”)
- Enable the Speech-to-Text API
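The bucket and API steps can probably also be done from the command line, once `gcloud` is installed and authenticated with `gcloud init` (the free credit still has to be claimed in the browser). The sketch below assumes the bucket name used in this tutorial and a hypothetical location, `europe-west1`, that you should replace with the one closest to you. The `run` helper, `setup_gcp` and the `DRY_RUN` variable are inventions for this example, so that the commands can be previewed before they are executed.

```shell
#!/usr/bin/env bash
# Sketch only: BUCKET_LOCATION, DRY_RUN, run and setup_gcp are assumptions for this example.
BUCKET="example-sbobinate"
BUCKET_LOCATION="${BUCKET_LOCATION:-europe-west1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_gcp() {
  run gcloud services enable speech.googleapis.com     # enable the Speech-to-Text API
  run gsutil mb -l "$BUCKET_LOCATION" "gs://$BUCKET"   # create the bucket
}
```

Running `DRY_RUN=1 setup_gcp` only prints the two commands; running `setup_gcp` alone executes them against the currently selected project.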
Upload the wav files to GCS
~/Downloads/video/wav
❯ gsutil -m cp *.wav gs://example-sbobinate/test/
- Tip: Slow upload? Make sure the bucket location is near your region
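Upload time also depends on file size, which is easy to estimate for uncompressed audio. With the ffmpeg settings used earlier (16-bit samples, one channel, 16000 samples per second), every second of audio takes 16000 × 2 × 1 = 32000 bytes, plus a small header. The helper below, with the made-up name `wav_bytes`, does that arithmetic; it ignores the header, so read the result as an approximation.

```shell
#!/usr/bin/env bash
# Sketch only: wav_bytes is a hypothetical helper; the WAV header is ignored.

# Approximate size in bytes of a 16 kHz, mono, 16-bit PCM wav of the given duration.
wav_bytes() {
  local seconds=$1
  local sample_rate=16000 bytes_per_sample=2 channels=1
  echo $(( seconds * sample_rate * bytes_per_sample * channels ))
}

wav_bytes 3600   # one-hour lesson, prints 115200000
```

A one-hour lesson therefore comes to roughly 115 MB, which is worth keeping in mind on a slow connection.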
Use the Speech-to-Text API
- Log into your GCP account
~/Downloads/video/wav
❯ gcloud init
- Call the API and store the transcriptions
# File: `api_call.sh`
# Requires gsutil, gcloud, jq
mkdir -p transcriptions
# Note: the loop splits on whitespace, so object names must not contain spaces
for FILE_PATH in $(gsutil ls "gs://example-sbobinate/test/"); do
  echo "Submit file $FILE_PATH"
  # Start the asynchronous recognition and keep the operation name
  RUN_ID=$(gcloud ml speech recognize-long-running "$FILE_PATH" --language-code=it-IT --async | jq -r .name)
  echo "Run id: $RUN_ID"
  FILENAME=${FILE_PATH##*/}
  OUTPUT="./transcriptions/${FILENAME%.*}.json"
  echo "OUTPUT: $OUTPUT"
  # Block until the operation finishes, then save the JSON result
  gcloud ml speech operations wait "$RUN_ID" >"$OUTPUT"
  echo "-------------"
done
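The output file name in the script comes from two shell parameter expansions: `${FILE_PATH##*/}` drops everything up to the last `/`, and `${FILENAME%.*}` drops the last extension. The function below isolates that step so it can be tried on its own; `output_path_for` is a name invented for this example, and the object name `lesson_01.wav` is only an illustration.

```shell
#!/usr/bin/env bash
# Sketch only: output_path_for is a hypothetical helper mirroring the script's path logic.

# Turn a GCS object path into the local JSON path used for its transcription.
output_path_for() {
  local file_path=$1
  local filename=${file_path##*/}                        # "gs://bucket/test/lesson_01.wav" -> "lesson_01.wav"
  printf './transcriptions/%s.json\n' "${filename%.*}"   # "lesson_01.wav" -> "lesson_01"
}

output_path_for "gs://example-sbobinate/test/lesson_01.wav"   # prints ./transcriptions/lesson_01.json
```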
Parse and store the transcriptions
- Parse all the JSON files received from the Google Speech API
# File: `results_parser.sh`
# Requires jq
mkdir -p ./transcriptions/only_text/
for FILE in ./transcriptions/*.json; do
  [ -e "$FILE" ] || continue # no JSON files found
  echo "Start working on $FILE..."
  FILENAME=${FILE##*/}
  OUTPUT="./transcriptions/only_text/${FILENAME%.*}.txt"
  echo "OUTPUT: $OUTPUT"
  # Keep only the text of the first alternative of each result, one per line
  jq -r '.results[]?.alternatives[0].transcript // empty' "$FILE" >"$OUTPUT"
done
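To see what the parser extracts, it helps to run a jq filter on a tiny hand-written response. The JSON below is a made-up sample that only mimics the shape the script relies on (a top-level `results` array whose items carry an `alternatives` array); a real response has more fields. `extract_transcripts` is a hypothetical name, and the filter is one possible way to pick the first alternative of each result.

```shell
#!/usr/bin/env bash
# Sketch only: SAMPLE is invented data and extract_transcripts is a hypothetical helper.
# Requires jq.

SAMPLE='{"results":[
  {"alternatives":[{"confidence":0.93,"transcript":"buongiorno a tutti"},
                   {"confidence":0.41,"transcript":"buongiorno a tutto"}]},
  {"alternatives":[{"confidence":0.88,"transcript":"oggi parliamo di algebra"}]}
]}'

# Print the transcript of the first alternative of every result, one per line.
# Results without a transcript are skipped.
extract_transcripts() {
  printf '%s' "$1" | jq -r '.results[]?.alternatives[0].transcript // empty'
}
```

Running `extract_transcripts "$SAMPLE"` should print `buongiorno a tutti` and `oggi parliamo di algebra` on two lines; the lower-confidence second alternative is left out.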
Check the results
- Check the video transcriptions under
./transcriptions/only_text/
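A quick sanity check is to count the words in each transcript: an empty or very short file may mean that something went wrong in an earlier step. A minimal sketch, with `report_transcripts` as a made-up name and the text directory from this tutorial as the default:

```shell
#!/usr/bin/env bash
# Sketch only: report_transcripts is a hypothetical helper.

# Print "<word count> <file>" for every .txt file in the given directory.
report_transcripts() {
  local dir=${1:-./transcriptions/only_text}
  local file words
  for file in "$dir"/*.txt; do
    [ -e "$file" ] || continue             # directory empty or missing
    words=$(( $(wc -w < "$file") ))        # arithmetic strips any padding from wc
    printf '%d %s\n' "$words" "$file"
  done
}
```

Piping the output through `sort -n` puts the shortest transcripts first, which is where problems tend to show up.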
References
- Google recognize-long-running API