Tutorial 2 of 6
Processing Videos
Once Verso is installed, you can process any video file to extract vocabulary. Here is how the process works from start to finish.
1. Open a video file
Click Process Video on the home screen and select a video file. Verso supports MP4, MKV, AVI, MOV, and most other common formats. The file stays on your computer — nothing is uploaded.
2. Select the language and model
Choose the spoken language in the video and the Whisper model to use. If you are unsure about the language, Verso can detect it automatically. Use a larger model if the base model produces too many errors for your language.
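For readers curious what this step might look like in code, here is a minimal sketch. It assumes the open-source openai-whisper Python package, which may differ from what Verso runs internally. The helper names `next_larger_model` and `transcribe` are hypothetical and exist only for illustration.

```python
# Whisper model sizes from smallest to largest (names used by the
# open-source openai-whisper package; Verso's own list may differ).
WHISPER_MODELS = ["tiny", "base", "small", "medium", "large"]


def next_larger_model(current):
    """Return the next larger model name, or None if already at the largest."""
    index = WHISPER_MODELS.index(current)
    if index + 1 < len(WHISPER_MODELS):
        return WHISPER_MODELS[index + 1]
    return None


def transcribe(path, model_name="base", language=None):
    """Transcribe a file; language=None asks Whisper to detect the language."""
    import whisper  # assumption: the openai-whisper package is installed

    model = whisper.load_model(model_name)
    result = model.transcribe(path, language=language)
    # The result dictionary includes the detected (or given) language code.
    return result["language"], result["text"]
```

If the base model makes too many mistakes, `next_larger_model("base")` gives the next size to try. Larger models are generally more accurate but slower.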
3. Wait for transcription
Click Start. A progress bar shows how far along the transcription is. The time depends on the video length and your hardware: a 20-minute episode typically takes 2 to 5 minutes on a modern CPU. If you have a CUDA GPU, Verso uses it automatically, and transcription can be 5 to 10× faster.
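To get a feel for these numbers, the sketch below scales the figures above to other video lengths and shows one common way an application can check for a CUDA GPU. It assumes transcription time grows linearly with video length, which is only a rough approximation. The function names are hypothetical, and the GPU check relies on PyTorch, which Verso may or may not use.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable CUDA GPU, else "cpu"."""
    try:
        import torch  # assumption: PyTorch is the GPU backend
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def estimate_minutes(video_minutes, gpu=False):
    """Rough (low, high) transcription time in minutes.

    Scales the tutorial's figure of 2 to 5 minutes per 20 minutes of
    video on a modern CPU. With a GPU, the best case assumes a 10x
    speedup and the worst case a 5x speedup.
    """
    low = video_minutes * 2 / 20
    high = video_minutes * 5 / 20
    if gpu:
        low, high = low / 10, high / 5
    return low, high
```

For example, `estimate_minutes(40)` suggests roughly 4 to 10 minutes on a CPU for a 40-minute video. Treat the result as a ballpark figure, not a guarantee.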
4. Verso fingerprints the video
While transcribing, Verso computes a fingerprint of the video file. This fingerprint lets your phone identify the same video later so it can play clips during review — even if the file is in a different folder.
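This tutorial does not describe how Verso computes its fingerprint. As an illustration of why a fingerprint can survive a move to a different folder, here is a minimal sketch that hashes the file's contents with SHA-256. The `fingerprint` function is a hypothetical stand-in, not Verso's actual method.

```python
import hashlib


def fingerprint(path, chunk_size=1024 * 1024):
    """Return a hex digest computed from the file's bytes only.

    The path and file name are not part of the hash, so a copy of the
    file in another folder produces the same fingerprint.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large videos do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because only the bytes are hashed, moving or renaming the file leaves the fingerprint unchanged, while re-encoding or trimming the video would change it under this particular scheme.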
5. Review the curation screen
When transcription finishes, Verso shows a list of vocabulary words found in the video. The next tutorial explains how to choose which words to add to your study list.