Pipeline overview
- Drop a video — MP4, MKV, AVI, MOV, WebM, WMV, FLV, or M4V.
- Speech recognition runs on-device with phrase-level segmentation tuned for broadcast subtitles.
- The on-device AI translates each line with awareness of character voice and continuity.
- A second AI pass re-reads the entire subtitle file for coherence — the "review pass".
- Word-level diff highlighting visualizes what the AI corrected.
- Quality report: coverage %, average reading speed, gap detection, fast / slow line counters.
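The quality report metrics in the last step can be sketched as below. This is a minimal illustration, not the app's actual implementation: the `Cue` type, the `quality_report` function, the thresholds, and the definition of coverage (share of the video's duration covered by subtitle cues, assuming cues do not overlap) are all assumptions made for the example. Reading speed is measured in characters per second.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def quality_report(cues, video_duration, min_cps=5.0, max_cps=20.0, max_gap=5.0):
    """Summarize subtitle quality. All thresholds are illustrative assumptions."""
    cues = sorted(cues, key=lambda c: c.start)
    # Total subtitled time; assumes cues do not overlap.
    covered = sum(c.end - c.start for c in cues)
    # Reading speed per cue in characters per second, skipping zero-length cues.
    speeds = [len(c.text) / (c.end - c.start) for c in cues if c.end > c.start]
    # A gap is a silence between consecutive cues longer than max_gap seconds.
    gaps = [
        (a.end, b.start)
        for a, b in zip(cues, cues[1:])
        if b.start - a.end > max_gap
    ]
    return {
        "coverage_pct": 100.0 * covered / video_duration if video_duration else 0.0,
        "avg_cps": sum(speeds) / len(speeds) if speeds else 0.0,
        "gaps": gaps,
        "fast_lines": sum(s > max_cps for s in speeds),
        "slow_lines": sum(s < min_cps for s in speeds),
    }
```

A line counted as "fast" here is one a viewer may not have time to read; a "slow" line stays on screen longer than its text needs.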
Five quality presets
Presets range from Fast (maximum throughput) to Maximum Quality (slowest but most accurate). Balanced is the recommended default, and a dedicated Soft Speech preset catches whispered or quiet voices. A custom mode exposes every parameter for power users.
Multi-language from one transcription
Transcribe once, translate to many. The "+ Add Language" button translates the cached transcription into a new target language in seconds, generating per-language SRT files without re-running the speech recognition pass.
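One way to picture the transcribe-once idea is a cache that holds the timed source segments and fills in translations on demand. The sketch below is an assumption-laden illustration: the cache layout, `add_language`, and the `translate` callback are hypothetical names, not the app's API. Only the SRT layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) follows the standard format.

```python
def format_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def add_language(cache, lang, translate):
    """Translate the cached segments into lang, reusing any earlier result.

    `cache` is a hypothetical dict with "segments" (the transcription) and
    "translations" (per-language results); `translate` is a stand-in for
    whatever translation backend is used.
    """
    if lang not in cache["translations"]:
        cache["translations"][lang] = [
            (start, end, translate(text, lang))
            for start, end, text in cache["segments"]
        ]
    return to_srt(cache["translations"][lang])
```

Because the timings come from the cached segments, adding a language only costs a translation pass; speech recognition is never touched.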
Hardware-accelerated hard-burn
Burn translated subtitles into the video using your GPU's video encoder (NVIDIA, AMD, or Intel), which is auto-detected. Alternatively, soft-mux the subtitles for zero quality loss when your player supports it. Font, size, color, and outline are all configurable.
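For readers curious how hard-burn and soft-mux differ in practice, here is a rough sketch using ffmpeg command lines. Whether the app uses ffmpeg at all is an assumption, and `pick_encoder`, `burn_command`, and `soft_mux_command` are hypothetical helpers. The encoder names (`h264_nvenc`, `h264_amf`, `h264_qsv`), the `subtitles` filter, and its `force_style` option are real ffmpeg features. The functions only build argument lists and do not run anything.

```python
# GPU vendor -> ffmpeg H.264 hardware encoder name.
HW_ENCODERS = {"nvidia": "h264_nvenc", "amd": "h264_amf", "intel": "h264_qsv"}


def pick_encoder(encoders_listing):
    """Pick the first hardware encoder named in `ffmpeg -encoders` output.

    Falls back to the software encoder libx264. A listed encoder only means
    ffmpeg was built with it, not that the matching GPU is present.
    """
    for name in HW_ENCODERS.values():
        if name in encoders_listing:
            return name
    return "libx264"


def burn_command(video, srt, out, encoder, font="Arial", size=24):
    """Build an ffmpeg command that renders subtitles into the picture."""
    style = f"FontName={font},FontSize={size}"
    # Paths with special characters need extra escaping for the filter parser.
    video_filter = f"subtitles={srt}:force_style='{style}'"
    return ["ffmpeg", "-i", video, "-vf", video_filter,
            "-c:v", encoder, "-c:a", "copy", out]


def soft_mux_command(video, srt, out):
    """Build an ffmpeg command that adds subtitles as a separate stream.

    Assumes a Matroska (.mkv) output, which can carry SRT text streams.
    Video and audio are copied unchanged, so there is no quality loss.
    """
    return ["ffmpeg", "-i", video, "-i", srt, "-c", "copy", "-c:s", "srt", out]
```

The listing passed to `pick_encoder` could come from `subprocess.run(["ffmpeg", "-hide_banner", "-encoders"], capture_output=True, text=True).stdout`. Hard-burn re-encodes the video, which is why a hardware encoder helps; soft-mux copies the existing streams and leaves rendering to the player.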
Multi-video queue
Drop a folder and every video in it is processed sequentially. Skip-on-exists prevents re-running videos that have already finished. A per-project cache stores the transcription plus completed languages, so you can resume after a restart without re-running the speech recognition pass.
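The skip-on-exists behavior can be approximated with a short loop. This is a sketch under stated assumptions: the `<name>.<lang>.srt` output naming, `pending_videos`, `run_queue`, and the `process` callback are hypothetical, chosen only to show the idea of treating an existing output file as "already done".

```python
from pathlib import Path

# Extensions taken from the supported-format list above.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm", ".wmv", ".flv", ".m4v"}


def pending_videos(folder, lang):
    """Yield videos in folder that have no `<name>.<lang>.srt` next to them."""
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in VIDEO_EXTENSIONS:
            continue  # not a video file
        if path.with_name(f"{path.stem}.{lang}.srt").exists():
            continue  # skip-on-exists: output is already there
        yield path


def run_queue(folder, lang, process):
    """Process pending videos one at a time and return the names handled."""
    done = []
    for video in pending_videos(folder, lang):
        process(video, lang)  # stand-in for transcribe + translate + write SRT
        done.append(video.name)
    return done
```

Because the check looks at files on disk rather than in-memory state, an interrupted run can be restarted and will pick up at the first video that lacks an output.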