whisper-vtt2srt
A robust, production-grade library designed to convert WebVTT to SRT, turning messy AI transcripts into clean, usable subtitles.
It works as a post-processing step that cleans the output of OpenAI Whisper, YouTube auto-captions, and other AI transcription services.
Perfect for TTS pipelines, video dubbing, and dataset preparation.
Why whisper-vtt2srt?
Unlike simple regex-based converters, this tool applies cleaning strategies designed for the noisy output of AI transcription services such as OpenAI Whisper: repeated text, sound tags, very short cues, and leftover VTT markup.
Key Features
- Stabilization Strategy: Detects and merges accumulating text blocks (the "Karaoke Effect").
- Sound Description Removal: Automatically filters out [Music], [Applause], etc.
- Glitch Filtering: Removes imperceptible blocks shorter than 50 ms.
- Smart Normalization: Strips VTT metadata (align:start, <c>, <b>, <i>) and cleans whitespace.
- Zero Dependencies: Built with the pure Python standard library.
- Configurable Strictness: Every cleaning step is optional.
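To make the cleaning steps listed above concrete, here is a minimal sketch of how they could be combined. It is not the library's actual implementation or API: the names `Cue`, `normalize`, `clean`, and `srt_timestamp` are hypothetical, and the prefix-based merge is a simplified stand-in for the stabilization strategy.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical cue record; times are in milliseconds."""
    start_ms: int
    end_ms: int
    text: str


TAG_RE = re.compile(r"<[^>]+>")       # inline VTT tags such as <c>, <b>, <i>
SOUND_RE = re.compile(r"\[[^\]]*\]")  # bracketed descriptions such as [Music]


def normalize(text: str) -> str:
    """Strip inline tags and sound descriptions, then collapse whitespace."""
    text = TAG_RE.sub("", text)
    text = SOUND_RE.sub("", text)
    return " ".join(text.split())


def clean(cues, min_ms=50):
    """Apply normalization, glitch filtering, and a simple karaoke merge."""
    out = []
    for cue in cues:
        text = normalize(cue.text)
        # Drop cues that are empty after cleaning or shorter than min_ms.
        if not text or cue.end_ms - cue.start_ms < min_ms:
            continue
        if out and text.startswith(out[-1].text):
            # Accumulating text: the new cue repeats the previous one and
            # adds words, so extend the previous cue instead of duplicating.
            out[-1] = Cue(out[-1].start_ms, cue.end_ms, text)
        else:
            out.append(Cue(cue.start_ms, cue.end_ms, text))
    return out


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


if __name__ == "__main__":
    raw = [
        Cue(0, 1000, "<c>Hello</c>"),
        Cue(1000, 2000, "Hello world"),
        Cue(2000, 2020, "glitch"),
        Cue(2020, 3000, "[Music]"),
    ]
    for i, cue in enumerate(clean(raw), start=1):
        print(i)
        print(f"{srt_timestamp(cue.start_ms)} --> {srt_timestamp(cue.end_ms)}")
        print(cue.text)
        print()
```

Each step in this sketch is a separate check, which mirrors the idea that every cleaning step can be switched off independently. A real stabilization pass would likely need more than a plain prefix comparison, for example to handle rolling two-line captions.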
Installation
Quick Start
CLI
# Convert a single file
whisper-vtt2srt video.vtt
# Convert a folder recursively
whisper-vtt2srt ./videos --recursive