whisper-vtt2srt

A robust, production-grade library designed to convert WebVTT to SRT, turning messy AI transcripts into clean, usable subtitles.

It post-processes the output of OpenAI Whisper, YouTube Auto-Captions, and other AI transcription services, which makes it a good fit for TTS pipelines, video dubbing, and dataset preparation.

Why whisper-vtt2srt?

Unlike simple regex-based converters, this tool applies cleaning strategies built for the chaotic output of AI transcription services such as OpenAI Whisper.

Key Features 🚀

🛡️ Stabilization Strategy: Detects and merges accumulating text blocks, where each block repeats the previous text and appends to it (the "Karaoke Effect").
🎵 Sound Description Removal: Automatically filters out [Music], [Applause], etc.
🧹 Glitch Filtering: Removes blocks shorter than 50 ms, which are too brief to perceive.
✨ Smart Normalization: Strips VTT metadata (align:start, <c>, <b>, <i>) and cleans whitespace.
⚡ Zero Dependencies: Built on the Python standard library alone.
🔧 Configurable Strictness: Every cleaning step is optional.
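
To make the cleaning steps above concrete, here is a minimal sketch of how they might fit together. It is not the library's implementation: the `Cue` class, the `normalize` and `clean` functions, and the prefix-based merge rule are all hypothetical names and assumptions chosen for illustration. Only the 50 ms threshold and the kinds of markup being removed come from the feature list.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical cue record: times in milliseconds plus the cue text."""
    start_ms: int
    end_ms: int
    text: str


TAG_RE = re.compile(r"<[^>]+>")       # inline tags such as <c>, <b>, <i>
SOUND_RE = re.compile(r"\[[^\]]*\]")  # bracketed descriptions such as [Music]


def normalize(text: str) -> str:
    """Strip inline tags and sound descriptions, then collapse whitespace."""
    text = TAG_RE.sub("", text)
    text = SOUND_RE.sub("", text)
    return " ".join(text.split())


def clean(cues, min_duration_ms=50):
    """Apply normalization, glitch filtering, and a simple stabilization pass."""
    result = []
    for cue in cues:
        text = normalize(cue.text)
        # Drop cues that are empty after cleaning or too short to perceive.
        if not text or cue.end_ms - cue.start_ms < min_duration_ms:
            continue
        if result and text.startswith(result[-1].text):
            # Assumed merge rule: a cue that repeats the previous text and
            # adds to it extends the previous cue instead of starting a new one.
            result[-1].end_ms = cue.end_ms
            result[-1].text = text
        else:
            result.append(Cue(cue.start_ms, cue.end_ms, text))
    return result


cues = [
    Cue(0, 1000, "<c>hello</c>"),
    Cue(1000, 1010, "hello"),        # 10 ms glitch
    Cue(1010, 2000, "hello world"),  # accumulates on the first cue
    Cue(2000, 3000, "[Music]"),      # sound description only
    Cue(3000, 4000, "  next   line "),
]
cleaned = clean(cues)
```

In this sketch the five input cues reduce to two: one spanning 0 to 2000 ms with the text "hello world", and one spanning 3000 to 4000 ms with the text "next line". A real stabilization strategy would likely need more than a prefix check, for example to handle rolling two-line captions.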

Installation

pip install whisper-vtt2srt

Quick Start

CLI

# Convert a single file
whisper-vtt2srt video.vtt

# Convert a folder recursively
whisper-vtt2srt ./videos --recursive

Python

from whisper_vtt2srt import Pipeline

pipeline = Pipeline()

with open("video.vtt", "r", encoding="utf-8") as f:
    srt_content = pipeline.convert(f.read())
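
The recursive folder conversion shown for the CLI can also be approximated from Python. The sketch below assumes each `.srt` file should be written next to its `.vtt` source; the CLI's actual output naming is not documented here, and `convert_tree` is a hypothetical helper, not part of the library. The converter is passed in as a plain callable so the helper stays independent of any particular API.

```python
from pathlib import Path
from typing import Callable


def convert_tree(root: str, convert: Callable[[str], str]) -> list[Path]:
    """Convert every .vtt file under root and return the .srt paths written."""
    written = []
    # sorted() materializes the file list before anything is written.
    for vtt_path in sorted(Path(root).rglob("*.vtt")):
        srt_path = vtt_path.with_suffix(".srt")
        vtt_text = vtt_path.read_text(encoding="utf-8")
        srt_path.write_text(convert(vtt_text), encoding="utf-8")
        written.append(srt_path)
    return written
```

With the `Pipeline` object from the example above, this could be called as `convert_tree("./videos", pipeline.convert)`.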