Pipeline API

whisper_vtt2srt.use_cases.pipeline.Pipeline

Orchestrates the full VTT to SRT conversion process.

This class combines the parser, the cleaning filters, and the writer into a single flow. It is the main entry point for in-memory conversion.

Attributes:

| Name | Type | Description |
|------|------|-------------|
| options | CleaningOptions | Configuration for the cleaning filters. |
| parser | VttParser | The component that reads VTT text. |
| writer | SrtWriter | The component that builds SRT text. |
| filters | List[ContentFilter] | A list of filters to apply sequentially. |
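The sequential application described for `filters` can be sketched with stand-in types. Everything in this example (`Block`, `Options`, `StripWhitespace`, `DropEmpty`, `run_filters`) is hypothetical and only mirrors the `apply(blocks, options)` call shape used by `Pipeline.convert`; the library's own filter classes and `CleaningOptions` fields may look different.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Block:
    """Hypothetical stand-in for one subtitle cue."""
    start: str
    end: str
    text: str


@dataclass
class Options:
    """Hypothetical stand-in for CleaningOptions."""
    drop_empty: bool = True


class Filter(Protocol):
    """Assumed filter interface: blocks and options in, blocks out."""

    def apply(self, blocks: List[Block], options: Options) -> List[Block]: ...


class StripWhitespace:
    """Toy filter: trims leading and trailing whitespace from each cue."""

    def apply(self, blocks: List[Block], options: Options) -> List[Block]:
        return [Block(b.start, b.end, b.text.strip()) for b in blocks]


class DropEmpty:
    """Toy filter: removes cues whose text is the empty string."""

    def apply(self, blocks: List[Block], options: Options) -> List[Block]:
        if not options.drop_empty:
            return list(blocks)
        return [b for b in blocks if b.text]


def run_filters(blocks: List[Block], filters: List[Filter], options: Options) -> List[Block]:
    # Each filter receives the output of the previous one.
    for filter_ in filters:
        blocks = filter_.apply(blocks, options)
    return blocks


cues = [
    Block("00:00:01.000", "00:00:02.000", "  Hello  "),
    Block("00:00:02.000", "00:00:03.000", "   "),
]
cleaned = run_filters(cues, [StripWhitespace(), DropEmpty()], Options())
print([b.text for b in cleaned])  # ['Hello']
```

Swapping the two toy filters would keep the whitespace-only cue, because it only becomes empty after stripping. The same reasoning suggests why the order of the registered filters matters.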

Source code in whisper_vtt2srt/use_cases/pipeline.py
class Pipeline:
    """Orchestrates the full VTT to SRT conversion process.

    This class combines the parser, the cleaning filters, and the writer into a
    single flow. It is the main entry point for in-memory conversion.

    Attributes:
        options (CleaningOptions): Configuration for the cleaning filters.
        parser (VttParser): The component that reads VTT text.
        writer (SrtWriter): The component that builds SRT text.
        filters (List[ContentFilter]): A list of filters to apply sequentially.
    """

    def __init__(self, options: Optional[CleaningOptions] = None):
        self.options = options or CleaningOptions()
        self.parser = VttParser()
        self.writer = SrtWriter()

        # Register filters; convert() applies them in this order
        self.filters = [
            SoundDescriptionFilter(),  # Clean sound descriptions first
            ContentNormalizer(),
            GlitchFilter(),
            KaraokeDeduplicator(),
            ShortLineMerger()
        ]

    def convert(self, content: str) -> str:
        """Converts raw VTT string content into formatted SRT string content.

        Args:
            content: The raw text content of a WebVTT file.

        Returns:
            str: The processed content formatted as SubRip (SRT).
        """
        # 1. Parse
        blocks = list(self.parser.parse(content))

        # 2. Clean/Filter
        for filter_ in self.filters:
            blocks = filter_.apply(blocks, self.options)

        # 3. Write
        return self.writer.write(blocks)

convert(content)

Converts raw VTT string content into formatted SRT string content.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| content | str | The raw text content of a WebVTT file. | required |

Returns:

| Type | Description |
|------|-------------|
| str | The processed content formatted as SubRip (SRT). |
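Because `convert` takes and returns strings, reading and writing files is left to the caller. The following is a minimal sketch of such a wrapper, not part of the library: `convert_file` and `SupportsConvert` are hypothetical names, and UTF-8 is an assumed encoding. The import path is taken from the module name shown at the top of this page.

```python
from pathlib import Path
from typing import Optional, Protocol


class SupportsConvert(Protocol):
    """Anything with a convert(str) -> str method, such as Pipeline."""

    def convert(self, content: str) -> str: ...


def convert_file(src: Path, dst: Path, pipeline: Optional[SupportsConvert] = None) -> None:
    """Read a VTT file, convert it in memory, and write the SRT result."""
    if pipeline is None:
        # Imported lazily so the helper can also be used with a stand-in object.
        from whisper_vtt2srt.use_cases.pipeline import Pipeline

        pipeline = Pipeline()  # default CleaningOptions
    vtt_text = src.read_text(encoding="utf-8")  # assumed encoding
    dst.write_text(pipeline.convert(vtt_text), encoding="utf-8")
```

With the package installed, a call such as `convert_file(Path("talk.vtt"), Path("talk.srt"))` would build a default `Pipeline` and write the converted subtitles; the file names here are placeholders.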
