Subtitle Generator
This project provides a web application that automatically generates subtitles (in SRT or VTT format) from audio files. It uses OpenAI’s Whisper speech recognition model for transcription and pyannote.audio for speaker diarization, all wrapped in a simple web interface served by FastAPI.
Features
Web Interface: A clean, simple HTML interface for uploading audio files. No command-line usage is required.
Multiple Audio Formats: Supports .wav, .mp3, .m4a, and .flac files. Non-WAV files are converted to WAV internally.
Speaker Diarization: Identifies the different speakers in the audio and includes speaker labels in the generated subtitles.
Subtitle Formats: Generates subtitles in either .srt (SubRip) or .vtt (WebVTT) format, selectable in the web interface.
Fast and Efficient: Dynamically selects a Whisper model size (from tiny to large) based on your system’s available RAM and VRAM (GPU memory), balancing speed against accuracy.
Easy Deployment: Can be run directly with Uvicorn or deployed with Docker and Docker Compose.
Well-Defined API: A single /upload-audio/ endpoint handles both file upload and subtitle generation, with clear request and response formats.
File Size Limit: Uploads are limited to 50 MB per file.

Requirements
Before you get started, make sure you have the following: