[Examples] Enhance real-time transcription with VAD, word timestamps, and CLI options
This PR builds upon #2696 and significantly enhances the real-time transcription example with production-ready features.
## New Features
### Voice Activity Detection (VAD)
- Only transcribes when speech is detected, saving compute resources
- Configurable energy threshold (`--energy-threshold`)
- Automatic speech segmentation based on silence detection (see the sketch below)
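A minimal sketch of the energy-gate idea, assuming mono float32 chunks coming off the microphone; the default threshold, function names, and chunk-counting scheme are illustrative, not the script's exact implementation:

```python
import numpy as np

def is_speech(chunk: np.ndarray, energy_threshold: float = 0.01) -> bool:
    """True if the chunk's RMS energy is above the energy gate."""
    rms = np.sqrt(np.mean(chunk.astype(np.float64) ** 2))
    return rms > energy_threshold

def segments_on_silence(chunks, energy_threshold=0.01, max_silent_chunks=10):
    """Group consecutive speech chunks; emit a segment once sustained silence follows."""
    buffer, silent = [], 0
    for chunk in chunks:
        if is_speech(chunk, energy_threshold):
            buffer.append(chunk)
            silent = 0
        elif buffer:
            silent += 1
            if silent >= max_silent_chunks:  # enough trailing silence: close the segment
                yield np.concatenate(buffer)
                buffer, silent = [], 0
    if buffer:
        yield np.concatenate(buffer)
```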
### Word-Level Timestamps
- `--word-timestamps` flag shows timing for each word (see the sketch below)
- Useful for subtitling and precise audio alignment
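Word timings come from Whisper's own `word_timestamps=True` option on `transcribe`; the snippet below is a minimal illustration of what the flag taps into, with output formatting that is not necessarily the example's:

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("speech.wav", word_timestamps=True)

for segment in result["segments"]:
    for word in segment.get("words", []):
        # each word entry carries its text plus start/end times in seconds
        print(f"[{word['start']:6.2f} -> {word['end']:6.2f}] {word['word']}")
```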
### Speaker Change Detection (Experimental)
- `--detect-speakers` provides hints when speaker changes are detected
- Based on pause pattern analysis (sketched below)
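Since the feature is experimental, here is only a rough sketch of pause-based hinting: a gap above some threshold triggers a hint. The 1.5 s gap and the helper name are assumptions, not the script's actual logic:

```python
def speaker_change_hints(segments, gap_threshold=1.5):
    """Yield (segment, hint) pairs; hint is True when the pause since the
    previous segment exceeds gap_threshold seconds (a possible speaker change)."""
    prev_end = None
    for seg in segments:
        hint = prev_end is not None and (seg["start"] - prev_end) > gap_threshold
        prev_end = seg["end"]
        yield seg, hint

# usage with Whisper-style segments:
# for seg, changed in speaker_change_hints(result["segments"]):
#     prefix = "[speaker change?] " if changed else ""
#     print(prefix + seg["text"].strip())
```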
### Audio Device Selection
- `--list-devices` to show available microphones
- `--device-id` to select a specific input device (see the sketch below)
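Assuming the example captures audio through `sounddevice` (a dependency this PR text does not confirm), listing and selecting an input device looks roughly like this:

```python
import sounddevice as sd

def list_input_devices():
    """Print the index and name of every device that can capture audio."""
    for idx, dev in enumerate(sd.query_devices()):
        if dev["max_input_channels"] > 0:
            print(f"{idx}: {dev['name']}")

def open_input(device_id=None, samplerate=16000):
    """Open a mono input stream on the chosen device (system default if None)."""
    return sd.InputStream(device=device_id, channels=1, samplerate=samplerate)

# list_input_devices()              # roughly what --list-devices prints
# stream = open_input(device_id=2)  # roughly what --device-id 2 selects
```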
### Enhanced User Experience
- Live audio level visualization with a color-coded bar
- Beautiful terminal UI with box-drawn headers
- Duplicate transcript filtering using similarity scoring (see the sketch below)
- Transcript saving with optional timestamps (`--output`, `--timestamps`)
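The duplicate filter can be as simple as a string-similarity cutoff; the sketch below uses `difflib.SequenceMatcher` with an assumed 0.85 threshold rather than the script's exact scoring:

```python
from difflib import SequenceMatcher

def is_duplicate(new_text: str, previous_text: str, cutoff: float = 0.85) -> bool:
    """Treat the new transcript as a duplicate of the previous one when their
    similarity ratio (0.0-1.0) reaches the cutoff."""
    ratio = SequenceMatcher(None, new_text.lower(), previous_text.lower()).ratio()
    return ratio >= cutoff

# overlapping audio windows often re-transcribe the same phrase:
# is_duplicate("hello there everyone", "hello there everyone.")  -> True
```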
## Usage Examples
```bash
# Basic usage
python examples/real_time_transcription.py

# With word timestamps
python examples/real_time_transcription.py --word-timestamps

# Save transcript with timestamps
python examples/real_time_transcription.py --output notes.txt --timestamps

# Use specific model and language
python examples/real_time_transcription.py --model small --language es
```

## Changes
- `examples/real_time_transcription.py` - Complete rewrite with new features
- `README.md` - Updated documentation with usage examples