[Examples] Enhance real-time transcription with VAD, word timestamps, and CLI options
This PR builds upon #2696 and significantly enhances the real-time transcription example with production-ready features.
## New Features
### Voice Activity Detection (VAD)
- Only transcribes when speech is detected, saving compute resources
- Configurable energy threshold (`--energy-threshold`)
- Automatic speech segmentation based on silence detection (see the sketch below)
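A minimal sketch of the energy-gate idea, assuming mono float32 chunks coming off the microphone; the default threshold, function names, and chunk-counting scheme are illustrative, not the script's exact implementation:

```python
import numpy as np

def is_speech(chunk: np.ndarray, energy_threshold: float = 0.01) -> bool:
    """True if the chunk's RMS energy is above the energy gate."""
    rms = np.sqrt(np.mean(chunk.astype(np.float64) ** 2))
    return rms > energy_threshold

def segments_on_silence(chunks, energy_threshold=0.01, max_silent_chunks=10):
    """Group consecutive speech chunks; emit a segment once sustained silence follows."""
    buffer, silent = [], 0
    for chunk in chunks:
        if is_speech(chunk, energy_threshold):
            buffer.append(chunk)
            silent = 0
        elif buffer:
            silent += 1
            if silent >= max_silent_chunks:  # enough trailing silence: close the segment
                yield np.concatenate(buffer)
                buffer, silent = [], 0
    if buffer:
        yield np.concatenate(buffer)
```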
### Word-Level Timestamps
- `--word-timestamps` flag shows timing for each word (see the sketch below)
- Useful for subtitling and precise audio alignment
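Word timings come from Whisper's own `word_timestamps=True` option on `transcribe`; the snippet below is a minimal illustration of what the flag taps into, with output formatting that is not necessarily the example's:

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("speech.wav", word_timestamps=True)

for segment in result["segments"]:
    for word in segment.get("words", []):
        # each word entry carries its text plus start/end times in seconds
        print(f"[{word['start']:6.2f} -> {word['end']:6.2f}] {word['word']}")
```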
### Speaker Change Detection (Experimental)
- `--detect-speakers` provides hints when speaker changes are detected
- Based on pause pattern analysis (sketched below)
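Since the feature is experimental, here is only a rough sketch of pause-based hinting: a gap above some threshold triggers a hint. The 1.5 s gap and the helper name are assumptions, not the script's actual logic:

```python
def speaker_change_hints(segments, gap_threshold=1.5):
    """Yield (segment, hint) pairs; hint is True when the pause since the
    previous segment exceeds gap_threshold seconds (a possible speaker change)."""
    prev_end = None
    for seg in segments:
        hint = prev_end is not None and (seg["start"] - prev_end) > gap_threshold
        prev_end = seg["end"]
        yield seg, hint

# usage with Whisper-style segments:
# for seg, changed in speaker_change_hints(result["segments"]):
#     prefix = "[speaker change?] " if changed else ""
#     print(prefix + seg["text"].strip())
```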
### Audio Device Selection
- `--list-devices` to show available microphones
- `--device-id` to select a specific input device (see the sketch below)
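Assuming the example captures audio through `sounddevice` (a dependency this PR text does not confirm), listing and selecting an input device looks roughly like this:

```python
import sounddevice as sd

def list_input_devices():
    """Print the index and name of every device that can capture audio."""
    for idx, dev in enumerate(sd.query_devices()):
        if dev["max_input_channels"] > 0:
            print(f"{idx}: {dev['name']}")

def open_input(device_id=None, samplerate=16000):
    """Open a mono input stream on the chosen device (system default if None)."""
    return sd.InputStream(device=device_id, channels=1, samplerate=samplerate)

# list_input_devices()              # roughly what --list-devices prints
# stream = open_input(device_id=2)  # roughly what --device-id 2 selects
```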
### Enhanced User Experience
- Live audio level visualization with a color-coded bar
- Beautiful terminal UI with box-drawn headers
- Duplicate transcript filtering using similarity scoring (see the sketch below)
- Transcript saving with optional timestamps (`--output`, `--timestamps`)
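The duplicate filter can be as simple as a string-similarity cutoff; the sketch below uses `difflib.SequenceMatcher` with an assumed 0.85 threshold rather than the script's exact scoring:

```python
from difflib import SequenceMatcher

def is_duplicate(new_text: str, previous_text: str, cutoff: float = 0.85) -> bool:
    """Treat the new transcript as a duplicate of the previous one when their
    similarity ratio (0.0-1.0) reaches the cutoff."""
    ratio = SequenceMatcher(None, new_text.lower(), previous_text.lower()).ratio()
    return ratio >= cutoff

# overlapping audio windows often re-transcribe the same phrase:
# is_duplicate("hello there everyone", "hello there everyone.")  -> True
```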
## Usage Examples
```bash
# Basic usage
python examples/real_time_transcription.py

# With word timestamps
python examples/real_time_transcription.py --word-timestamps

# Save transcript with timestamps
python examples/real_time_transcription.py --output notes.txt --timestamps

# Use specific model and language
python examples/real_time_transcription.py --model small --language es
```

## Changes
- `examples/real_time_transcription.py` - Complete rewrite with new features
- `README.md` - Updated documentation with usage examples