[Examples] Enhance real-time transcription with VAD, word timestamps, and CLI options

Open vasanthrpjan1-boop opened this issue 1 month ago • 0 comments

This PR builds on #2696 and extends the real-time transcription example with voice activity detection, word-level timestamps, speaker-change hints, audio device selection, and transcript saving.

New Features

Voice Activity Detection (VAD)

  • Only transcribes when speech is detected, saving compute resources
  • Configurable energy threshold (--energy-threshold)
  • Automatic speech segmentation based on silence detection
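As a rough illustration of the energy-based approach described above, the sketch below gates frames on RMS energy and closes a segment after a run of silent frames. It is not taken from the PR: the function names, the `max_silence_frames` parameter, and both default values are assumptions made for this example, and the script's actual threshold scale may differ.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_speech(
    frames: Iterable[Sequence[float]],
    energy_threshold: float = 0.01,  # hypothetical default, not from the PR
    max_silence_frames: int = 3,     # hypothetical parameter, not from the PR
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, for each speech segment."""
    segments: List[Tuple[int, int]] = []
    start = None        # first frame of the currently open segment
    last_voiced = -1    # most recent frame above the threshold
    silence_run = 0     # consecutive quiet frames inside an open segment

    for i, frame in enumerate(frames):
        if rms(frame) >= energy_threshold:
            if start is None:
                start = i
            last_voiced = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= max_silence_frames:
                # Enough silence: close the segment at the last voiced frame.
                segments.append((start, last_voiced + 1))
                start = None
                silence_run = 0

    if start is not None:
        # Audio ended while a segment was still open.
        segments.append((start, last_voiced + 1))
    return segments
```

Only the frames inside the returned ranges would need to be passed to the model, which is where the compute saving comes from.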

Word-Level Timestamps

  • --word-timestamps flag shows timing for each word
  • Useful for subtitling and precise audio alignment
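For readers wondering what per-word output might look like, here is a small formatting sketch. It assumes a result dictionary shaped like the one openai-whisper returns when `transcribe` is called with `word_timestamps=True` (segments containing a `words` list with `word`, `start`, and `end`). The helper name `format_word_timestamps` and the output layout are hypothetical and not taken from the PR.

```python
from typing import Any, Dict, List


def format_word_timestamps(result: Dict[str, Any]) -> List[str]:
    """Render one line per word as '[start -> end] word' (times in seconds)."""
    lines: List[str] = []
    for segment in result.get("segments", []):
        # Segments without word-level data are skipped.
        for word in segment.get("words", []):
            text = word["word"].strip()
            lines.append(f"[{word['start']:6.2f} -> {word['end']:6.2f}] {text}")
    return lines


# Hand-written sample in the assumed shape; this is not real model output.
sample_result = {
    "segments": [
        {
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.42},
                {"word": " world", "start": 0.42, "end": 0.9},
            ]
        }
    ]
}

for line in format_word_timestamps(sample_result):
    print(line)
```

The same start and end values could be written into subtitle cues instead of printed.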

Speaker Change Detection (Experimental)

  • --detect-speakers provides hints when speaker changes are detected
  • Based on pause pattern analysis
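The PR does not spell out how pause patterns are analyzed. One simple heuristic, shown here purely as an assumption, is to flag a possible speaker change whenever the gap between consecutive speech segments exceeds a threshold. The function name and the 1.5 second default are illustrative only.

```python
from typing import List, Tuple


def speaker_change_hints(
    segments: List[Tuple[float, float]],
    pause_threshold: float = 1.5,  # seconds; hypothetical default
) -> List[int]:
    """Return indices of segments that start after a long pause.

    `segments` holds (start, end) times in seconds, in chronological order.
    A long pause is only a hint: it cannot confirm that the speaker changed.
    """
    hints: List[int] = []
    for i in range(1, len(segments)):
        gap = segments[i][0] - segments[i - 1][1]
        if gap >= pause_threshold:
            hints.append(i)
    return hints
```

A heuristic like this will miss rapid turn-taking and will fire on a single speaker who pauses, which fits the "experimental" label.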

Audio Device Selection

  • --list-devices to show available microphones
  • --device-id to select a specific input device
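The description does not say which audio backend the script uses. The sketch below therefore works on a plain list of device dictionaries in the shape that `sounddevice.query_devices()` returns (`name` and `max_input_channels` keys), which is an assumption. The helper names `input_devices` and `resolve_device` are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def input_devices(devices: List[Dict[str, Any]]) -> List[Tuple[int, str]]:
    """Return (index, name) for every device that can record audio."""
    return [
        (idx, dev["name"])
        for idx, dev in enumerate(devices)
        if dev.get("max_input_channels", 0) > 0
    ]


def resolve_device(
    devices: List[Dict[str, Any]], device_id: Optional[int]
) -> Optional[int]:
    """Validate a user-supplied device id; None means 'use the default'."""
    if device_id is None:
        return None
    valid = {idx for idx, _ in input_devices(devices)}
    if device_id not in valid:
        raise ValueError(f"Device {device_id} is not an available input device")
    return device_id


# Hand-written sample device table; real values depend on the machine.
sample_devices = [
    {"name": "Built-in Microphone", "max_input_channels": 2},
    {"name": "Built-in Output", "max_input_channels": 0},
    {"name": "USB Headset", "max_input_channels": 1},
]

for idx, name in input_devices(sample_devices):
    print(f"{idx}: {name}")
```

Validating the id up front gives a clear error before the audio stream is opened.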

Enhanced User Experience

  • Live audio level visualization with color-coded bar
  • Beautiful terminal UI with box-drawn headers
  • Duplicate transcript filtering using similarity scoring
  • Transcript saving with optional timestamps (--output, --timestamps)
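Duplicate filtering by similarity score can be sketched with the standard library's `difflib.SequenceMatcher`. Whether the script uses `difflib` is not stated in the PR, so treat the choice of library, the 0.85 threshold, and the five-item comparison window as assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Sequence


def is_duplicate(
    candidate: str, recent: Sequence[str], threshold: float = 0.85
) -> bool:
    """True if `candidate` is nearly identical to any recently kept transcript."""
    normalized = candidate.strip().lower()
    for previous in recent:
        score = SequenceMatcher(None, normalized, previous.strip().lower()).ratio()
        if score >= threshold:
            return True
    return False


def filter_transcripts(
    chunks: Iterable[str],
    threshold: float = 0.85,  # hypothetical default
    window: int = 5,          # hypothetical: how many kept items to compare against
) -> List[str]:
    """Drop empty chunks and chunks that repeat one of the last `window` kept items."""
    kept: List[str] = []
    for chunk in chunks:
        if chunk.strip() and not is_duplicate(chunk, kept[-window:], threshold):
            kept.append(chunk)
    return kept
```

Comparing against a short window rather than the whole transcript keeps the check cheap and still allows a phrase to recur later in a long session.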

Usage Examples

Basic usage

python examples/real_time_transcription.py

With word timestamps

python examples/real_time_transcription.py --word-timestamps

Save transcript with timestamps

python examples/real_time_transcription.py --output notes.txt --timestamps

Use specific model and language

python examples/real_time_transcription.py --model small --language es

Changes

  • examples/real_time_transcription.py - Complete rewrite with new features
  • README.md - Updated documentation with usage examples
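The flags named in this description could be declared with `argparse` roughly as follows. Only the flag names come from the text above. The argument types, defaults, and help strings are assumptions, and the real script may define them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags named in the PR description (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Real-time transcription example (illustrative sketch)"
    )
    # Types and defaults below are placeholders, not the script's actual values.
    parser.add_argument("--model", default="base", help="model name to load")
    parser.add_argument("--language", default=None, help="language code, e.g. es")
    parser.add_argument("--energy-threshold", type=float, default=0.01,
                        help="energy level above which audio counts as speech")
    parser.add_argument("--word-timestamps", action="store_true",
                        help="show timing for each word")
    parser.add_argument("--detect-speakers", action="store_true",
                        help="print hints when a speaker change is suspected")
    parser.add_argument("--list-devices", action="store_true",
                        help="list available microphones and exit")
    parser.add_argument("--device-id", type=int, default=None,
                        help="input device to record from")
    parser.add_argument("--output", default=None,
                        help="file to save the transcript to")
    parser.add_argument("--timestamps", action="store_true",
                        help="include timestamps in the saved transcript")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

`argparse` converts dashes to underscores, so `--word-timestamps` is read back as `args.word_timestamps`.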

vasanthrpjan1-boop • Dec 07 '25 12:12