STT: ElevenLabs STTv2 (Scribe v2 Realtime) support

Open varghesepaul opened this issue 2 weeks ago • 2 comments

Summary

Adds support for ElevenLabs Scribe v2 Realtime streaming STT with ~150ms latency.

Features

  • WebSocket-based streaming transcription
  • Configurable commit strategies (VAD/manual)
  • Word-level timestamp support
  • Automatic reconnection handling
  • Comprehensive error handling
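As a rough illustration of what automatic reconnection usually involves, here is a minimal sketch of retrying a connection with capped exponential backoff. It is not the code from this change: `backoff_delays`, `connect_with_retry`, and all delay values are hypothetical names and numbers chosen for the example, and `connect` stands in for whatever coroutine opens the WebSocket.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def backoff_delays(
    base: float = 0.5, factor: float = 2.0, cap: float = 8.0, attempts: int = 5
) -> list[float]:
    """Delay in seconds before each retry: base * factor**i, capped at `cap`."""
    return [min(cap, base * factor**i) for i in range(attempts)]


async def connect_with_retry(
    connect: Callable[[], Awaitable[T]],
    delays: list[float],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Try `connect` once immediately, then once after each delay in `delays`.

    Only ConnectionError is retried (an assumption for this sketch); any
    other exception propagates to the caller unchanged.
    """
    last_error: Exception | None = None
    for delay in [0.0] + list(delays):
        if delay:
            await sleep(delay)
        try:
            return await connect()
        except ConnectionError as exc:
            last_error = exc
    raise ConnectionError("gave up reconnecting") from last_error
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. A production stream would also need to decide what to do with audio buffered while the socket was down, which this sketch does not cover.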

API Options

  • model_id: Model selection (default: scribe_v2_realtime)
  • language_code: Language support (optional)
  • commit_strategy: "vad" (default) or "manual"
  • include_timestamps: Enable word-level timestamps
  • VAD parameters: threshold, silence duration, speech duration
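To make the option set above concrete, the following is a self-contained sketch of how these options could be modelled and validated. It is not the plugin's actual class: `ScribeOptions` and `with_overrides` are hypothetical names, the three VAD field names are guesses based on the bullet above, and the `False` default for `include_timestamps` is an assumption. Only the `model_id` and `commit_strategy` defaults come from the list.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ScribeOptions:
    """Hypothetical container mirroring the API options listed above."""

    model_id: str = "scribe_v2_realtime"   # default stated in the option list
    language_code: Optional[str] = None    # optional; None means "not set"
    commit_strategy: str = "vad"           # "vad" (default) or "manual"
    include_timestamps: bool = False       # default value is an assumption
    # VAD tuning knobs: field names and units are assumptions for this sketch.
    vad_threshold: Optional[float] = None
    vad_silence_duration_s: Optional[float] = None
    vad_speech_duration_s: Optional[float] = None

    def __post_init__(self) -> None:
        # Reject strategies other than the two named in the option list.
        if self.commit_strategy not in ("vad", "manual"):
            raise ValueError(
                f"commit_strategy must be 'vad' or 'manual', got {self.commit_strategy!r}"
            )


def with_overrides(opts: ScribeOptions, **changes: object) -> ScribeOptions:
    """Return a copy with some fields replaced, re-running validation."""
    return replace(opts, **changes)
```

For example, `with_overrides(ScribeOptions(), commit_strategy="manual", language_code="en")` yields a new validated object and leaves the original untouched, which is one simple way to model per-session overrides.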

Implementation Details

  • Follows Deepgram STTv2 pattern for consistency
  • Uses RecognizeStream base class (modern API)
  • Proper usage tracking via RECOGNITION_USAGE events
  • Session override support via update_options()

Known Issues

The ElevenLabs API currently returns duplicate transcripts in some scenarios. I've reported this to ElevenLabs (https://github.com/elevenlabs/elevenlabs-python/issues/686). No explicit deduplication logic has been added, since it risks removing valid repeated content.
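The risk can be shown with a small, purely illustrative sketch. The function `drop_consecutive_duplicates` and the sample transcripts are made up for this example and are not part of the change: a naive filter that drops consecutive identical transcripts treats an API-side duplicate and a genuinely repeated utterance the same way.

```python
from __future__ import annotations


def drop_consecutive_duplicates(transcripts: list[str]) -> list[str]:
    """Naive dedup: keep a transcript only if it differs from the previous one."""
    kept: list[str] = []
    for text in transcripts:
        if not kept or kept[-1] != text:
            kept.append(text)
    return kept


# Case 1: the same segment delivered twice (the behaviour described above).
api_duplicate = ["turn left", "turn left"]

# Case 2: the speaker really said "no" twice before saying "stop".
spoken_twice = ["no", "no", "stop"]

# Both inputs look identical to the filter, so the second "no" is lost too.
deduped_api = drop_consecutive_duplicates(api_duplicate)
deduped_speech = drop_consecutive_duplicates(spoken_twice)
```

Telling the two cases apart would need extra signal from the API, such as segment identifiers or timestamps, and whether that is available is not established here.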

Documentation

STT - Realtime:

  • https://elevenlabs.io/docs/cookbooks/speech-to-text/streaming
  • https://elevenlabs.io/docs/api-reference/speech-to-text/v-1-speech-to-text-realtime

varghesepaul, Nov 16 '25 23:11