STT: ElevenLabs STTv2 (Scribe v2 Realtime) support
Summary
Adds support for ElevenLabs Scribe v2 Realtime streaming STT with ~150ms latency.
Features
- WebSocket-based streaming transcription
- Configurable commit strategies (VAD/manual)
- Word-level timestamp support
- Automatic reconnection handling
- Comprehensive error handling
API Options
- model_id: Model selection (default: scribe_v2_realtime)
- language_code: Language support (optional)
- commit_strategy: "vad" (default) or "manual"
- include_timestamps: Enable word-level timestamps
- VAD parameters: threshold, silence duration, speech duration
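To make the option set concrete, here is a minimal sketch of how these options could be grouped and validated. The class name `ScribeOptions`, the VAD field names, the `include_timestamps` default, and the `to_params()` helper are all assumptions for illustration. They are not the plugin's actual code or the ElevenLabs wire format.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ScribeOptions:
    """Hypothetical container mirroring the options listed above."""

    model_id: str = "scribe_v2_realtime"
    language_code: Optional[str] = None  # None leaves the language unspecified
    commit_strategy: str = "vad"  # "vad" (default) or "manual"
    include_timestamps: bool = False  # default value is an assumption
    # VAD tuning knobs; field names and None defaults are assumptions.
    vad_threshold: Optional[float] = None
    vad_silence_duration_s: Optional[float] = None
    vad_speech_duration_s: Optional[float] = None

    def __post_init__(self) -> None:
        # Reject anything other than the two documented strategies.
        if self.commit_strategy not in ("vad", "manual"):
            raise ValueError(f"unknown commit_strategy: {self.commit_strategy!r}")

    def to_params(self) -> dict:
        """Return only the options that were explicitly set (drop None values)."""
        return {key: value for key, value in asdict(self).items() if value is not None}
```

For example, `ScribeOptions(language_code="en").to_params()` keeps `model_id`, `language_code`, `commit_strategy`, and `include_timestamps`, and omits the unset VAD fields.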
Implementation Details
- Follows Deepgram STTv2 pattern for consistency
- Uses RecognizeStream base class (modern API)
- Proper usage tracking via RECOGNITION_USAGE events
- Session override support via update_options()
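The session override behavior can be sketched roughly as follows. Only the method name `update_options()` comes from this PR. The `Options` and `StreamSketch` classes, the `needs_reconnect` flag, and the assumption that changed options require a fresh connection are hypothetical stand-ins.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class Options:
    """Hypothetical subset of the STT options."""

    language_code: Optional[str] = None
    include_timestamps: bool = False


class StreamSketch:
    """Hypothetical stand-in for a recognition stream, not the plugin's class."""

    def __init__(self, opts: Options) -> None:
        self.opts = opts
        self.needs_reconnect = False

    def update_options(self, **overrides: Any) -> None:
        # Skip overrides left as None so callers can pass optional values through.
        changes = {key: value for key, value in overrides.items() if value is not None}
        new_opts = replace(self.opts, **changes)
        if new_opts != self.opts:
            self.opts = new_opts
            # Assumption: changed options only take effect on a new connection.
            self.needs_reconnect = True
```

Keeping the options immutable and swapping in a new instance makes it easy to detect whether an override actually changed anything.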
Known Issues
The ElevenLabs API currently returns duplicate transcripts in some scenarios. I've reported this to ElevenLabs (https://github.com/elevenlabs/elevenlabs-python/issues/686). No explicit deduplication logic was added, because it risks removing content the speaker legitimately repeated.
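To illustrate the risk, here is a small sketch of a naive filter that drops a transcript when it matches the previous one. The function `drop_consecutive_duplicates` is hypothetical and is not part of this PR. It shows how such a filter cannot tell an API duplicate from a genuinely repeated utterance.

```python
def drop_consecutive_duplicates(transcripts: list[str]) -> list[str]:
    """Naive filter: discard a transcript identical to the one before it."""
    kept: list[str] = []
    for text in transcripts:
        if kept and kept[-1] == text:
            continue  # treated as a duplicate, even if it was really spoken twice
        kept.append(text)
    return kept


# The speaker really says "yes" twice; the filter keeps only one of them.
spoken = ["are you sure", "yes", "yes", "okay"]
filtered = drop_consecutive_duplicates(spoken)
# filtered == ["are you sure", "yes", "okay"]
```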
Documentation
STT - Realtime:
- https://elevenlabs.io/docs/cookbooks/speech-to-text/streaming
- https://elevenlabs.io/docs/api-reference/speech-to-text/v-1-speech-to-text-realtime