agents STT: ElevenLabs STTv2 (Scribe v2 Realtime) support

Open varghesepaul opened this issue 2 weeks ago • 2 comments

Summary

Adds support for ElevenLabs Scribe v2 Realtime, a streaming STT model with roughly 150 ms latency.

Features

WebSocket-based streaming transcription
Configurable commit strategies (VAD/manual)
Word-level timestamp support
Automatic reconnection handling
Comprehensive error handling
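
To illustrate what automatic reconnection usually involves, here is a minimal sketch of retrying a connection with capped exponential backoff. It is not the code from this change: the function names, the delay values, and the use of `ConnectionError` are all assumptions chosen for the example, and the real plugin may use a different policy.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(
    base: float = 0.5, factor: float = 2.0, cap: float = 10.0, attempts: int = 5
) -> Iterator[float]:
    """Yield `attempts` delays that grow by `factor` and never exceed `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def connect_with_retry(
    connect: Callable[[], T],
    delays: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or the delays run out.

    `connect` is a hypothetical callable that opens the WebSocket and
    raises ConnectionError on failure.
    """
    last_error = None
    for delay in delays:
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            sleep(delay)  # wait before the next attempt
    raise ConnectionError("gave up after all retry attempts") from last_error
```

Passing `sleep` as a parameter keeps the retry loop testable without real waiting.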

API Options

model_id: Model selection (default: scribe_v2_realtime)
language_code: Language support (optional)
commit_strategy: "vad" (default) or "manual"
include_timestamps: Enable word-level timestamps
VAD parameters: threshold, silence duration, speech duration
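
The options above could be modeled roughly as follows. This is a sketch only: the class name, the `vad_*` field names, the `False` default for `include_timestamps`, and the `to_params()` helper are assumptions made for illustration, not the plugin's actual signature. Only `model_id`, `language_code`, `commit_strategy`, `include_timestamps`, and the two stated defaults come from the list above.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ScribeRealtimeOptions:
    model_id: str = "scribe_v2_realtime"   # default stated in the issue
    language_code: Optional[str] = None     # optional
    commit_strategy: str = "vad"            # "vad" (default) or "manual"
    include_timestamps: bool = False        # default value is an assumption
    # Hypothetical names for the VAD parameters (threshold, silence
    # duration, speech duration); the real names and units may differ.
    vad_threshold: Optional[float] = None
    vad_silence_duration: Optional[float] = None
    vad_speech_duration: Optional[float] = None

    def __post_init__(self) -> None:
        if self.commit_strategy not in ("vad", "manual"):
            raise ValueError(
                f"commit_strategy must be 'vad' or 'manual', got {self.commit_strategy!r}"
            )

    def to_params(self) -> Dict[str, Any]:
        """Return only the options that were actually set."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Validating `commit_strategy` at construction time surfaces a typo before any connection is opened.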

Implementation Details

Follows the Deepgram STTv2 pattern for consistency
Uses RecognizeStream base class (modern API)
Proper usage tracking via RECOGNITION_USAGE events
Session override support via update_options()
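
As a rough picture of what a session override means, the sketch below derives per-session options from shared defaults without mutating them. `StreamOptions` and `with_overrides` are hypothetical names for this example; the actual `update_options()` signature is not shown in this issue and may behave differently.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class StreamOptions:
    # Hypothetical subset of the options, for illustration only.
    language_code: Optional[str] = None
    commit_strategy: str = "vad"
    include_timestamps: bool = False


def with_overrides(defaults: StreamOptions, **overrides: Any) -> StreamOptions:
    """Return a copy of `defaults` with the non-None overrides applied.

    Treating None as "keep the default" is an assumption of this sketch.
    Unknown option names raise TypeError via dataclasses.replace.
    """
    changed = {name: value for name, value in overrides.items() if value is not None}
    return replace(defaults, **changed)
```

For example, `with_overrides(StreamOptions(language_code="en"), commit_strategy="manual")` keeps the language and switches only the commit strategy.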

Known Issues

The ElevenLabs API currently returns duplicate transcripts in some scenarios. I've reported this to ElevenLabs (https://github.com/elevenlabs/elevenlabs-python/issues/686). I have not added explicit deduplication logic, because it risks removing valid repeated content.
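
The risk can be shown with a small example. The function below is a naive deduplicator written only for illustration (it is not part of this change), and the transcript strings are made up. It drops a transcript whenever it equals the previous one, which cannot tell an API-side duplicate from a speaker who really repeated themselves.

```python
from typing import List


def drop_consecutive_duplicates(transcripts: List[str]) -> List[str]:
    """Naive dedup: skip any transcript identical to the one before it."""
    result: List[str] = []
    for text in transcripts:
        if not result or result[-1] != text:
            result.append(text)
    return result


# Hypothetical case 1: the API emitted the same final transcript twice.
api_duplicate = ["turn left", "turn left", "then stop"]

# Hypothetical case 2: the speaker really said "no" twice in a row.
genuine_repeat = ["no", "no", "I said stop"]

cleaned_duplicate = drop_consecutive_duplicates(api_duplicate)
cleaned_repeat = drop_consecutive_duplicates(genuine_repeat)
```

In the first case the filter helps, but in the second it silently discards speech the user actually produced, and both inputs look the same to the filter.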

Documentation

STT - Realtime: https://elevenlabs.io/docs/cookbooks/speech-to-text/streaming, https://elevenlabs.io/docs/api-reference/speech-to-text/v-1-speech-to-text-realtime

Nov 16 '25 23:11 varghesepaul
