5-6 seconds delay with telephony agent
Bug Description
I am running Whisper STT (via faster-whisper), a local Qwen3 14B LLM, and ElevenLabs TTS. STT is fast: transcription takes 120-150 ms. The LLM's first token is also fast at 80-120 ms, although full generation sometimes takes 2-2.5 seconds. Full audio generation takes about 500-800 ms. The WebSocket connection to ElevenLabs is enabled, along with everything else needed to stream text to ElevenLabs. Despite this, the agent takes about 5 seconds to respond.
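To make the gap concrete, here is a rough back-of-the-envelope sketch that adds up the upper bounds reported above as if every stage ran strictly one after another, with no streaming at all. The names `STAGE_MS`, `OBSERVED_MS`, and `unaccounted_ms` are made up for illustration; the numbers are the ones quoted in this report.

```python
# Upper bounds reported in this issue, in milliseconds.
STAGE_MS = {
    "stt_transcribe": 150,        # 120-150 ms
    "llm_full_generation": 2500,  # 2-2.5 s (first token alone is 80-120 ms)
    "tts_full_audio": 800,        # 500-800 ms
}
OBSERVED_MS = 5000  # roughly what the agent takes to respond


def unaccounted_ms(stages: dict[str, int], observed: int) -> int:
    """Return the observed delay that the listed stages do not explain."""
    return observed - sum(stages.values())


# Fully serial worst case: each stage waits for the previous one to finish.
serial_worst_case = sum(STAGE_MS.values())

print(f"serial worst case: {serial_worst_case} ms")
print(f"unaccounted: {unaccounted_ms(STAGE_MS, OBSERVED_MS)} ms")
```

Even under that pessimistic assumption the stages add up to 3450 ms, so about 1550 ms of the observed delay is not explained by STT, LLM, or TTS time. With streaming enabled the unexplained share should be larger still.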
Expected Behavior
The agent should respond within 1-1.5 seconds at most.
Reproduction Steps
from livekit.plugins import elevenlabs

tts = elevenlabs.TTS(
    voice_id=voice_id,
    model=ELEVENLABS_MODEL,  # eleven_flash_v2_5 (~75ms)
    auto_mode=ELEVENLABS_AUTO_MODE,  # False = word-level streaming
    chunk_length_schedule=ELEVENLABS_CHUNK_SCHEDULE,  # [50] chars
)
Session Configuration with VAD
from livekit.agents import AgentSession
from livekit.plugins import silero
from livekit.plugins.turn_detector.english import EnglishModel

# VAD configuration for ultra-low latency
vad = silero.VAD.load(
    min_speech_duration=0.02,  # 20ms - detect speech faster
    min_silence_duration=0.2,  # 200ms - detect silence faster
    padding_duration=0.03,  # 30ms - less padding overhead
)

# Agent session with all components
session = AgentSession(
    stt=stt,
    llm=llm,
    tts=tts,
    vad=vad,
    turn_detection=EnglishModel(),
    allow_interruptions=True,
    min_interruption_duration=0.15,
    min_interruption_words=1,
    min_endpointing_delay=0.15,  # 150ms - start responding FAST
    max_endpointing_delay=3.0,
    false_interruption_timeout=0.6,
    resume_false_interruption=True,
)
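For reference, a rough sketch of how the two endpointing settings above could bound the time before the first audio. It assumes, as a simplification that may not match how the framework actually schedules things, that the endpointing delay is counted from the end of the user's speech and that STT and the LLM's first token follow it without overlap. TTS time to first chunk is left out because it was not measured. `first_audio_window_ms` is a hypothetical helper, not part of any library.

```python
# Values from the session configuration and the measurements in this issue (ms).
MIN_ENDPOINTING_MS = 150    # min_endpointing_delay=0.15
MAX_ENDPOINTING_MS = 3000   # max_endpointing_delay=3.0
STT_MS = 150                # upper bound of 120-150 ms
LLM_FIRST_TOKEN_MS = 120    # upper bound of 80-120 ms


def first_audio_window_ms(endpointing_ms: int) -> int:
    """Estimate time from end of speech until the LLM's first token.

    Assumption: the stages run back to back with no overlap.
    """
    return endpointing_ms + STT_MS + LLM_FIRST_TOKEN_MS


best = first_audio_window_ms(MIN_ENDPOINTING_MS)
worst = first_audio_window_ms(MAX_ENDPOINTING_MS)
print(f"best case: {best} ms, worst case: {worst} ms")
```

Under these assumptions the window runs from 420 ms to 3270 ms, depending on which endpointing delay applies to a given turn. This is only an estimate of one possible contributor, not a confirmed cause of the delay.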
Operating System
debian 13
Models Used
whisper, qwen3 14b, elevenlabs
Package Versions
livekit=1.9.2
livekit-agents=1.2.18
Session/Room/Call IDs
No response
Proposed Solution
Additional Context
No response
Screenshots and Recordings
No response