
5-6 seconds delay with telephony agent

Open cyber-goka opened this issue 2 weeks ago • 2 comments

Bug Description

I am running Whisper STT via faster-whisper, a local Qwen3 14B LLM, and ElevenLabs TTS. STT transcription is fast, at 120-150 ms. The LLM's first token is also fast, at 80-120 ms, although full generation sometimes takes 2-2.5 seconds. Full audio generation takes about 500-800 ms. The WebSocket connection to ElevenLabs is enabled, along with everything else needed to stream text to ElevenLabs. Despite this, the agent takes about 5 seconds to respond.
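To make the gap concrete, here is a rough latency budget built only from the figures reported above. It is a back-of-the-envelope sketch, not a measurement: the names are made up for illustration, and the "streamed" estimate assumes TTS can start right after the first LLM token and uses the full audio generation time as a pessimistic bound on time to first audio.

```python
# Reported per-stage latencies from this issue, in milliseconds (min, max).
STT_MS = (120, 150)
LLM_FIRST_TOKEN_MS = (80, 120)
LLM_FULL_GENERATION_MS = (2000, 2500)
TTS_FULL_AUDIO_MS = (500, 800)
OBSERVED_MS = 5000  # observed response delay


def total(*ranges):
    """Sum (min, max) ranges stage by stage."""
    return sum(r[0] for r in ranges), sum(r[1] for r in ranges)


# No overlap at all: each stage waits for the previous one to finish.
sequential = total(STT_MS, LLM_FULL_GENERATION_MS, TTS_FULL_AUDIO_MS)

# Streaming: TTS starts after the first LLM token (assumption, see above).
streamed = total(STT_MS, LLM_FIRST_TOKEN_MS, TTS_FULL_AUDIO_MS)

# Delay that the reported stage timings cannot explain, even without streaming.
unaccounted = OBSERVED_MS - sequential[1]

print(f"sequential: {sequential} ms")
print(f"streamed:   {streamed} ms")
print(f"unaccounted: {unaccounted} ms")
```

Even the fully sequential worst case stays below the observed 5 seconds, so at least part of the delay seems to come from something other than the STT, LLM, and TTS stages themselves.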

Expected Behavior

The agent should respond within 1-1.5 seconds at most.

Reproduction Steps

tts = elevenlabs.TTS(
    voice_id=voice_id,
    model=ELEVENLABS_MODEL,              # eleven_flash_v2_5 (~75ms)
    auto_mode=ELEVENLABS_AUTO_MODE,      # False = word-level streaming
    chunk_length_schedule=ELEVENLABS_CHUNK_SCHEDULE,  # [50] chars
)

  Session Configuration with VAD

  # VAD configuration for ultra-low latency
  vad = silero.VAD.load(
      min_speech_duration=0.02,  # 20ms - detect speech faster
      min_silence_duration=0.2,  # 200ms - detect silence faster
      padding_duration=0.03,     # 30ms - less padding overhead
  )

  # Agent session with all components
  session = AgentSession(
      stt=stt,
      llm=llm,
      tts=tts,
      vad=vad,
      turn_detection=EnglishModel(),
      allow_interruptions=True,
      min_interruption_duration=0.15,
      min_interruption_words=1,
      min_endpointing_delay=0.15,  # 150ms - start responding FAST
      max_endpointing_delay=3.0,
      false_interruption_timeout=0.6,
      resume_false_interruption=True,
  )
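One possible explanation for the extra delay is the endpointing window. The sketch below is a simplified model, not LiveKit's actual implementation: it assumes the session waits `min_endpointing_delay` when the turn detector judges the user finished and up to `max_endpointing_delay` otherwise, then adds a pipeline time on top. The function name and the 1070 ms pipeline figure (worst-case STT + first LLM token + TTS audio from the numbers in this report) are illustrative assumptions.

```python
# Values from the session configuration above, converted to milliseconds.
MIN_ENDPOINTING_DELAY_MS = 150
MAX_ENDPOINTING_DELAY_MS = 3000

# Assumed worst-case streamed pipeline time: 150 (STT) + 120 (LLM first
# token) + 800 (TTS audio), taken from the timings reported in this issue.
PIPELINE_MS = 1070


def estimated_response_ms(turn_likely_finished: bool, pipeline_ms: int = PIPELINE_MS) -> int:
    """Hypothetical model: endpointing wait plus pipeline time."""
    if turn_likely_finished:
        wait_ms = MIN_ENDPOINTING_DELAY_MS
    else:
        # Turn detector expects the user to continue, so the session may
        # wait up to the maximum delay before committing the turn.
        wait_ms = MAX_ENDPOINTING_DELAY_MS
    return wait_ms + pipeline_ms


print(estimated_response_ms(True))   # turn detected as finished
print(estimated_response_ms(False))  # turn detector expects more speech
```

Under this model, a turn that the detector does not classify as finished lands around 4 seconds, which is much closer to the observed delay than the 1-1.5 second target. Logging the end-of-utterance delay per turn would confirm or rule this out.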

Operating System

debian 13

Models Used

whisper, qwen3 14b, elevenlabs

Package Versions

livekit=1.9.2
livekit-agents=1.2.18

Session/Room/Call IDs

No response

Proposed Solution


Additional Context

No response

Screenshots and Recordings

No response

cyber-goka avatar Nov 19 '25 01:11 cyber-goka