Latency between when the user stops speaking and when the bot starts speaking
pipecat version
0.0.65
Python version
3.10
Operating System
Ubuntu 22.04
Issue description
I have created a voice bot, but there is a delay of 2 to 5 seconds between when the user stops speaking and when the bot starts responding. After analyzing the logs, I discovered that the bot begins speaking only after the run_tts method has been called twice. If the bot started speaking as soon as the first run_tts response was generated, the delay could be reduced by approximately 2 to 2.5 seconds.
I'm using Deepgram for both STT and TTS, and OpenAI's GPT-4o as the language model. The voice bot is integrated with Twilio and deployed on an EC2 instance of type c6i.2xlarge.
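For context, the pipeline is assembled roughly like this (a minimal sketch, not my exact code; module paths and constructor arguments follow pipecat's 0.0.x layout and may differ slightly by version):

```python
# Rough sketch of the pipeline in question (not the exact production code).
# Module paths and constructor arguments may differ by pipecat version.
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService

stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
tts = DeepgramTTSService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    voice="aura-2-thalia-en",
    sample_rate=8000,  # Twilio delivers 8 kHz audio
)

# `transport` is the Twilio websocket transport, created elsewhere; shown
# here only to indicate where it sits in the pipeline.
pipeline = Pipeline([
    transport.input(),   # audio in from Twilio
    stt,                 # speech -> text
    llm,                 # text -> streamed response tokens
    tts,                 # sentences -> audio (this is where run_tts fires)
    transport.output(),  # audio back to the caller
])
```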
Logs:
Apr 29 11:41:12 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:12.924 | DEBUG | pipecat.transports.base_input:_handle_user_interruption:178 - User stopped speaking
Apr 29 11:41:16 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:16.890 | DEBUG | pipecat.transports.base_output:_bot_started_speaking:225 - Bot started speaking
Reproduction steps
N/A
Expected behavior
The bot starts speaking as soon as the first response from run_tts is generated.
Actual behavior
The bot begins speaking only after the run_tts method has been called twice.
Logs
Apr 29 11:41:08 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:08.323 | DEBUG | pipecat.transports.base_input:_handle_user_interruption:168 - User started speaking
Apr 29 11:41:12 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:12.924 | DEBUG | pipecat.transports.base_input:_handle_user_interruption:178 - User stopped speaking
{"role": "user", "content": "let me know how can I file bankruptcy? What is the procedure?"}]]
Apr 29 11:41:13 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:13,888 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Apr 29 11:41:14 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:14.344 | DEBUG | pipecat.services.deepgram.tts:run_tts:53 - DeepgramTTSService#10: Generating TTS [Filing for bankruptcy can be a complex process, but I can certainly provide some general information to help you understand it better.]
Apr 29 11:41:14 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:14,564 - INFO - HTTP Request: POST https://api.deepgram.com/v1/speak?model=aura-2-thalia-en&encoding=linear16&container=none&sample_rate=8000 "HTTP/1.1 200 OK"
Apr 29 11:41:16 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:16.846 | DEBUG | pipecat.services.deepgram.tts:run_tts:53 - DeepgramTTSService#10: Generating TTS [
Apr 29 11:41:16 ip-172-31-28-34 python3.10[1549]: 1. **Determine Eligibility:**
Apr 29 11:41:16 ip-172-31-28-34 python3.10[1549]: - You'll first need to determine which type of bankruptcy you qualify for, typically Chapter 7 or Chapter 13 for individuals.]
Apr 29 11:41:16 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:16.890 | DEBUG | pipecat.transports.base_output:_bot_started_speaking:225 - Bot started speaking
Hey, I just ran the 07c foundational example, which uses the same services, and I see about 1 second between when I stop speaking and when the bot responds. That time is mostly attributable to LLM inference time.
2025-04-29 23:23:26.595 | DEBUG | pipecat.transports.base_input:_handle_user_interruption:220 - User stopped speaking
2025-04-29 23:23:26.595 | DEBUG | pipecat.services.openai.base_llm:_stream_chat_completions:156 - OpenAILLMService#0: Generating chat [[{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way."}, {"role": "system", "content": "Please introduce yourself to the user."}, {"role": "user", "content": "Yeah. Can you hear me?"}, {"role": "assistant", "content": "Yes, I can hear you clearly. My name is OpenAI Assistant and I am here to help you with any questions or tasks you have. How can I assist you today?"}, {"role": "user", "content": "Hey. Tell me a joke."}]]
2025-04-29 23:23:27.335 | DEBUG | pipecat.services.deepgram.tts:run_tts:53 - DeepgramTTSService#0: Generating TTS [Sure.]
2025-04-29 23:23:27.552 | DEBUG | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:85 - DeepgramTTSService#0 usage characters: 5
2025-04-29 23:23:27.553 | DEBUG | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - DeepgramTTSService#0 processing time: 0.21727800369262695
2025-04-29 23:23:27.553 | DEBUG | pipecat.services.deepgram.tts:run_tts:53 - DeepgramTTSService#0: Generating TTS [ Why did the scarecrow win an award?]
2025-04-29 23:23:27.569 | DEBUG | pipecat.transports.base_output:_bot_started_speaking:224 - Bot started speaking
Can you please try running the example to see what your experience is?
Also, run_tts runs every time the LLM outputs a complete sentence; the idea is that sentences are sufficiently complete to send to the TTS service for audio generation. You'll see multiple sentences sent in succession. Just ask the bot to tell you a story: you'll see these generated as rapidly as the LLM can stream tokens. This is normal and has no direct bearing on response times.
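To illustrate the idea (a toy sketch, not pipecat's actual implementation): tokens are buffered as they stream in, and each time the buffer ends at a sentence boundary it's flushed to TTS, so audio generation can start as soon as the first full sentence exists:

```python
# Toy illustration of sentence-level chunking for TTS (not pipecat's code).
import re

SENTENCE_END = re.compile(r"[.!?]\s*$")  # crude sentence-boundary check

def stream_to_tts(token_stream, run_tts):
    buffer = ""
    for token in token_stream:
        buffer += token
        if SENTENCE_END.search(buffer):
            run_tts(buffer.strip())  # one run_tts call per complete sentence
            buffer = ""
    if buffer.strip():
        run_tts(buffer.strip())      # flush any trailing partial sentence

# Two sentences streamed as tokens -> run_tts fires twice, once per sentence:
stream_to_tts(["Sure.", " Why did", " the scarecrow", " win an award?"], print)
```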
Response times are directly impacted by the latencies of the third-party STT, LLM, and TTS services in the pipeline. They're also a function of where in the world you're located.
@Ahmer967 are you still blocked? If not, I'll close this issue.
@markbackman, I am still facing a latency of at least 3–5 seconds. The only difference is that my prompt is large, since the bot has to handle multiple complex tasks, and I have attached six tools to it. I also observed that when the user starts speaking, the user_started_speaking log appears only after a delay of 1–2 seconds. I tested the same setup with Vapi, and the delay there is around 2–3 seconds.
You should check the TTFB value in the debug logs. That will help you understand why things are taking so long. My hunch is that your LLM is taking a long time to generate a response. That could be a function of your prompt.
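If you haven't already, enable metrics on the task so TTFB and processing times show up in the debug logs. A sketch (parameter names follow PipelineParams in recent 0.0.x releases and may differ in yours):

```python
# Sketch: enable pipecat metrics so per-service TTFB and processing times
# are logged. `pipeline` is your existing Pipeline instance.
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,        # logs TTFB and processing time per service
        enable_usage_metrics=True,  # logs usage, e.g. TTS character counts
    ),
)
```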
@Ahmer967 can we close out this issue?
@markbackman I was working on another project this past week. I'll check the TTFB this week and let you know with an update! Then we'll close it.
Did you happen to mitigate the delay issue, either by design or by debugging your stack? I also saw large LLM latencies, but they were purely due to LLM inference time. My thought was to test between Gemini and OpenAI and check whether a different provider is any better.
We also notice an OpenAILLMService#0 processing time of 1.7 s on pipecat 0.0.95. Gemini 2.5 Flash is slightly faster, but only by about 0.1 second, and its reasoning quality seems lower.
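For anyone else comparing providers, only the LLM service needs to change in the pipeline. A hypothetical sketch (class and module paths may vary by pipecat version):

```python
# Hypothetical A/B swap of the LLM service; compare each service's processing
# time / TTFB in the debug logs. Module paths may vary by pipecat version.
import os

from pipecat.services.google.llm import GoogleLLMService
from pipecat.services.openai.llm import OpenAILLMService

llm_openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm_gemini = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.5-flash")
```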