Latency between when the user stops speaking and when the bot starts speaking
pipecat version
0.0.65
Python version
3.10
Operating System
Ubuntu 22.04
Issue description
I have created a voice bot, but there is a delay of 2 to 5 seconds between when the user stops speaking and when the bot starts responding. After analyzing the logs, I discovered that the bot begins speaking only after the run_tts method has been called twice. If the bot started speaking as soon as the first run_tts response was generated, the delay could be reduced by approximately 2 to 2.5 seconds.
I'm using Deepgram for both STT and TTS, and OpenAI's GPT-4o as the language model. The voice bot is integrated with Twilio and deployed on an EC2 instance of type c6i.2xlarge.
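For context, the pipeline is assembled roughly like this (a minimal sketch, not my exact code; module paths and constructor arguments follow pipecat's 0.0.x layout and may differ slightly by version):

```python
# Rough sketch of the pipeline in question (not the exact production code).
# Module paths and constructor arguments may differ by pipecat version.
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService

stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
tts = DeepgramTTSService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    voice="aura-2-thalia-en",
    sample_rate=8000,  # Twilio delivers 8 kHz audio
)

# `transport` is the Twilio websocket transport, created elsewhere; shown
# here only to indicate where it sits in the pipeline.
pipeline = Pipeline([
    transport.input(),   # audio in from Twilio
    stt,                 # speech -> text
    llm,                 # text -> streamed response tokens
    tts,                 # sentences -> audio (this is where run_tts fires)
    transport.output(),  # audio back to the caller
])
```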
Logs:
Apr 29 11:41:12 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:12.924 | DEBUG | pipecat.transports.base_input:_handle_user_interruption:178 - User stopped speaking
Apr 29 11:41:16 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:16.890 | DEBUG | pipecat.transports.base_output:_bot_started_speaking:225 - Bot started speaking
Reproduction steps
N/A
Expected behavior
The bot starts speaking as soon as the first response from run_tts is generated.
Actual behavior
The bot begins speaking only after the run_tts method has been called twice.
Logs
Apr 29 11:41:08 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:08.323 | DEBUG | pipecat.transports.base_input:_handle_user_interruption:168 - User started speaking
Apr 29 11:41:12 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:12.924 | DEBUG | pipecat.transports.base_input:_handle_user_interruption:178 - User stopped speaking
{"role": "user", "content": "let me know how can I file bankruptcy? What is the procedure?"}]]
Apr 29 11:41:13 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:13,888 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Apr 29 11:41:14 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:14.344 | DEBUG | pipecat.services.deepgram.tts:run_tts:53 - DeepgramTTSService#10: Generating TTS [Filing for bankruptcy can be a complex process, but I can certainly provide some general information to help you understand it better.]
Apr 29 11:41:14 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:14,564 - INFO - HTTP Request: POST https://api.deepgram.com/v1/speak?model=aura-2-thalia-en&encoding=linear16&container=none&sample_rate=8000 "HTTP/1.1 200 OK"
Apr 29 11:41:16 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:16.846 | DEBUG | pipecat.services.deepgram.tts:run_tts:53 - DeepgramTTSService#10: Generating TTS [
Apr 29 11:41:16 ip-172-31-28-34 python3.10[1549]: 1. **Determine Eligibility:**
Apr 29 11:41:16 ip-172-31-28-34 python3.10[1549]: - You'll first need to determine which type of bankruptcy you qualify for, typically Chapter 7 or Chapter 13 for individuals.]
Apr 29 11:41:16 ip-172-31-28-34 python3.10[1549]: 2025-04-29 11:41:16.890 | DEBUG | pipecat.transports.base_output:_bot_started_speaking:225 - Bot started speaking
Hey, I just ran the 07c foundational example, which uses the same services, and I see about 1 second between when I stop speaking and when the bot responds. That time is mostly attributable to LLM inference time.
2025-04-29 23:23:26.595 | DEBUG | pipecat.transports.base_input:_handle_user_interruption:220 - User stopped speaking
2025-04-29 23:23:26.595 | DEBUG | pipecat.services.openai.base_llm:_stream_chat_completions:156 - OpenAILLMService#0: Generating chat [[{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way."}, {"role": "system", "content": "Please introduce yourself to the user."}, {"role": "user", "content": "Yeah. Can you hear me?"}, {"role": "assistant", "content": "Yes, I can hear you clearly. My name is OpenAI Assistant and I am here to help you with any questions or tasks you have. How can I assist you today?"}, {"role": "user", "content": "Hey. Tell me a joke."}]]
2025-04-29 23:23:27.335 | DEBUG | pipecat.services.deepgram.tts:run_tts:53 - DeepgramTTSService#0: Generating TTS [Sure.]
2025-04-29 23:23:27.552 | DEBUG | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:85 - DeepgramTTSService#0 usage characters: 5
2025-04-29 23:23:27.553 | DEBUG | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - DeepgramTTSService#0 processing time: 0.21727800369262695
2025-04-29 23:23:27.553 | DEBUG | pipecat.services.deepgram.tts:run_tts:53 - DeepgramTTSService#0: Generating TTS [ Why did the scarecrow win an award?]
2025-04-29 23:23:27.569 | DEBUG | pipecat.transports.base_output:_bot_started_speaking:224 - Bot started speaking
Can you please try running the example to see what your experience is?
Also, run_tts runs every time the LLM outputs a complete sentence; the idea is that sentences are sufficiently complete to send to the TTS service for audio generation. You'll see multiple sentences sent in succession. Just ask the bot to tell you a story: you'll see these generated as rapidly as the LLM can stream tokens. This is normal and has no direct bearing on response times.
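To illustrate the idea (a toy sketch, not pipecat's actual implementation): tokens are buffered as they stream in, and each time the buffer ends at a sentence boundary it's flushed to TTS, so audio generation can start as soon as the first full sentence exists:

```python
# Toy illustration of sentence-level chunking for TTS (not pipecat's code).
import re

SENTENCE_END = re.compile(r"[.!?]\s*$")  # crude sentence-boundary check

def stream_to_tts(token_stream, run_tts):
    buffer = ""
    for token in token_stream:
        buffer += token
        if SENTENCE_END.search(buffer):
            run_tts(buffer.strip())  # one run_tts call per complete sentence
            buffer = ""
    if buffer.strip():
        run_tts(buffer.strip())      # flush any trailing partial sentence

# Two sentences streamed as tokens -> run_tts fires twice, once per sentence:
stream_to_tts(["Sure.", " Why did", " the scarecrow", " win an award?"], print)
```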
Response times are directly impacted by the latencies of the third-party STT, LLM, and TTS services in the pipeline. They're also a function of where in the world you're located.
@Ahmer967 are you still blocked? If not, I'll close this issue.
@markbackman, I am still facing a latency of at least 3–5 seconds. The only difference is that my prompt is large, since the bot has to handle multiple complex tasks, and I have attached six tools to it. I also observed that when the user starts speaking, the user_started_speaking log appears only after a delay of 1–2 seconds. I tested the same setup with Vapi, and the delay there is around 2–3 seconds.
You should check the TTFB value in the debug logs. That will help you understand why things are taking so long. My hunch is that your LLM is taking a long time to generate a response. That could be a function of your prompt.
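If you haven't already, enable metrics on the task so TTFB and processing times show up in the debug logs. A sketch (parameter names follow PipelineParams in recent 0.0.x releases and may differ in yours):

```python
# Sketch: enable pipecat metrics so per-service TTFB and processing times
# are logged. `pipeline` is your existing Pipeline instance.
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,        # logs TTFB and processing time per service
        enable_usage_metrics=True,  # logs usage, e.g. TTS character counts
    ),
)
```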
@Ahmer967 can we close out this issue?
@markbackman I was working on another project this past week. I'll check the TTFB this week and let you know with an update! Then we'll close it.
Did you happen to mitigate the delay issue, either by design or by debugging your stack? I also saw large LLM latencies, but they were purely due to LLM inference time. My thought was to test between Gemini and OpenAI and check whether a different provider is any better.
We also notice an OpenAILLMService#0 processing time of 1.7 s on pipecat 0.0.95. Gemini 2.5 Flash is slightly faster, but only by about 0.1 second, and its reasoning quality seems lower.
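For anyone else comparing providers, only the LLM service needs to change in the pipeline. A hypothetical sketch (class and module paths may vary by pipecat version):

```python
# Hypothetical A/B swap of the LLM service; compare each service's processing
# time / TTFB in the debug logs. Module paths may vary by pipecat version.
import os

from pipecat.services.google.llm import GoogleLLMService
from pipecat.services.openai.llm import OpenAILLMService

llm_openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm_gemini = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.5-flash")
```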