Bot stops speaking mid-sentence
Pipecat version
0.0.91
Python version
latest
Operating System
Linux, Windows
Use Case Description
Bot Audio Pauses Mid-Sentence Until User Speaks
Environment
Pipecat Version: latest (`pip install pipecat-ai`)
Python Version: 3.11
LLM: OpenAI gpt-4i
TTS: Cartesia
STT: Deepgram Flux
Transport: FastAPI WebSocket (Twilio)
Platform: Ubuntu/Linux
Summary
During voice calls, the assistant speaks the first part of its reply, then goes silent for 15–20 seconds even though more TTS content is ready. As soon as the user says anything (e.g., “Hello?”), the bot instantly resumes and finishes the message.
Expected
When the user is silent, the bot should play all generated TTS chunks continuously. Silence from the caller should not block bot audio.
Actual
1. Bot starts a multi-sentence reply
2. Stops after the first sentence
3. More TTS chunks are generated and logged but never played
4. Long silence (15–20 s)
5. Bot resumes immediately once user speech is detected
Example Log (simplified)
13:35:07.306 Bot started speaking
13:35:09.021 Bot stopped speaking  # end of sentence 1
13:35:10.596 TTS generated (next sentence)
13:35:11.661 TTS generated (more content)
# no TTSStartedFrame, bot silent for 17 s
13:35:28.999 User started speaking ("Hello?")
# bot resumes immediately
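To put a number on the stall from a log like the one above, here is a small stdlib-only sketch. It assumes the simplified `HH:MM:SS.mmm message` format shown in this report; real Pipecat logs are formatted differently, so `parse_log` would need adapting. `parse_log`, `find_stalls`, and the 5-second `threshold_secs` default are hypothetical names and values chosen for illustration. A stall is counted when the bot stops speaking, TTS chunks are generated afterwards, and nothing (bot or user) starts speaking for longer than the threshold.

```python
from datetime import datetime

# The simplified log from this report ("HH:MM:SS.mmm message" per line).
# Lines starting with "#" are annotations and are skipped by the parser.
LOG = """\
13:35:07.306 Bot started speaking
13:35:09.021 Bot stopped speaking
13:35:10.596 TTS generated (next sentence)
13:35:11.661 TTS generated (more content)
# no TTSStartedFrame, bot silent
13:35:28.999 User started speaking ("Hello?")
"""


def parse_log(text: str) -> list[tuple[datetime, str]]:
    """Split each non-empty, non-comment line into (timestamp, message)."""
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        stamp, message = line.split(" ", 1)
        events.append((datetime.strptime(stamp, "%H:%M:%S.%f"), message))
    return events


def find_stalls(events: list[tuple[datetime, str]], threshold_secs: float = 5.0):
    """Find silences after "Bot stopped speaking" during which TTS chunks were
    generated but nothing started speaking for more than threshold_secs.

    Returns a list of (stopped_at, resumed_by, gap_secs, pending_chunks) tuples.
    """
    stalls = []
    stopped_at = None  # time the bot last stopped speaking
    pending = 0        # TTS chunks generated since then
    for when, message in events:
        if message.startswith("Bot stopped speaking"):
            stopped_at, pending = when, 0
        elif message.startswith("TTS generated") and stopped_at is not None:
            pending += 1
        elif message.startswith(("Bot started speaking", "User started speaking")):
            # Speech resumed: flag it if TTS was waiting and the gap was long.
            if stopped_at is not None and pending > 0:
                gap = (when - stopped_at).total_seconds()
                if gap > threshold_secs:
                    label = stopped_at.strftime("%H:%M:%S.%f")[:-3]
                    stalls.append((label, message, gap, pending))
            stopped_at, pending = None, 0
    return stalls


for stopped, resumed_by, gap, pending in find_stalls(parse_log(LOG)):
    print(f"{stopped}: silent {gap:.3f} s with {pending} TTS chunk(s) pending; "
          f"resumed by {resumed_by!r}")
```

On the report's log this flags a single stall of roughly 20 s (13:35:09.021 to 13:35:28.999) with two TTS chunks pending, which lines up with the 15–20 s silence described above. Running the same check over several calls is one way to show the pattern repeats.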
Notes
- No `UserStartedSpeakingFrame` during the silence period; the caller is truly quiet.
- TTS responses are generated almost instantly (TTFB < 0.2 s).
- LLM latency is normal (1–2 s).
- Happens even without `LocalSmartTurnAnalyzerV3`, using only the default VAD and the FastAPI transport.
- VAD parameters: `confidence=0.7`, `start_secs=0.2`, `stop_secs=0.8`, `min_volume=0.6`
Hypothesis
The audio output gate or transport state stays closed after BotStoppedSpeaking, preventing new TTS playback until the next inbound speech resets the state. It seems like a half-duplex gating bug in the Pipecat turn-taking or transport layer.
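The hypothesis can be made concrete with a toy model. This is a self-contained illustration, not Pipecat code: `ToyOutputGate`, its methods, the `stays_closed_after_stop` flag, and `run` are all hypothetical names. With the flag set, the model reproduces the reported symptom (sentence 1 plays, later chunks wait until user speech); with it cleared, it shows the expected continuous playback. A real repro would need to show Pipecat's output path behaving like the first case.

```python
from collections import deque


class ToyOutputGate:
    """Toy model of a bot audio output path. NOT Pipecat's implementation.

    stays_closed_after_stop=True models the hypothesized bug: once the bot stops
    speaking, the gate closes and queued TTS chunks wait until user speech
    reopens it. stays_closed_after_stop=False models the expected behavior.
    """

    def __init__(self, stays_closed_after_stop: bool) -> None:
        self.stays_closed_after_stop = stays_closed_after_stop
        self.is_open = True
        self.queue: deque[str] = deque()  # TTS chunks waiting to be played
        self.played: list[str] = []       # chunks that actually reached the caller

    def push_tts(self, chunk: str) -> None:
        self.queue.append(chunk)
        self._drain()

    def bot_stopped_speaking(self) -> None:
        if self.stays_closed_after_stop:
            self.is_open = False  # hypothesized bug: nothing reopens the gate

    def user_started_speaking(self) -> None:
        self.is_open = True  # inbound speech resets the state, as reported
        self._drain()

    def _drain(self) -> None:
        # Play queued chunks in order for as long as the gate is open.
        while self.is_open and self.queue:
            self.played.append(self.queue.popleft())


def run(stays_closed_after_stop: bool) -> tuple[list[str], list[str]]:
    """Replay the reported sequence; return (played before user speech, played after)."""
    gate = ToyOutputGate(stays_closed_after_stop)
    gate.push_tts("sentence 1")
    gate.bot_stopped_speaking()   # end of sentence 1
    gate.push_tts("sentence 2")   # more TTS is generated...
    gate.push_tts("sentence 3")
    before_user = list(gate.played)
    gate.user_started_speaking()  # caller says "Hello?"
    return before_user, list(gate.played)


print(run(stays_closed_after_stop=True))
# (['sentence 1'], ['sentence 1', 'sentence 2', 'sentence 3'])
print(run(stays_closed_after_stop=False))
# (['sentence 1', 'sentence 2', 'sentence 3'], ['sentence 1', 'sentence 2', 'sentence 3'])
```

The model only captures the timing the report describes; it says nothing about where in Pipecat such a gate would live, which is what the questions below ask.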
Request
- Can the Pipecat team confirm whether the output gate is expected to reopen automatically after TTS completes if new TTS frames are ready?
- Is this a known issue with the FastAPI/Twilio transport or the default turn-taking logic?
- Is there a recommended configuration or patch to ensure continuous bot playback in full-duplex scenarios?
Current Approach
1. **❌ Not LLM Latency**
   - LLM generated the response quickly (TTFB: 1.8s)
- Prompt tokens: 13,381 (reasonable)
- Processing time: 6.1s (within normal range)
2. **❌ Not TTS Issues**
- TTS generated content promptly
- Multiple TTS chunks were ready
- TTFB consistently fast (0.17-0.22s)
3. **❌ Not Background Noise**
- AI-coustics noise cancellation is active
- No spurious VAD triggers visible in logs
4. **❌ Not User Interruption**
- No `UserStartedSpeakingFrame` during silence period
- User was completely silent for 17+ seconds
Errors or Unexpected Behavior
Bot speech halts mid-sentence even though additional TTS chunks are generated successfully.
During the pause, there are no user speech events and no transport errors — only silence.
No TTSStartedFrame appears for the final generated segment until the caller speaks again.
Once the user says anything (even a single word like “Hello?”), the pending TTS audio immediately begins playback.
The silence period lasts 15–20 seconds and repeats across multiple calls.
The behavior occurs without any SmartTurnAnalyzer; only SileroVADAnalyzer and the standard FastAPIWebsocketTransport are in use.
TTS generation latency and LLM latency are both normal, so the stall appears to come from Pipecat’s output gate / transport state not reopening automatically.
Additional Context
- This happens consistently across multiple calls
- Noise cancellation (AI-coustics) is working properly
- User interruptions work correctly when user actually speaks
- The issue is specifically about the bot stopping on its own
Hello Pipecat Team,
We are also facing the same issue: the bot stops speaking mid-sentence and won't respond until the user says something else.
This needs to be fixed.
Regards,
Ramakrishna
Please provide a repro case, including a concise code example and steps to reproduce.
As far as we know, and as our testing shows, this is not possible. The bot's output only stops due to:
- Completed turn (nominal case)
- User speaking interruption
- Application code pushing an InterruptionFrame (or InterruptionTaskFrame)
- Service error causing the output to cease
- Transport error causing the connection to the user to break or be disrupted
When any of these occurs, it is evident in the logs.
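For triage, the stop causes listed above can be checked mechanically against the events seen between the bot stopping and resuming. The sketch below is a hypothetical helper, not part of Pipecat: `StopCause`, `explain_stop`, and the `ServiceError`/`TransportError` labels are placeholders, and only the frame names (`UserStartedSpeakingFrame`, `InterruptionFrame`, `InterruptionTaskFrame`) come from this thread. A `None` result means output stopped with TTS still pending and none of the listed causes appeared in the window, which is the situation the report describes and what a repro would need to demonstrate.

```python
from enum import Enum
from typing import Iterable, Optional


class StopCause(Enum):
    COMPLETED_TURN = "completed turn"
    USER_INTERRUPTION = "user speaking interruption"
    APP_INTERRUPTION = "application pushed an interruption frame"
    SERVICE_ERROR = "service error"
    TRANSPORT_ERROR = "transport error"


# Evidence checked in list order, so user speech takes precedence over a bare
# interruption frame. "ServiceError" and "TransportError" are placeholder labels
# for whatever error lines your own logs contain.
EVIDENCE = [
    ("UserStartedSpeakingFrame", StopCause.USER_INTERRUPTION),
    ("InterruptionFrame", StopCause.APP_INTERRUPTION),
    ("InterruptionTaskFrame", StopCause.APP_INTERRUPTION),
    ("ServiceError", StopCause.SERVICE_ERROR),
    ("TransportError", StopCause.TRANSPORT_ERROR),
]


def explain_stop(events: Iterable[str], pending_tts_chunks: int) -> Optional[StopCause]:
    """Return the first listed cause supported by the events, or None if unexplained."""
    seen = set(events)
    for label, cause in EVIDENCE:
        if label in seen:
            return cause
    if pending_tts_chunks == 0:
        return StopCause.COMPLETED_TURN  # nothing left to say: the nominal case
    return None  # output stopped with audio still pending and no logged cause


# The stall window from this report: no such events logged, two TTS chunks pending.
print(explain_stop([], pending_tts_chunks=2))  # prints None
```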