pipecat version

0.0.91

Python version

latest

Operating System

Linux, Windows

Use Case Description

Bot Audio Pauses Mid-Sentence Until User Speaks

Environment

Pipecat Version: latest (pip install pipecat)

Python Version: 3.11

LLM: OpenAI gpt-4i

TTS: Cartesia

STT: Deepgram Flux

Transport: FastAPI WebSocket (Twilio)

Platform: Ubuntu/Linux

Summary

During voice calls, the assistant speaks the first part of its reply, then goes silent for 15–20 seconds even though more TTS content is ready. As soon as the user says anything (e.g., “Hello?”), the bot instantly resumes and finishes the message.

Expected

When the user is silent, the bot should play all generated TTS chunks continuously. Silence from the caller should not block bot audio.

Actual

Bot starts a multi-sentence reply

Stops after the first sentence

More TTS chunks are generated and logged but never played

Long silence (15–20 s)

Bot resumes immediately once user speech is detected

Example Log (simplified)

13:35:07.306 Bot started speaking

13:35:09.021 Bot stopped speaking # end sentence 1

13:35:10.596 TTS generated (next sentence)

13:35:11.661 TTS generated (more content)

no TTSStartedFrame, bot silent for 17s

13:35:28.999 User started speaking ("Hello?")

bot resumes immediately
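To put a number on the stall, the simplified log above can be parsed with the standard library alone. This is a minimal sketch: the `HH:MM:SS.mmm message` layout is taken from the excerpt above, real Pipecat log lines carry more fields, and `parse_events` and `seconds_between` are helper names made up for this illustration.

```python
from datetime import datetime

# Log lines copied from the simplified excerpt in this report.
LOG = """\
13:35:07.306 Bot started speaking
13:35:09.021 Bot stopped speaking
13:35:10.596 TTS generated (next sentence)
13:35:11.661 TTS generated (more content)
13:35:28.999 User started speaking ("Hello?")
"""


def parse_events(text):
    """Return (timestamp, message) pairs for every non-empty log line."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        stamp, _, message = line.partition(" ")
        events.append((datetime.strptime(stamp, "%H:%M:%S.%f"), message))
    return events


def seconds_between(events, start_prefix, end_prefix):
    """Seconds from the last start_prefix event to the first later end_prefix event."""
    start = max(t for t, m in events if m.startswith(start_prefix))
    end = min(t for t, m in events if m.startswith(end_prefix) and t > start)
    return (end - start).total_seconds()


events = parse_events(LOG)
# Time during which generated TTS audio was waiting but not played.
stall = seconds_between(events, "TTS generated", "User started speaking")
# Total time the caller heard nothing.
quiet = seconds_between(events, "Bot stopped speaking", "User started speaking")
print(f"audio waited {stall:.3f}s, caller heard silence for {quiet:.3f}s")
```

Both figures fall inside the 15–20 s range reported above.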

Notes

No UserStartedSpeakingFrame during the silence period — caller truly quiet.

TTS responses are generated instantly (TTFB < 0.2 s).

LLM latency normal (1–2 s).

Happens even without LocalSmartTurnAnalyzerV3, using only the default VAD and the FastAPI transport.

VAD parameters:

confidence=0.7 start_secs=0.2 stop_secs=0.8 min_volume=0.6

Hypothesis

The audio output gate or transport state stays closed after BotStoppedSpeaking, preventing new TTS playback until the next inbound speech resets the state. It seems like a half-duplex gating bug in the Pipecat turn-taking or transport layer.
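The hypothesis can be made concrete with a toy model. The sketch below is not Pipecat code and makes no claim about how Pipecat's transport is implemented: `HalfDuplexGate` and all of its methods are hypothetical names, written only to show the kind of state machine that would produce the observed symptoms.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HalfDuplexGate:
    """Hypothetical output gate used to illustrate the suspected bug."""

    reopen_on_new_audio: bool  # False models the suspected faulty behavior
    is_open: bool = True
    pending: deque = field(default_factory=deque)
    played: list = field(default_factory=list)

    def on_tts_audio(self, chunk):
        self.pending.append(chunk)
        if self.reopen_on_new_audio:
            self.is_open = True
        self._drain()

    def on_bot_stopped_speaking(self):
        # The gate closes at the end of each utterance.
        self.is_open = False

    def on_user_started_speaking(self):
        # Inbound speech resets the gate, releasing whatever was queued.
        self.is_open = True
        self._drain()

    def _drain(self):
        while self.is_open and self.pending:
            self.played.append(self.pending.popleft())


def run_scenario(gate, user_speaks):
    """Replay the reported call; return what played before and after user speech."""
    gate.on_tts_audio("sentence 1")
    gate.on_bot_stopped_speaking()
    gate.on_tts_audio("sentence 2")
    gate.on_tts_audio("sentence 3")
    played_while_silent = list(gate.played)
    if user_speaks:
        gate.on_user_started_speaking()
    return played_while_silent, list(gate.played)


stuck = run_scenario(HalfDuplexGate(reopen_on_new_audio=False), user_speaks=True)
healthy = run_scenario(HalfDuplexGate(reopen_on_new_audio=True), user_speaks=False)
print("suspected behavior:", stuck)
print("expected behavior: ", healthy)
```

With `reopen_on_new_audio=False`, only the first sentence plays until the user speaks, which matches the report. With `reopen_on_new_audio=True`, all three sentences play while the caller stays silent.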

Request

Can Pipecat confirm whether the output gate is expected to reopen automatically after TTS completion if new TTS frames are ready?

Is this a known issue with the FastAPI/Twilio transport or default turn-taking logic?

Any recommended configuration or patch to ensure continuous bot playback in full-duplex scenarios?

Current Approach

1. **❌ Not LLM Latency**
   - LLM generated response quickly (TTFB: 1.8s)
   - Prompt tokens: 13,381 (reasonable)
   - Processing time: 6.1s (within normal range)

2. **❌ Not TTS Issues**
   - TTS generated content promptly
   - Multiple TTS chunks were ready
   - TTFB consistently fast (0.17-0.22s)

3. **❌ Not Background Noise**
   - AI-coustics noise cancellation is active
   - No spurious VAD triggers visible in logs

4. **❌ Not User Interruption**
   - No `UserStartedSpeakingFrame` during silence period
   - User was completely silent for 17+ seconds

Errors or Unexpected Behavior

Bot speech halts mid-sentence even though additional TTS chunks are generated successfully.

During the pause, there are no user speech events and no transport errors — only silence.

No TTSStartedFrame appears for the final generated segment until the caller speaks again.

Once the user says anything (even a single word like “Hello?”), the pending TTS audio immediately begins playback.

The silence period lasts 15–20 seconds and repeats across multiple calls.

Behavior occurs without any SmartTurnAnalyzer — only SileroVADAnalyzer + standard FastAPIWebsocketTransport.

TTS generation latency and LLM latency are both normal, so the stall appears to come from Pipecat’s output gate / transport state not reopening automatically.

Additional Context

This happens consistently across multiple calls
Noise cancellation (AI-coustics) is working properly
User interruptions work correctly when user actually speaks
The issue is specifically about bot stopping on its own

Oct 30 '25 13:10 Regan17

Hello Pipecat Team,

We are also facing the same issue: the bot stops speaking mid-sentence and won't respond until the user says something else.

This needs to be fixed.

Regards Ramakrishna

Nov 18 '25 13:11 ranthwar-vcx

Please provide a repro case, including a concise code example and steps to reproduce.

As far as we know, and as our testing shows, this is not possible. The bot's output only stops due to:

Completed turn (nominal case)
User speaking interruption
Application code pushing an InterruptionFrame (or InterruptionTaskFrame)
Service error causing the output to cease
Transport error causing the connection to the user to break or be disrupted

The occurrence of these things is evident in the logs.
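One way to check a captured log against the causes listed above is to scan the silent window for matching markers. The following is a rough sketch under stated assumptions: the `HH:MM:SS.mmm message` layout comes from the simplified log in the original report, the marker substrings in `STOP_MARKERS` are guesses at what such log lines might contain (only `InterruptionFrame`, `InterruptionTaskFrame`, and `UserStartedSpeakingFrame` are named in this thread), and `find_stop_causes` is a hypothetical helper, not part of Pipecat.

```python
import re
from datetime import datetime

TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3})\s+(.*)$")

# Assumed lowercase substrings per cause; adjust to the real log format.
STOP_MARKERS = {
    "user interruption": ("userstartedspeakingframe", "user started speaking"),
    "application interruption": ("interruptionframe", "interruptiontaskframe"),
    "service or transport error": ("error", "disconnect"),
}


def parse_time(stamp):
    return datetime.strptime(stamp, "%H:%M:%S.%f").time()


def find_stop_causes(lines, start, end):
    """Return (timestamp, cause) for each matching line within [start, end]."""
    window_start, window_end = parse_time(start), parse_time(end)
    hits = []
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue
        stamp, message = match.groups()
        if not (window_start <= parse_time(stamp) <= window_end):
            continue
        lowered = message.lower()
        for cause, markers in STOP_MARKERS.items():
            if any(marker in lowered for marker in markers):
                hits.append((stamp, cause))
                break
    return hits


# Lines from the simplified log in the original report.
lines = [
    "13:35:07.306 Bot started speaking",
    "13:35:09.021 Bot stopped speaking",
    "13:35:10.596 TTS generated (next sentence)",
    "13:35:11.661 TTS generated (more content)",
    '13:35:28.999 User started speaking ("Hello?")',
]

during_silence = find_stop_causes(lines, "13:35:09.021", "13:35:28.998")
including_user = find_stop_causes(lines, "13:35:09.021", "13:35:28.999")
print(during_silence)
print(including_user)
```

Run against the simplified excerpt, the silent window contains none of the listed causes, so a full, unabridged log would be needed to confirm or rule them out.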

Nov 18 '25 13:11 markbackman

Bot Stops speaking mid-sentence
