pipecat: frequent audio breaks (pauses) between words in the bot's response when multiple calls run simultaneously

pipecat version

0.94

Python version

3.11

Operating System

Ubuntu 20.04.6 LTS

Use Case Description

When multiple calls are running, each over its own websocket connection, users frequently hear the bot's response audio pause for about a second. The pause happens at a random place between words, and the audio resumes immediately without losing any word or sentence.

Current Approach

Currently we are using OpenAI Azure STT, LLM, and TTS services in the pipeline.
Here is the pipeline flow:
pipeline = Pipeline(
    [
        ws_transport.input(),  # Transport user input
        rtvi,
        stt,  # STT
        idle_intent_rag,  # RAG, handled in a separate task
        context_aggregator.user(),
        llm,  # LLM
        custom_frame_processor,
        tts,  # TTS
        ws_transport.output(),
        context_aggregator.assistant(),
    ]
)
Please let me know what could be causing this.

Errors or Unexpected Behavior

No errors are seen, except that an interruption frame log appears whenever the audio pause happens.

Additional Context

No response

Nov 19 '25 04:11 bobysp

Here is the pipeline task:

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        audio_out_encoding=audio_profile["out_encoding"],
        audio_out_sample_rate_hz=audio_profile["out_sample_rate"],
        audio_out_channels=1,
        enable_metrics=True,
        enable_usage_metrics=True,
        allow_interruptions=True,
        interruption_strategies=[MinWordsInterruptionStrategy(min_words=3)],
    ),
    observers=[RTVIObserver(rtvi)],
)

Nov 19 '25 04:11 bobysp

Hi Pipecat Team,

This is a BLOCKER for our GA release. If possible, can we get on a call?

Regards Ramakrishna

Nov 19 '25 08:11 ranthwar-vcx

@ranthwar-vcx @bobysp Which serializer are you using?

Nov 22 '25 07:11 omensky

@bobysp which transport are you using and in which context? If this is a websocket transport used for a web or mobile connection, then you're likely seeing network conditions affect the audio. You need a WebRTC transport. (Websockets are fine for server-to-server connections.)

For others on this thread, there's insufficient information to understand what issues everyone is talking about. I can say with a high degree of confidence that there is no systemic issue in Pipecat with the bot pausing between outputs. The bot outputs audio as fast as it's received back from the TTS service. Check your logs to make sure you're getting good performance out of the TTS service. The best services are:

CartesiaTTSService
ElevenLabsTTSService
RimeTTSService

As they all support:

websocket connection w/ interruptions
context ids for audio generation that's consistent within a given turn
word/timestamp pairs for the best interruption handling

If you can't use those services, look for a websocket-based service so that generations happen over a streaming connection. HTTP-based services are subject to slowness between generations, depending on the service itself. You'll see this in the TTFB values across different generation requests.
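
For anyone who wants to compare TTFB across generation requests, here is a minimal sketch. It assumes you have already pulled a "request sent" and a "first audio received" timestamp (in seconds) for each TTS request out of your own logs. The `summarize_ttfb` helper, the 0.5 s threshold, and the sample timestamps are all hypothetical and not part of Pipecat.

```python
from statistics import mean


def summarize_ttfb(requests, slow_threshold_s=0.5):
    """Summarize time-to-first-byte for a list of TTS requests.

    `requests` is a list of (request_sent_s, first_audio_s) pairs in seconds.
    The 0.5 s default threshold is an arbitrary illustration, not a Pipecat value.
    """
    # TTFB is the delay between sending the request and receiving the first audio.
    ttfbs = [first_audio - sent for sent, first_audio in requests]
    # Record which requests were slower than the chosen threshold.
    slow_indexes = [i for i, t in enumerate(ttfbs) if t > slow_threshold_s]
    return {
        "ttfbs": ttfbs,
        "mean": mean(ttfbs),
        "max": max(ttfbs),
        "slow_indexes": slow_indexes,
    }


# Hypothetical timestamps for four consecutive generation requests.
sample = [(0.0, 0.25), (2.0, 2.25), (4.0, 5.5), (7.0, 7.25)]
report = summarize_ttfb(sample)
print(report)
```

If the slow requests line up with the moments where the audio pauses, the TTS service (or the connection to it) is the more likely culprit. If TTFB is steady while pauses still occur, look at the transport and at interruption handling instead.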

Hope this guidance helps everyone sort out their issues!

Nov 23 '25 15:11 markbackman