Frequent audio breaks (pauses) between words in the bot's response when multiple calls run simultaneously
pipecat version
0.94
Python version
3.11
Operating System
Ubuntu 20.04.6 LTS
Use Case Description
When multiple calls run over their respective websocket connections, the user frequently experiences pauses of about a second in the bot's response audio. The pauses happen at random places between words, and the audio resumes immediately without losing any word or sentence.
Current Approach
Currently we are using OpenAI Azure STT, LLM, and TTS services in the pipeline.
Here is the pipeline flow:
pipeline = Pipeline(
    [
        ws_transport.input(),  # Transport user input
        rtvi,
        stt,  # STT
        idle_intent_rag,  # RAG handled in a separate task
        context_aggregator.user(),
        llm,  # LLM
        custom_frame_processor,
        tts,  # TTS
        ws_transport.output(),
        context_aggregator.assistant(),
    ]
)
Please let me know what could be causing the issue.
Errors or Unexpected Behavior
No errors were seen, except that an interruption frame log appears whenever an audio pause happens.
Additional Context
No response
Here is the pipeline task:
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        audio_out_encoding=audio_profile["out_encoding"],
        audio_out_sample_rate_hz=audio_profile["out_sample_rate"],
        audio_out_channels=1,
        enable_metrics=True,
        enable_usage_metrics=True,
        allow_interruptions=True,
        interruption_strategies=[MinWordsInterruptionStrategy(min_words=3)],
    ),
    observers=[RTVIObserver(rtvi)],
)
Hi Pipecat Team,
This is a BLOCKER for our GA release. If possible, can we get on a call?
Regards, Ramakrishna
@ranthwar-vcx @bobysp Which serializer are you using?
@bobysp Which transport are you using, and in which context? If this is a websocket transport used for a web or mobile connection, then you're likely seeing the network affect the audio. You need a WebRTC transport. (Websockets are fine for server-to-server connections.)
For others on this thread, there's insufficient information to understand what issues everyone is talking about. I can say with a high degree of confidence that there is no systemic issue in Pipecat with the bot pausing between outputs. The bot outputs audio as fast as it's received back from the TTS service. Check your logs to make sure you're getting good performance out of the TTS service. The best services are:
- CartesiaTTSService
- ElevenLabsTTSService
- RimeTTSService
As they all support:
- websocket connection w/ interruptions
- context ids for audio generation that's consistent within a given turn
- word/timestamp pairs for the best interruption handling
If you can't use those services, look for a websocket service to use so that generations happen over a streaming connection. HTTP-based services are subject to slowness between generations, depending on the service itself. You'll see this in the TTFB values between different generation requests.
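One way to compare TTFB values across generation requests is to pull them out of the debug logs and summarize them per processor. The snippet below is only a rough sketch: it assumes each metrics line contains a processor name followed by `TTFB: <seconds>` (for example `AzureTTSService#0 TTFB: 0.25`), which may not match your log format exactly, and `summarize_ttfb` and `slow_generations` are hypothetical helper names, not part of Pipecat.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed log shape (adjust to your own logs): "<processor name> TTFB: <seconds>"
TTFB_PATTERN = re.compile(r"(?P<processor>\S+)\s+TTFB:\s*(?P<seconds>\d+(?:\.\d+)?)")


def summarize_ttfb(log_lines):
    """Group TTFB samples by processor name and report count, mean, and max."""
    samples = defaultdict(list)
    for line in log_lines:
        match = TTFB_PATTERN.search(line)
        if match:
            samples[match.group("processor")].append(float(match.group("seconds")))
    return {
        name: {"count": len(values), "mean": mean(values), "max": max(values)}
        for name, values in samples.items()
    }


def slow_generations(log_lines, threshold_s=1.0):
    """Return the log lines whose TTFB exceeds threshold_s seconds."""
    slow = []
    for line in log_lines:
        match = TTFB_PATTERN.search(line)
        if match and float(match.group("seconds")) > threshold_s:
            slow.append(line)
    return slow
```

Running `slow_generations` over the logs from a multi-call test and checking whether the slow lines line up with the audible pauses should show whether the TTS service is the bottleneck under concurrent load.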
Hope this guidance helps everyone sort out their issues!