Memory leak on pipeline cancelation
pipecat version
0.0.67
Python version
3.10
Operating System
Docker (python:3.10-slim image)
Issue description
In a specific setup, I’m seeing a weird issue that I can reproduce locally—but only ~10% of the time.
Here’s what happens:
1. The LLM (Gemini Flash 2.0) calls a tool twice.
2. The STT triggers a message before the API response comes back (the API responds quickly).
3. After that, Cartesia stops responding entirely.
4. The UserIdleProcessor eventually sends a message, and the LLM replies correctly, but Cartesia TTS plays nothing.
5. After 30 seconds, the pipeline is marked idle and canceled (I set this timeout).
6. From this point, I get no further logs, but memory usage starts climbing until the bot crashes.
Relevant components:
- LLM: Gemini Flash 2.0
- TTS: Cartesia
- Transport: websockets
- Audio: audiobuffer for recordings
My pipeline looks like this:
pipeline = Pipeline(
    [
        transport.input(),
        stt_mute_filter,
        user_idle,
        time_processor.first(),
        stt,
        transcript.user(),
        context_aggregator.user(),
        llm,
        ffp,
        time_processor.reset(),
        tts,
        ml,
        transport.output(),
        audiobuffer,
        transcript.assistant(),
        context_aggregator.assistant(),
    ]
)
I am not sure where to look, so I would appreciate any help.
Reproduction steps
Reproduced locally about 10% of the time with no apparent trigger; it's difficult to give exact reproduction steps.
Expected behavior
The Cartesia websocket should close, and the on_client_disconnected and on_audio_data callbacks should be called.
Actual behavior
- No more sound from Cartesia
- Bot crashes due to a memory leak while the pipeline is being canceled
Logs
Please see the uploaded screenshot.
I see the Idle pipeline detected log message, which comes from the Idle Pipeline Detection logic, not the UserIdleProcessor (which is a processor in the pipeline that detects no user input).
Are you modifying the Idle detection logic? If not, the timeout is 300 seconds by default and looks for BotSpeakingFrame and LLMFullResponseEndFrame to detect activity.
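For reference, those defaults can be overridden on the PipelineTask itself; here is a minimal sketch, assuming a recent Pipecat version where idle_timeout_secs, idle_timeout_frames, and cancel_on_idle_timeout are exposed:

from pipecat.frames.frames import BotSpeakingFrame, LLMFullResponseEndFrame
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(allow_interruptions=True),
    # Defaults shown explicitly: cancel the task after 300s without any of these frames.
    idle_timeout_secs=300,
    idle_timeout_frames=(BotSpeakingFrame, LLMFullResponseEndFrame),
    cancel_on_idle_timeout=True,
)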
I can see that the bot tried to speak within 30 seconds of the pipeline timing out. I think you'll have to provide more information.
- Can you share your PipelineTask creation code?
- Have you modified Pipecat at all?
- It looks like you have a number of custom FrameProcessors. Make sure that they all push frames through the pipeline; it looks like you may be inadvertently blocking frames, which would cause the timeout to occur (see the sketch after this list).
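For comparison, a custom processor that doesn't block anything looks roughly like this (PassthroughProcessor is just an illustrative name, not something in your code):

from pipecat.frames.frames import Frame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class PassthroughProcessor(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        # Let the base class handle system frames and its own bookkeeping first.
        await super().process_frame(frame, direction)
        # ... custom logic ...
        # Always forward the frame; otherwise downstream processors starve
        # and the idle timeout will eventually fire.
        await self.push_frame(frame, direction)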
Here is the PipelineTask:
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        allow_interruptions=True,
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
    idle_timeout_secs=30,
)
I changed idle_timeout_secs because the pipeline was previously hanging for 5 minutes when this issue happened.
I am also using the UserIdleProcessor, which triggers the last two "Generating chat" entries you can see in the screenshot: the LLM responded, but it didn't trigger TTS, and that's why the idle pipeline detection is eventually triggered. The user is just waiting for an answer.
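For context, the user_idle processor is wired up roughly like this (a simplified sketch; the handler name and body are placeholders, and the timeout matches the 8 s mentioned below):

from pipecat.processors.user_idle_processor import UserIdleProcessor

async def handle_user_idle(user_idle: UserIdleProcessor):
    # Nudge the conversation when the user has been silent for `timeout` seconds;
    # the real handler appends a message that makes the LLM respond.
    ...

user_idle = UserIdleProcessor(callback=handle_user_idle, timeout=8.0)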
Another clue is that it seems to happen only after tool calling. I use pipecat-flows. In the handler function, I added this code:
async def more_info_handler(args: FlowArgs) -> Any:
    await stt.queue_frame(
        TTSSpeakFrame("WAITING")
    )
    # ... HTTP call ...
I see in the logs that it appends the "WAITING" message to the LLM context. But after the response from the tool comes back, the LLM API call with that response is never triggered.
From this point on, I don't get any TTS output anymore.
Only after the UserIdleProcessor is triggered (8 s later) do I see a log with Generating chat ...
2025-05-09 14:34:46.777 | DEBUG | pipecat.services.google.llm:_process_context:507 - GoogleLLMService#0: Generating chat [[{'parts': [{'text': 'PROMPT'}], 'role': 'user'}, {'parts': [{'text': "some message"}], 'role': 'model'}, {'parts': [{'text': "Parler à X, s'il vous plaît."}], 'role': 'user'}, {'parts': [{'function_call': {'name': 'recherche_info', 'args': {'query': 'question ?'}, 'id': 'XXX'}}], 'role': 'model'}, {'parts': [{'function_response': {'name': 'recherche_info', 'response': {'value': '{"text": "Je n\'ai pas d\'information sur ce sujet.\\n"}'}, 'id': 'XXX'}}], 'role': 'user'}, {'parts': [{'text': 'WAITING'}], 'role': 'model'}, {'parts': [{'text': "IDLE PROCESSOR MESSAGE"}], 'role': 'user'}]]
In the TRACE logs I have locally, it seems that after the LLM response I don't see CartesiaTTSService#0 appending any TTSAudioRawFrame audio. Earlier in the logs (during the function call) I see this (CartesiaTTSService#0: frame processing paused):
2025-05-09 14:34:39.215 | TRACE | pipecat.processors.frame_processor:__internal_push_frame:296 - Pushing TranscriptionUpdateFrame#2(pts: None, messages: 1) from AssistantTranscriptProcessor#0 to GoogleAssistantContextAggregator#0
2025-05-09 14:34:39.215 | TRACE | pipecat.processors.frame_processor:__input_frame_task_handler:343 - CartesiaTTSService#0: frame processing resumed
2025-05-09 14:34:39.215 | TRACE | pipecat.processors.frame_processor:pause_processing_frames:220 - CartesiaTTSService#0: pausing frame processing
2025-05-09 14:34:39.215 | TRACE | pipecat.processors.frame_processor:__input_frame_task_handler:339 - CartesiaTTSService#0: frame processing paused
Could it be linked to Cartesia's frame processing being paused? (It never resumes later during the call.) Or is it some race condition because of the TTS in the function handler?
Thanks a lot for your answer.
The first thing I would do is add an Observer to your PipelineTask that monitors LLMTextFrame, LLMFullResponseStartFrame, and LLMFullResponseEndFrame. The LLMLogObserver() is probably the easiest way to do this. The goal is to see if your custom processors are blocking these frames. Based on what you're seeing, my guess is that your ffp or time_processor.reset() frame processors are blocking frames.
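For example, reusing your existing task setup, attaching it would look something like this (LLMLogObserver lives under pipecat.observers.loggers in recent versions):

from pipecat.observers.loggers.llm_log_observer import LLMLogObserver

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        allow_interruptions=True,
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
    idle_timeout_secs=30,
    # The observer logs LLM start/end and generated text as frames flow by.
    observers=[LLMLogObserver()],
)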
Thanks, I didn't know about that.
I managed to reproduce the memory leak locally with the observer but I'm still not sure how to interpret the results:
2025-05-14 15:53:59.197 | DEBUG | pipecat.observers.loggers.llm_log_observer:on_push_frame:57 - 🧠 GoogleLLMService#0 → LLM START RESPONSE at 44.95s
2025-05-14 15:53:59.481 | DEBUG | pipecat.processors.metrics.frame_processor_metrics:stop_ttfb_metrics:50 - GoogleLLMService#0 TTFB: 0.28527116775512695
2025-05-14 15:53:59.519 | DEBUG | pipecat.observers.loggers.llm_log_observer:on_push_frame:60 - 🧠 GoogleLLMService#0 → LLM GENERATING: 'Je' at 45.27s
2025-05-14 15:53:59.753 | DEBUG | pipecat.observers.loggers.llm_log_observer:on_push_frame:60 - 🧠 GoogleLLMService#0 → LLM GENERATING: ' suis' at 45.51s
2025-05-14 15:53:59.822 | DEBUG | pipecat.observers.loggers.llm_log_observer:on_push_frame:60 - 🧠 GoogleLLMService#0 → LLM GENERATING: " toujours là, je n'ai pas trouvé l'information, je suis toujours" at 45.58s
2025-05-14 15:53:59.828 | DEBUG | pipecat.processors.metrics.frame_processor_metrics:start_llm_usage_metrics:73 - GoogleLLMService#0 prompt tokens: 1664, completion tokens: 29
2025-05-14 15:53:59.828 | DEBUG | pipecat.observers.loggers.llm_log_observer:on_push_frame:60 - 🧠 GoogleLLMService#0 → LLM GENERATING: " en train de chercher les horaires d'ouverture.\n" at 45.58s
2025-05-14 15:53:59.828 | DEBUG | pipecat.observers.loggers.llm_log_observer:on_push_frame:57 - 🧠 GoogleLLMService#0 → LLM END RESPONSE at 45.58s
2025-05-14 15:54:07.196 | DEBUG | pipecat.observers.loggers.debug_log_observer:on_push_frame:218 - PipelineTaskSource#0 → Pipeline#0: EndFrame id: 6054, name: 'EndFrame#0', metadata: {} at 52.95s
2025-05-14 15:54:07.196 | DEBUG | pipecat.observers.loggers.debug_log_observer:on_push_frame:218 - PipelineSource#0 → FastAPIWebsocketInputTransport#0: EndFrame id: 6054, name: 'EndFrame#0', metadata: {} at 52.95s
2025-05-14 15:54:07.196 | DEBUG | pipecat.observers.loggers.debug_log_observer:on_push_frame:218 - FastAPIWebsocketInputTransport#0 → STTMuteFilter#0: EndFrame id: 6054, name: 'EndFrame#0', metadata: {} at 52.95s
2025-05-14 15:54:07.196 | DEBUG | pipecat.observers.loggers.debug_log_observer:on_push_frame:218 - STTMuteFilter#0 → UserIdleProcessor#0: EndFrame id: 6054, name: 'EndFrame#0', metadata: {} at 52.95s
2025-05-14 15:54:07.196 | DEBUG | pipecat.observers.loggers.debug_log_observer:on_push_frame:218 - UserIdleProcessor#0 → SetterFirstDateFrameProcessor#0: EndFrame id: 6054, name: 'EndFrame#0', metadata: {} at 52.95s
2025-05-14 15:54:07.196 | DEBUG | pipecat.observers.loggers.debug_log_observer:on_push_frame:218 - SetterFirstDateFrameProcessor#0 → AzureSTTService#0: EndFrame id: 6054, name: 'EndFrame#0', metadata: {} at 52.95s
2025-05-14 15:54:07.196 | DEBUG | pipecat.observers.loggers.debug_log_observer:on_push_frame:218 - AzureSTTService#0 → UserTranscriptProcessor#0: EndFrame id: 6054, name: 'EndFrame#0', metadata: {} at 52.95s
2025-05-14 15:54:07.197 | DEBUG | pipecat.observers.loggers.debug_log_observer:on_push_frame:218 - UserTranscriptProcessor#0 → GoogleUserContextAggregator#0: EndFrame id: 6054, name: 'EndFrame#0', metadata: {} at 52.95s
2025-05-14 15:54:07.197 | DEBUG | pipecat.observers.loggers.debug_log_observer:on_push_frame:218 - GoogleUserContextAggregator#0 → GoogleLLMService#0: EndFrame id: 6054, name: 'EndFrame#0', metadata: {} at 52.95s
2025-05-14 15:54:07.197 | DEBUG | pipecat.observers.loggers.debug_log_observer:on_push_frame:218 - GoogleLLMService#0 → FunctionFillerProcessor#0: EndFrame id: 6054, name: 'EndFrame#0', metadata: {} at 52.95s
2025-05-14 15:54:07.197 | DEBUG | pipecat.observers.loggers.debug_log_observer:on_push_frame:218 - FunctionFillerProcessor#0 → ResetFirstDateFrameProcessor#0: EndFrame id: 6054, name: 'EndFrame#0', metadata: {} at 52.95s
2025-05-14 15:54:07.197 | DEBUG | pipecat.observers.loggers.debug_log_observer:on_push_frame:218 - ResetFirstDateFrameProcessor#0 → CartesiaTTSService#0: EndFrame id: 6054, name: 'EndFrame#0', metadata: {} at 52.95s
2025-05-14 15:54:44.254 | ERROR | pipecat.transports.network.fastapi_websocket:_write_frame:274 - FastAPIWebsocketOutputTransport#0 exception sending data: WebSocketDisconnect ()
INFO: connection closed
2025-05-14 15:54:44.254 | ERROR | pipecat.transports.network.fastapi_websocket:_write_frame:274 - FastAPIWebsocketOutputTransport#0 exception sending data: RuntimeError (Cannot call "send" once a close message has been sent.)
2025-05-14 15:54:44.255 | WARNING | pipecat.pipeline.task:_idle_timeout_detected:579 - Idle pipeline detected, cancelling pipeline task...
2025-05-14 15:54:44.255 | DEBUG | pipecat.pipeline.task:cancel:288 - Canceling pipeline task PipelineTask#0
2025-05-14 15:54:44.255 | ERROR | pipecat.processors.frame_processor:__internal_push_frame:322 - Uncaught exception in PipelineSource#0: Cannot call "send" once a close message has been sent.
The LLM seems to respond correctly, but I don't see the LLMTextFrame.
Cartesia again stopped responding.
The UserIdleProcessor finally ends the call with an EndFrame.
The EndFrame stops at CartesiaTTSService and never propagates further.
The observer is a great feature btw!!
LLMTextFrame is actually LLM GENERATING. Though, looking at LLMLogObserver, it doesn't show the flow we need. Instead, let's use the DebugLogObserver. You can add this to your PipelineTask:
from pipecat.frames.frames import (
    LLMFullResponseEndFrame,
    LLMFullResponseStartFrame,
    LLMTextFrame,
)
from pipecat.observers.loggers.debug_log_observer import DebugLogObserver

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        allow_interruptions=True,
        enable_metrics=True,
        enable_usage_metrics=True,
        report_only_initial_ttfb=True,
    ),
    observers=[
        DebugLogObserver(
            frame_types=(LLMTextFrame, LLMFullResponseStartFrame, LLMFullResponseEndFrame)
        ),
    ],
)
This will show the flow of these frames throughout the pipeline. If you can capture these frame logs during the error, we can see what's happening.
I think I have a better understanding now: it seems the TTS stays in frame-processing-paused mode after receiving the result from the function (this is pretty clear in the logs).
(I changed these logs to INFO level in the TTS service locally.)
2025-05-14 19:23:16.277 | DEBUG | pipecat.transports.base_output:_bot_stopped_speaking:400 - Bot stopped speaking
2025-05-14 19:23:16.277 | INFO | pipecat.processors.frame_processor:__input_frame_task_handler:355 - CartesiaTTSService#0: frame processing resumed
2025-05-14 19:23:16.277 | INFO | pipecat.processors.frame_processor:__input_frame_task_handler:351 - CartesiaTTSService#0: frame processing paused
From here, every frame sent to the TTS is blocked, and processing never resumes.
I'm not sure why this happens, but it seems to occur when a tool is called twice, and maybe it interferes with the TTSSpeakFrame that is added to the LLM context while waiting:
async def more_info_handler(args: FlowArgs) -> Any:
    await stt.queue_frame(
        TTSSpeakFrame("SOME MESSAGE")
    )
    # ... HTTP call ...
I think it wasn't added to the context before, but I'm not sure.
We're seeing a lot more of this issue since upgrading pipecat from 0.0.62 to 0.0.67, though we had already seen it once or twice in the past.
I've linked the logs as you requested:
Any insight to help me investigate what is happening? It is a huge pain for us in production.
- Do you think it can be a race condition due to the TTSSpeakFrame?
- Any idea how the frame processing happened to be resumed and paused at the exact same time?
- Do you think it can be related to the Google context aggregator somehow? That would explain why this issue is not more widespread among other users.
Any help is welcome. Thanks!
Can you share a little bit more about your code? Specifically, how you set up services, how you set up the context, and how you set up tools?
Also, cc @aconchillo as he may have ideas. I've never had this problem, so I think we need to get a better idea of what your code looks like in case there's anything unexpected.
Yes, it's basically the Twilio chatbot example with a few additions:
transport = FastAPIWebsocketTransport(
    websocket=websocket_client,
    params=FastAPIWebsocketParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        audio_out_mixer=soundfile_mixer,
        vad_analyzer=SileroVADAnalyzer(),
        serializer=TwilioFrameSerializer(stream_sid),
    ),
)
tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id=voice_id,
    model="sonic-2",
    params=CartesiaTTSService.InputParams(
        language=Language.FR,
        speed=-0.3,
    ),
    push_silence_after_stop=False,
)
I am using GoogleLLMService, so it's using a GoogleContextAggregatorPair.
I have double-checked every custom processor to make sure they do not block frames (and I don't see them blocking in the logs).
I am using the AudioBufferProcessor:
audiobuffer = AudioBufferProcessor(user_continuous_stream=False)
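The recording callback is wired up roughly like this (simplified; the saving logic is omitted and recording is started elsewhere via start_recording()):

@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
    # Persist the merged recording; this is the on_audio_data callback that never
    # fires when the pipeline gets stuck (see "Expected behavior" above).
    ...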
Also using FlowManager:
flow_manager = FlowManager(
    task=task,
    llm=llm,
    context_aggregator=context_aggregator,
    tts=tts,
    flow_config=workflow,
)
(Note that the tts used here is the one used to push the TTSSpeakFrame in the workflow).
What do you think of the possible race condition around here? (See the logs below.)
2025-05-14 19:23:16.277 | DEBUG | pipecat.transports.base_output:_bot_stopped_speaking:400 - Bot stopped speaking
2025-05-14 19:23:16.277 | INFO | pipecat.processors.frame_processor:__input_frame_task_handler:355 - CartesiaTTSService#0: frame processing resumed
2025-05-14 19:23:16.277 | INFO | pipecat.processors.frame_processor:__input_frame_task_handler:351 - CartesiaTTSService#0: frame processing paused
I think it can happen if the TTSService receives both a BotStoppedSpeakingFrame and a TTSSpeakFrame or an LLMFullResponseEndFrame at the same time, but I would like your input on that.
Please let me know if you need more.
Sorry for the delay. The one key difference is that the Twilio Chatbot example configures Cartesia as:
tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id="e13cae5c-ec59-4f71-b0a6-266df3c9bb8e",  # Madame Mischief
    push_silence_after_stop=True,
)
I'm not sure why that's required, but I've asked @aconchillo to learn about that myself.
Thanks, I'll try and see if it persists.
I was thinking: we're using uv to run our python script.
CMD ["uv", "run", "--no-dev", "uvicorn", "workflows.api:app", "--host", "0.0.0.0", "--port", "8765"]
Do you think it could be linked to any strange behaviour with pipecat, or do you recommend using it at all?
> I was thinking: we're using uv to run our python script.

That shouldn't have an impact.
Hi @aurelien-ldp @markbackman, do you have any more findings regarding this issue?
I have come across this as well, where the pipeline appears completely blocked. Our setup is similarly based on the twilio-chatbot example + Google LLM + Cartesia TTS, but more simplified (no custom processors, etc.). FYI, I've noticed that even the idle handler would not fire.
(edit) Another observation that may help: we have "audio_out_mixer": SoundfileMixer enabled, and while this blocking behavior happens, the background ambient track is still playing.