run function calls sequentially or in parallel
- Function calls can now be executed sequentially by passing `run_in_parallel=False` when creating the LLM service. Each one is still executed in a separate task so it doesn't block the pipeline (see the sketch below the list).
- Function calls are now cancelled by default on interruption. I figured this is the most common use case: usually you want the bot to just tell you the result of the function call, so if the user interrupts you want that to stop. The user can catch `asyncio.CancelledError` to handle the cancellation.
- Function calls are now passed all at once to the base class with `LLMService.run_function_calls()`. Before, we used `LLMService.call_function()` for each function call. Having a single function with the list of all the function calls we are going to execute gives us more control to add logic. For example, we now check which is the last function call in one place instead of in each LLM implementation.
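For illustration, here is a minimal sketch of both options: passing `run_in_parallel=False` at service creation and catching `asyncio.CancelledError` inside a function handler. The import path and handler signature are assumptions that may differ across Pipecat versions; check the repo's function-calling examples for the exact API.

```python
# Minimal sketch (not from this PR's diff): run function calls sequentially and
# react to cancellation on interruption.
import asyncio

from pipecat.services.openai.llm import OpenAILLMService  # path may vary by version

# run_in_parallel=False: tool calls run one after another, each still in its own task.
llm = OpenAILLMService(api_key="YOUR_API_KEY", run_in_parallel=False)


async def get_weather(function_name, tool_call_id, args, llm, context, result_callback):
    try:
        await asyncio.sleep(5)  # stand-in for a slow external lookup
        await result_callback({"conditions": "sunny", "temperature_c": 24})
    except asyncio.CancelledError:
        # With cancel-on-interruption (the default), a user interruption cancels
        # this task; clean up here if needed, then re-raise.
        raise


llm.register_function("get_weather", get_weather)
```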
Codecov Report
:x: Patch coverage is 31.46853% with 98 lines in your changes missing coverage. Please review.
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/pipecat/frames/frames.py | 95.14% <100.00%> (+0.12%) | :arrow_up: |
| src/pipecat/pipeline/runner.py | 72.91% <0.00%> (ø) | |
| src/pipecat/services/aws/llm.py | 24.24% <0.00%> (-0.13%) | :arrow_down: |
| .../pipecat/services/gemini_multimodal_live/gemini.py | 0.00% <0.00%> (ø) | |
| src/pipecat/services/google/llm.py | 29.72% <25.00%> (-0.21%) | :arrow_down: |
| src/pipecat/services/anthropic/llm.py | 28.86% <20.00%> (-0.26%) | :arrow_down: |
| src/pipecat/services/google/llm_openai.py | 22.22% <16.66%> (+0.34%) | :arrow_up: |
| src/pipecat/services/openai/base_llm.py | 27.04% <16.66%> (-0.96%) | :arrow_down: |
| src/pipecat/processors/aggregators/llm_response.py | 75.00% <53.84%> (-0.97%) | :arrow_down: |
| ...rc/pipecat/services/openai_realtime_beta/openai.py | 0.00% <0.00%> (ø) | |
| ... and 1 more | | |
Just to confirm before I start reviewing it.
Based on the last meeting, this PR will be updated so that we can define through settings whether all function calls should run sequentially or in parallel, with the default behavior being parallel. Is that correct?
That is correct. This is now available: you can set `run_in_parallel=False` when creating the LLM service to run function calls sequentially.
This looks good to me, pending Mark's one-character change; is there anything else I can help test?
If you have time and want to play with it a bit, that would be nice: adding more than one function call, running them sequentially or in parallel, enabling/disabling cancellation and cancelling them, etc.
When testing this PR locally, I still experience repeated responses when running a script with parallel function calls: 14-function-calling-parallel.py.zip (see also Discord context)
IIUC it is supposed to wait until all parallel functions complete and then generate a single response.
https://github.com/user-attachments/assets/55eda329-d4ca-4dd2-9200-c14fc8b1082c
I've tested a few different scenarios:
OpenAI
When `run_in_parallel=True`, I see:
- Function call 1
- LLM chat completion 1
- Function call 2
- LLM chat completion 2
When `run_in_parallel=False`, I see the expected behavior:
- Function call 1
- Function call 2
- LLM chat completion (for both 1 and 2)
Anthropic
I see the opposite behavior from OpenAI.
When `run_in_parallel=True`, I see:
- Function call 1
- Function call 2
- LLM chat completion (for both 1 and 2)
When `run_in_parallel=False`, I see the expected behavior:
- Function call 1
- LLM chat completion 1
- Function call 2
- LLM chat completion 2
Gemini
Same result as OpenAI (`run_in_parallel=False` works as expected, `run_in_parallel=True` does not)
OpenAI Realtime
There's an error:
2025-05-21 18:02:37.972 | ERROR | pipecat.utils.asyncio:run_coroutine:113 - OpenAIRealtimeBetaLLMService#0::_receive_task_handler: unexpected exception: 'ConversationItem' object has no attribute 'tool_id'
Suggesting changes below for the error. With that fix, I see both options working as expected.
Gemini Live
Works as expected.
This is fixed now. The issue was that if there are two or more function calls to be executed in parallel, the first one might get executed right away before the rest have been registered by the assistant aggregator. So we were running a new completion when what we believed was the last function call finished, and in this case the first one was wrongly considered to be the last one.
So, I have added a new frame, `FunctionCallsStartedFrame`, which tells everyone about the list of function calls that are going to be executed. This way we can pre-register function calls and we know when there are no function calls left to be executed.
I have also added an `on_function_calls_started` event in case people want to react when function calls start executing.
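A minimal sketch of subscribing to that event, assuming the usual `event_handler` decorator and a payload of the service plus the list of pending function calls (the exact signature may differ in the merged code):

```python
@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
    # For example, notify the UI that tools are running before any result arrives.
    print(f"Starting {len(function_calls)} function call(s)")
```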
Btw, Claude tries to avoid executing function calls in parallel, so it's harder to test (see https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/implement-tool-use#parallel-tool-use). That's why you were getting:
- Function call 1
- LLM chat completion 1
- Function call 2
- LLM chat completion 2
If the LLM returns 2 or more function calls you should always get (parallel or sequential):
- Function call 1
- Function call 2
- LLM chat completion 1,2
(if they are parallel it could be 2,1)
Confirming this example behaves as expected now 👍
The `on_function_calls_started` event is helpful, thanks for adding that.
https://github.com/user-attachments/assets/c5d5562d-abe8-4509-80d9-56055d940acd
This is looking much better! I'm now seeing OpenAI, Anthropic, Gemini, OpenAI Realtime, and Gemini Live all behaving as expected.
Two things I noticed:
- Running 14e results in Gemini saying "Let me check on that" once before the functions run (expected) and once again after they run. I don't see this for OpenAI.
Fixed. The issue was that `run_function_calls` was called with an empty list, so `on_function_calls_started` fired anyway. We just need to not run anything if no function calls are returned.
- Running 14r (`AWSBedrockLLMService`) has an error when running a function call:
Fixed. This was because AWS Bedrock is a newer LLM service that didn't exist yet when I started this PR.