Improve Reliability - Fallback Processors for LLMs
Problem Statement
LLMs can fail, time out, or take too long to respond (> 5 seconds).
Here are some metrics showcasing this behaviour. (These graphs were plotted using ~30,000 metric data points from the Google AI Studio endpoint (gemini-2.0-flash-lite), OpenAI endpoints (gpt-4o and gpt-4.1-mini), and Azure OpenAI endpoints.)
In the image below you can see that the p95 latency sits around 700 ms; however, at times the latency can spike to 25 seconds or more, and the LLM endpoints themselves don't have defined SLAs/SLOs.
Also, one of the screenshots shows a reported error rate of 0.39%, which is not huge but is enough to create a bad experience for users.
As for unavailability, Google can respond with a 503 response code, as documented at https://ai.google.dev/gemini-api/docs/troubleshooting#error-codes
Whenever such issues happen, the duration of the outage/error state is indeterminate.
Proposed Solution
A FallbackLLMService processor with the ability to (see the sketch below this list):
- Take a diverse set of LLM services as input
- Switch LLMs if the primary fails
- Switch LLMs if the primary doesn't respond within a user-defined timeout (5 seconds is a good default)
- (Optional) Switch back when the primary is working again (this can be checked every time we make an LLM call with the secondary, with max_retries, in a non-blocking fashion)
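To make the proposal concrete, here is a minimal, framework-agnostic sketch of the switching logic. `FallbackCompleter`, its constructor arguments, and the `complete()` method are hypothetical names used purely for illustration, not an existing Pipecat API; a real processor would operate on frames rather than plain strings.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


class FallbackCompleter:
    """Hypothetical sketch: try each LLM callable in order, falling back on
    error or when a call exceeds the configured timeout."""

    def __init__(
        self,
        completers: Sequence[Callable[[str], Awaitable[str]]],
        timeout_secs: float = 5.0,
    ):
        self._completers = list(completers)
        self._timeout_secs = timeout_secs

    async def complete(self, prompt: str) -> str:
        last_error: Exception | None = None
        for completer in self._completers:
            try:
                # Enforce the per-call timeout so a slow primary doesn't stall the pipeline.
                return await asyncio.wait_for(completer(prompt), self._timeout_secs)
            except Exception as exc:  # includes asyncio.TimeoutError
                last_error = exc  # remember the failure and try the next service
        raise RuntimeError("All LLM services failed") from last_error
```

Switching back to the primary could then be attempted periodically while the secondary is serving traffic, as described in the optional item above.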
Alternative Solutions
No response
Additional Context
No response
Would you be willing to help implement this feature?
- [ ] Yes, I'd like to contribute
- [ ] No, I'm just suggesting
Hi there, this is actually already achievable with Parallel Pipeline.
It is definitely achievable using Parallel Pipeline, but it would be nice to have dedicated fallback processors for services like LLMs.
In my opinion, Parallel Pipeline adds computational overhead, and it also makes the code messy since you'd need to implement all sorts of gates in order to fall back.
I understand your concern; in fact, I just came across this recently. But after digging deeper into the Pipecat source code and architecture, I realized that whether you implement a "fallback processor" yourself or use something like a Function Filter as a logic gate paired with Parallel Pipeline, it's the same thing: just a FrameProcessor underneath, and it will be triggered for every frame that passes through it.
So I don't think it's necessary to create a dedicated processor just for LLM fallback, since it's all just FrameProcessors, and the result wouldn't be that different from the current approach with Parallel Pipeline. However, it's up to the Pipecat team to make the final decision, so if you have an example of the "fallback processor" you'd like to see upstream, it would be great if you could post it here.
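For reference, the gate-based approach mentioned above looks roughly like the sketch below. This assumes the ParallelPipeline and FunctionFilter classes from Pipecat (import paths and constructor signatures may differ between versions), and `primary_llm`, `fallback_llm`, and `mark_fallback()` are placeholders you would wire up to your own error/timeout detection.

```python
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.processors.filters.function_filter import FunctionFilter

# Placeholders: any two already-constructed LLM service instances.
primary_llm = ...   # e.g. your primary LLM service
fallback_llm = ...  # e.g. your backup LLM service

# Shared flag flipped by your own error/timeout watcher (not shown here).
use_fallback = False


def mark_fallback():
    # Call this when the primary LLM errors out or times out.
    global use_fallback
    use_fallback = True


async def primary_gate(frame) -> bool:
    # Let frames reach the primary branch only while it is healthy.
    return not use_fallback


async def fallback_gate(frame) -> bool:
    # Open the fallback branch once the primary has been marked bad.
    return use_fallback


llm_stage = ParallelPipeline(
    [FunctionFilter(filter=primary_gate), primary_llm],
    [FunctionFilter(filter=fallback_gate), fallback_llm],
)
```

This works, but it spreads the fallback decision across several processors, which is the "all sorts of gates" concern raised in the earlier comment.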
I do think it would be useful for LLMs (and other AI services) to have a timeout param and some on-timeout and on-error handling. AI services inconsistently send ErrorFrame or raise exceptions, so it's hard to consistently use either to detect issues without patching the source code.
Proper fallbacks require both determining when to fall back and the logic to hot-swap services. There isn't really a "recommended" or built-in way to do either of these, as far as I know.
We've made progress here:
- Most LLMs now have a `retry_on_timeout` feature that includes a configurable timeout value called `retry_timeout_secs`. This forces a retry when the LLM is too slow on the first completion.
- We've added a ServiceSwitcher class and an LLMSwitcher class, which allow you to run multiple versions of the same type of service and switch between them. You can write your own strategy. We'll be adding more built-in failover options in the future.
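For anyone landing here later, usage looks roughly like the sketch below. It is only a sketch based on the description above: the import path and the idea that `retry_on_timeout`/`retry_timeout_secs` are constructor keyword arguments are assumptions, so check the current Pipecat docs before copying it. Multiple such services can then be handed to an LLMSwitcher with a strategy of your choice.

```python
# Sketch only: the parameter names come from the comment above, but the import
# path and keyword-argument form are assumptions -- verify against your Pipecat version.
from pipecat.services.openai.llm import OpenAILLMService

primary = OpenAILLMService(
    model="gpt-4o",
    retry_on_timeout=True,    # retry the first completion if it is too slow
    retry_timeout_secs=5.0,   # configurable timeout that triggers the retry
)
```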
I'm closing out this issue based on the progress we've made.