
[Documentation]: Question about LiteLLM Fallback Behavior with Multiple Deployments and Regions

Open yigitkonur opened this issue 1 year ago • 0 comments

What happened?

I'm configuring fallbacks and want to be sure I understand how LiteLLM handles deployments of the same model in different regions.

Here's my setup:

Model A (deployed in EU)  <---(Request starts here)
Model A (deployed in US)
                          |
                          (If Model A fails...)
                          |
                          V
Model B (fallback model) 
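For concreteness, here is roughly how I'd express this setup as a LiteLLM Router `model_list`. This assumes, as I understand the docs, that deployments sharing a `model_name` form one model group that requests are balanced across. The model ids and `api_base` URLs are hypothetical placeholders, not real endpoints.

```python
# Hypothetical Router model_list for the setup above: two deployments share
# model_name "model-a" (EU and US); "model-b" is the fallback group.
# Model ids and api_base URLs are placeholders.
model_list = [
    {
        "model_name": "model-a",
        "litellm_params": {
            "model": "openai/model-a-eu",          # hypothetical EU deployment
            "api_base": "https://eu.example.com",  # hypothetical endpoint
        },
    },
    {
        "model_name": "model-a",
        "litellm_params": {
            "model": "openai/model-a-us",          # hypothetical US deployment
            "api_base": "https://us.example.com",  # hypothetical endpoint
        },
    },
    {
        "model_name": "model-b",
        "litellm_params": {"model": "openai/model-b"},  # hypothetical fallback
    },
]

# Group deployments by model_name to show which entries make up "Model A".
groups: dict[str, list[str]] = {}
for entry in model_list:
    groups.setdefault(entry["model_name"], []).append(entry["litellm_params"]["model"])

print(groups)
```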

My question is: If the EU deployment of Model A fails, what happens?

Scenario 1: Immediate Fallback

Model A (deployed in EU)  <---(Request fails)
Model A (deployed in US)  <---(Skipped?)
                          |
                          (Immediately goes to fallback)
                          |
                          V
Model B (fallback model)  <---(Request sent here)
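To pin down what I mean, here is Scenario 1 as a toy Python model. It is not LiteLLM code; `scenario_1` and the deployment labels are hypothetical and only encode the order of attempts in the diagram.

```python
def scenario_1(primary, fallback, failing):
    """Return deployments tried, in order, under Scenario 1.

    primary: Model A deployments in the order the router would pick them.
    fallback: Model B deployments.
    failing: set of deployments that error out.
    """
    tried = [primary[0]]  # only the first-picked Model A deployment is tried
    if primary[0] not in failing:
        return tried
    # Scenario 1: the remaining Model A deployments are skipped entirely.
    for dep in fallback:
        tried.append(dep)
        if dep not in failing:
            break
    return tried


print(scenario_1(["A-eu", "A-us"], ["B"], failing={"A-eu"}))
# → ['A-eu', 'B']
```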

Scenario 2: Try Other Regions First

Model A (deployed in EU)  <---(Request fails)
Model A (deployed in US)  <---(LiteLLM tries here next!)
                          |
                          (Only if BOTH Model A deployments fail...)
                          |
                          V
Model B (fallback model)  <---(Request sent here as last resort)
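The same toy model for Scenario 2: every Model A deployment is tried before Model B. Again, `scenario_2` is a hypothetical illustration of the ordering I'm asking about, not how LiteLLM is implemented.

```python
def scenario_2(primary, fallback, failing):
    """Return deployments tried, in order, under Scenario 2.

    All Model A deployments (primary) are exhausted first; Model B
    deployments (fallback) are only reached if every one of them fails.
    """
    tried = []
    for dep in [*primary, *fallback]:
        tried.append(dep)
        if dep not in failing:
            break  # first success ends the request
    return tried


print(scenario_2(["A-eu", "A-us"], ["B"], failing={"A-eu"}))
# → ['A-eu', 'A-us']
print(scenario_2(["A-eu", "A-us"], ["B"], failing={"A-eu", "A-us"}))
# → ['A-eu', 'A-us', 'B']
```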

Ideally, I'd like Scenario 2: exhaust all deployments of Model A before moving to Model B. This ensures we've fully used the intended model before falling back.

Could you clarify which scenario LiteLLM currently uses? And if it's Scenario 1, is there a way to configure Scenario 2 behavior?
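My unconfirmed guess is that Scenario 2 would come from combining Router retries with deployment cooldowns: `num_retries` re-attempts within the same model group, `allowed_fails` / `cooldown_time` take a failing deployment out of rotation so a retry can land on the other region, and `fallbacks` is only consulted once retries are exhausted. The sketch below just builds those keyword arguments; the values are illustrative, `fallback_targets` is a hypothetical helper, and whether the Router really orders attempts this way is exactly what I'm asking.

```python
# Illustrative keyword arguments for litellm.Router; values are assumptions.
router_kwargs = {
    "num_retries": 1,     # retries within the "model-a" group before falling back
    "allowed_fails": 1,   # failures tolerated before cooldown (see Router docs for exact semantics)
    "cooldown_time": 30,  # seconds a cooled-down deployment is skipped
    "fallbacks": [{"model-a": ["model-b"]}],  # group-level fallback mapping
}

# With litellm installed and a model_list defined, these would be passed as:
#   from litellm import Router
#   router = Router(model_list=model_list, **router_kwargs)


def fallback_targets(kwargs, group):
    """Hypothetical helper: list the groups a model group falls back to."""
    for mapping in kwargs["fallbacks"]:
        if group in mapping:
            return mapping[group]
    return []


print(fallback_targets(router_kwargs, "model-a"))
# → ['model-b']
```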

Thanks for helping me build reliable LLM systems!

Relevant log output

No response

Twitter / LinkedIn details

No response

yigitkonur, Oct 23 '24 12:10