[Documentation]: Question about LiteLLM Fallback Behavior with Multiple Deployments and Regions
What happened?
I'm configuring fallbacks and want to be sure I understand how LiteLLM handles deployments of the same model in different regions.
Here's my setup:
Model A (deployed in EU) <---(Request starts here)
Model A (deployed in US)
|
(If Model A fails...)
|
V
Model B (fallback model)
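For concreteness, here is a minimal sketch of that setup as Router-style configuration data. It assumes the `model_list` and `fallbacks` shapes that `litellm.Router` accepts. Every group name, deployment identifier, and URL below is a made-up placeholder, and the small `deployments_by_group` helper is hypothetical, included only to show that both regional deployments share one group name.

```python
from collections import defaultdict

# Placeholder configuration: none of these names or URLs are real.
model_list = [
    {
        "model_name": "model-a",  # group name shared by both regional deployments
        "litellm_params": {
            "model": "azure/model-a-eu",  # placeholder EU deployment
            "api_base": "https://eu.example.invalid",
        },
    },
    {
        "model_name": "model-a",  # same group, different region
        "litellm_params": {
            "model": "azure/model-a-us",  # placeholder US deployment
            "api_base": "https://us.example.invalid",
        },
    },
    {
        "model_name": "model-b",  # separate group used only as a fallback
        "litellm_params": {"model": "azure/model-b"},
    },
]

# If the "model-a" group fails, fall back to the "model-b" group.
fallbacks = [{"model-a": ["model-b"]}]


def deployments_by_group(entries):
    """Hypothetical helper: map each group name to its deployment identifiers."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["model_name"]].append(entry["litellm_params"]["model"])
    return dict(groups)


groups = deployments_by_group(model_list)
```

These two structures would then be passed as `Router(model_list=model_list, fallbacks=fallbacks)`. The question is what order the deployments are tried in once the EU one fails.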
My question is: If the EU deployment of Model A fails, what happens?
Scenario 1: Immediate Fallback
Model A (deployed in EU) <---(Request fails)
Model A (deployed in US) <---(Skipped?)
|
(Immediately goes to fallback)
|
V
Model B (fallback model) <---(Request sent here)
Scenario 2: Try Other Regions First
Model A (deployed in EU) <---(Request fails)
Model A (deployed in US) <---(LiteLLM tries here next!)
|
(Only if BOTH Model A deployments fail...)
|
V
Model B (fallback model) <---(Request sent here as last resort)
Ideally, I'd like Scenario 2: exhaust all deployments of Model A before moving to Model B, so that the fallback model is only used once every deployment of the intended model has failed.
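To pin down the difference between the two scenarios, here is a small standalone simulation of the attempt order in each case. It is not LiteLLM's implementation and it calls no LiteLLM API. The `route` function, the `exhaust_group_first` flag, and the deployment names are all hypothetical, used only to describe the behavior I'm asking about.

```python
from typing import Callable, Dict, List, Optional, Tuple


def route(
    group: str,
    deployments: Dict[str, List[str]],
    fallbacks: Dict[str, List[str]],
    call: Callable[[str], bool],
    exhaust_group_first: bool,
) -> Tuple[Optional[str], List[str]]:
    """Return (deployment that succeeded or None, list of deployments attempted)."""
    primary = deployments[group]
    # Scenario 2 tries every deployment in the group; Scenario 1 tries only the first.
    order = list(primary) if exhaust_group_first else primary[:1]
    # Fallback groups are appended after the primary attempts.
    for fallback_group in fallbacks.get(group, []):
        order.extend(deployments[fallback_group])

    attempted: List[str] = []
    for deployment in order:
        attempted.append(deployment)
        if call(deployment):  # True means the request succeeded
            return deployment, attempted
    return None, attempted


# Placeholder deployments mirroring the diagrams above.
deployments = {"model-a": ["model-a-eu", "model-a-us"], "model-b": ["model-b"]}
fallbacks = {"model-a": ["model-b"]}


def eu_is_down(deployment: str) -> bool:
    """Simulated backend: only the EU deployment fails."""
    return deployment != "model-a-eu"


scenario_1 = route("model-a", deployments, fallbacks, eu_is_down, exhaust_group_first=False)
scenario_2 = route("model-a", deployments, fallbacks, eu_is_down, exhaust_group_first=True)
```

With only the EU deployment failing, `scenario_1` ends up on `model-b` after skipping the US deployment, while `scenario_2` is served by `model-a-us` and never reaches `model-b`. The second outcome is the one I want.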
Could you clarify which scenario LiteLLM currently uses? And if it's Scenario 1, is there a way to configure Scenario 2 behavior?
Thanks for helping me build reliable LLM systems!