[Feature]: Parameter based routing
### The Feature
```yaml
model_list:
  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: vertex_ai/gemini-1.5-pro-preview-0409
      vertex_project: litellm-epic
      vertex_location: europe-west2
      disallowed_parameters: {"response_format": '{"type": "json_object"}', "n": ">1"}
  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: gemini/gemini-1.5-pro-latest
```
For example, if this request comes in, it should be routed to Vertex AI:
```bash
curl -v "${OPENAI_API_BASE}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gemini-1.5-pro-preview-0409",
    "response_format": {"type": "text"},
    "max_tokens": 8192,
    "messages": [
      {
        "role": "user",
        "content": "tell me a joke in JSON"
      }
    ]
  }'
```
If this request comes in, it should be routed to Gemini (AI Studio):
```bash
curl -v "${OPENAI_API_BASE}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gemini-1.5-pro-preview-0409",
    "response_format": {"type": "json_object"},
    "max_tokens": 8192,
    "messages": [
      {
        "role": "user",
        "content": "tell me a joke in JSON"
      }
    ]
  }'
```
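To make the intended semantics concrete, here's a rough sketch of how a `disallowed_parameters` rule could be evaluated against an incoming request. Everything below is hypothetical: `violates_disallowed_params` is not an existing LiteLLM function, and the rule semantics (a JSON-encoded string means "disallow this exact value", a string like `">1"` means a numeric comparison) are just one reading of the proposed config.

```python
import json

def violates_disallowed_params(request_kwargs: dict, disallowed: dict) -> bool:
    """Hypothetical check: True if the request hits any disallowed-parameter rule."""
    for param, rule in disallowed.items():
        if param not in request_kwargs:
            continue
        value = request_kwargs[param]
        if isinstance(rule, str) and rule.startswith(">"):
            # assumed semantics: ">1" disallows numeric values greater than 1
            if isinstance(value, (int, float)) and value > float(rule[1:]):
                return True
        else:
            # assumed semantics: a JSON-encoded string disallows that exact value
            if value == json.loads(rule):
                return True
    return False

disallowed = {"response_format": '{"type": "json_object"}', "n": ">1"}

# json_object violates the rule -> skip this deployment (Vertex AI)
print(violates_disallowed_params({"response_format": {"type": "json_object"}}, disallowed))  # True
# plain text is fine -> this deployment stays eligible
print(violates_disallowed_params({"response_format": {"type": "text"}}, disallowed))  # False
```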
### Motivation, pitch
Right now, Vertex AI (not LiteLLM) is pretty broken when using JSON mode with Gemini 1.5 Pro; it throws 500s on the majority of requests. It would be nice if I could use Gemini (AI Studio) instead of Vertex AI, but only for the requests that use response_format.
### Twitter / LinkedIn details
https://twitter.com/DaveManouchehri
That's interesting - why not just have it be a pre-call check that filters out the deployments which violate the conditions? That way it would work across all routing strategies.
https://github.com/BerriAI/litellm/blob/0b0be700fc05bf37c8cb1b4d37e7b19f8578e0c9/litellm/router.py#L2713
We do this today for context window checks - https://docs.litellm.ai/docs/routing#pre-call-checks-context-window
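Roughly, it could work like this (a sketch only, reusing the `violates_disallowed_params` predicate from above; the filter function and the `disallowed_parameters` field are hypothetical, not current Router internals). The existing context-window check linked above is opted into via `Router(..., enable_pre_call_checks=True)`, and a parameter-based check could slot into the same step:

```python
# Hypothetical pre-call filter, mirroring the existing context-window check:
# drop deployments whose proposed disallowed_parameters match the request,
# then let whatever routing strategy is configured choose among the rest.
def filter_disallowed_param_deployments(healthy_deployments: list, request_kwargs: dict) -> list:
    eligible = []
    for deployment in healthy_deployments:
        rules = deployment.get("litellm_params", {}).get("disallowed_parameters")
        if rules and violates_disallowed_params(request_kwargs, rules):
            continue  # e.g. Vertex AI is skipped when response_format is json_object
        eligible.append(deployment)
    return eligible
```

Because the filtering happens before a deployment is picked, it would compose with simple-shuffle, latency-based, or any other routing strategy, just like the context-window check does today.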