
[Feature]: Parameter based routing

Manouchehri opened this issue 1 year ago • 1 comment

The Feature

model_list:
  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: vertex_ai/gemini-1.5-pro-preview-0409
      vertex_project: litellm-epic
      vertex_location: europe-west2
      disallowed_parameters: {"response_format": '{"type": "json_object"}', "n": ">1"}

  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: gemini/gemini-1.5-pro-latest

For example, if this request comes in, route it to Vertex AI.

curl -v "${OPENAI_API_BASE}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gemini-1.5-pro-preview-0409",
    "response_format": {"type": "text"},
    "max_tokens": 8192,
    "messages": [
      {
        "role": "user",
        "content": "tell me a joke in JSON"
      }
    ]
  }'

If this request comes in, route it to Gemini (AI Studio).

curl -v "${OPENAI_API_BASE}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gemini-1.5-pro-preview-0409",
    "response_format": {"type": "json_object"},
    "max_tokens": 8192,
    "messages": [
      {
        "role": "user",
        "content": "tell me a joke in JSON"
      }
    ]
  }'
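The proposed `disallowed_parameters` matching could be sketched roughly as below. This is a hypothetical illustration of the feature request, not existing litellm behavior; the `violates` and `pick_deployment` helpers, and the interpretation of rules like `">1"` as numeric thresholds, are all assumptions.

```python
def violates(disallowed: dict, request: dict) -> bool:
    """Return True if the request uses a disallowed parameter value.

    Rules are either literal values (exact match disallowed) or
    strings like ">1" (assumed here to mean a numeric threshold).
    """
    for param, rule in disallowed.items():
        if param not in request:
            continue
        value = request[param]
        if isinstance(rule, str) and rule.startswith(">"):
            if value > int(rule[1:]):
                return True
        elif value == rule:
            return True
    return False


def pick_deployment(deployments: list, request: dict) -> dict:
    """Return the first deployment whose rules the request does not violate."""
    for d in deployments:
        rules = d["litellm_params"].get("disallowed_parameters", {})
        if not violates(rules, request):
            return d
    raise ValueError("no eligible deployment for this request")
```

Under this sketch, the first curl request (`response_format: {"type": "text"}`) passes the Vertex AI deployment's rules and lands there, while the second (`json_object`) violates them and falls through to the AI Studio deployment.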

Motivation, pitch

Right now, Vertex AI (not LiteLLM) is pretty broken when using JSON mode with Gemini 1.5 Pro; it throws 500s on the majority of requests. It would be nice if I could use Gemini (AI Studio) instead of Vertex AI only for the requests that use response_format.

Twitter / LinkedIn details

https://twitter.com/DaveManouchehri

Manouchehri · Apr 30 '24 13:04

that's interesting - why not just make it a pre-call check that filters out the deployments which violate the conditions? that way it would work across all routing strategies

https://github.com/BerriAI/litellm/blob/0b0be700fc05bf37c8cb1b4d37e7b19f8578e0c9/litellm/router.py#L2713

We do this today for context window checks - https://docs.litellm.ai/docs/routing#pre-call-checks-context-window
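The pre-call-check approach suggested above might look something like the sketch below: before any routing strategy runs, drop the deployments that can't serve this request and route among the survivors. This is a rough illustration, not litellm's actual hook; the per-deployment `supports` predicate and the fallback-to-all behavior are assumptions.

```python
def pre_call_filter(deployments: list, request: dict) -> list:
    """Keep only deployments whose constraints the request satisfies.

    `supports` is a hypothetical per-deployment predicate (e.g. derived
    from disallowed_parameters or a context-window limit).
    """
    eligible = [d for d in deployments if d["supports"](request)]
    # Assumption: fall back to the full list rather than failing hard,
    # so a misconfigured filter degrades to normal routing.
    return eligible or deployments
```

Because the filter runs before strategy selection, the same check works whether the router is using least-busy, latency-based, or simple-shuffle routing.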

krrishdholakia · Apr 30 '24 22:04