Delete fields in metadata that break Azure OpenAI batch
If model_info and caching_groups are present in the metadata, then calls to /v1/batches fail with:
Traceback (most recent call last):
File "/Users/abramowi/Code/OpenSource/litellm/litellm/proxy/batches_endpoints/endpoints.py", line 120, in create_batch
response = await llm_router.acreate_batch(**_create_batch_data) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 2754, in acreate_batch
raise e
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 2742, in acreate_batch
response = await self.async_function_with_fallbacks(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 3248, in async_function_with_fallbacks
raise original_exception
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 3062, in async_function_with_fallbacks
response = await self.async_function_with_retries(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 3438, in async_function_with_retries
raise original_exception
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 3331, in async_function_with_retries
response = await self.make_call(original_function, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 3447, in make_call
response = await response
^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 2842, in _acreate_batch
raise e
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 2829, in _acreate_batch
response = await response # type: ignore
^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/utils.py", line 1441, in wrapper_async
raise e
File "/Users/abramowi/Code/OpenSource/litellm/litellm/utils.py", line 1300, in wrapper_async
result = await original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/batches/main.py", line 89, in acreate_batch
raise e
File "/Users/abramowi/Code/OpenSource/litellm/litellm/batches/main.py", line 83, in acreate_batch
response = await init_response
^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/llms/azure/batches/handler.py", line 39, in acreate_batch
response = await azure_client.batches.create(**create_batch_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/venv/lib/python3.11/site-packages/openai/resources/batches.py", line 309, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1768, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1461, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1563, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'code': 'UserError', 'severity': None, 'message': 'Error when parsing request; unable to deserialize request body', 'messageFormat': None, 'messageParameters': None, 'referenceCode': None, 'detailsUri': None, 'target': None, 'details': [], 'innerError': None, 'debugInfo': None, 'additionalInfo': None}, 'correlation': {'operation': '15b57b50944a622dcbe6ef6b1fe3b6da', 'request': '45f83bae4c62fc1e'}, 'environment': 'westus', 'location': 'westus', 'time': '2025-03-13T00:50:02.6689421+00:00', 'componentName': 'managed-batch-inference', 'statusCode': 400}
INFO: 127.0.0.1:57972 - "POST /v1/batches HTTP/1.1" 500 Internal Server Error
Possibly related to https://github.com/BerriAI/litellm/issues/5396, at least in the sense that I was trying to work through the steps in that issue and couldn't, because this problem was blocking me.
This might not be the best way to fix it, but at least it shows the problem we're hitting and what ultimately seems to cause it.
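For context, the change amounts to something like the sketch below (a minimal sketch, not the literal diff; the function name and call site are illustrative): drop the router-internal metadata keys that Azure's /v1/batches endpoint can't deserialize before the request body is handed to the Azure OpenAI client.

```python
# Minimal sketch (illustrative names, not the literal diff): remove router-internal
# metadata keys that Azure's /batches endpoint refuses to deserialize before the
# request body is passed to the Azure OpenAI client.
ROUTER_INTERNAL_METADATA_KEYS = {"model_info", "caching_groups"}


def strip_router_metadata(create_batch_data: dict) -> dict:
    metadata = create_batch_data.get("metadata")
    if isinstance(metadata, dict):
        create_batch_data["metadata"] = {
            key: value
            for key, value in metadata.items()
            if key not in ROUTER_INTERNAL_METADATA_KEYS
        }
    return create_batch_data
```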
Cc: @taralika, @krrishdholakia
hey @msabramo how are you triggering this?
cc: @ishaan-jaff - these might be good QA tests for the responses API as well
@krrishdholakia: Triggering this with:
$ cat file-68dbd7ff214c4f53b0059b6ff59838ed.jsonl
{"custom_id": "task-0", "method": "POST", "url": "/chat/completions", "body": {"model": "gpt-4o-2024-11-20-batch-no-filter", "messages": [{"role": "user", "content": "what llm are you"}]}}
$ jq < file-68dbd7ff214c4f53b0059b6ff59838ed.jsonl
{
  "custom_id": "task-0",
  "method": "POST",
  "url": "/chat/completions",
  "body": {
    "model": "gpt-4o-2024-11-20-batch-no-filter",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }
}
$ curl -i -sSL 'http://localhost:4000/v1/files' \
-H "Authorization: Bearer ${LITELLM_MASTER_KEY}" \
-F purpose="batch" \
-F file="@file-68dbd7ff214c4f53b0059b6ff59838ed.jsonl"
HTTP/1.1 200 OK
date: Thu, 13 Mar 2025 00:16:33 GMT
server: uvicorn
content-length: 210
content-type: application/json
x-litellm-version: 1.63.7
x-litellm-key-spend: 0.0
{"id":"file-f2676ac5b0554102b8a3a60ab1c47218","bytes":188,"created_at":1741824996,"filename":"modified_file.jsonl","object":"file","purpose":"batch","status":"processed","expires_at":null,"status_details":null}
$ curl -i -sSL 'http://localhost:4000/v1/batches' \
-H "Authorization: Bearer ${LITELLM_MASTER_KEY}" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "file-f2676ac5b0554102b8a3a60ab1c47218",
"endpoint": "/v1/chat/completions",
"completion_window": "24h",
"model": "gpt-4o-2024-11-20-batch-no-filter"
}'
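In case it helps with a QA test, here is roughly the same repro via the OpenAI Python SDK pointed at the proxy (a sketch: the file name, model name, and proxy URL are taken from the curl examples above, and passing model through extra_body is my assumption about how to send that non-standard field with the SDK):

```python
# Same repro as the curl commands above, driven through the OpenAI Python SDK
# pointed at the LiteLLM proxy (base_url/api_key assumed from the curl examples).
import os

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key=os.environ["LITELLM_MASTER_KEY"],
)

# Upload the JSONL batch input file.
batch_input = client.files.create(
    file=open("file-68dbd7ff214c4f53b0059b6ff59838ed.jsonl", "rb"),
    purpose="batch",
)

# Create the batch; this is the call that 400s against Azure without this PR.
# "model" is not a standard OpenAI batch param, so it goes in extra_body
# for the proxy to route on.
batch = client.batches.create(
    input_file_id=batch_input.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={"model": "gpt-4o-2024-11-20-batch-no-filter"},
)
print(batch.id, batch.status)
```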
Here's something else really weird and quite likely another bug. With the change in this PR, the batch-creation curl above succeeds:
$ curl -i -sSL 'http://localhost:4000/v1/batches' \
-H "Authorization: Bearer ${LITELLM_MASTER_KEY}" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "file-f2676ac5b0554102b8a3a60ab1c47218",
"endpoint": "/v1/chat/completions",
"completion_window": "24h",
"model": "gpt-4o-2024-11-20-batch-no-filter"
}'
HTTP/1.1 200 OK
date: Thu, 13 Mar 2025 00:53:09 GMT
server: uvicorn
content-length: 699
content-type: application/json
x-litellm-model-id: edfc4f8b120c218e12b7e50f1cba1506ec39c3b8b4db1c0b0c1bab02bb31fc1f
x-litellm-model-api-base: https://us-w-0008.openai.azure.com
x-litellm-version: 1.63.7
x-litellm-key-spend: 0.0
{"id":"batch_990fe845-4ff9-49d8-a3da-a5dcedcc8526","completion_window":"24h","created_at":1741827194,"endpoint":"/chat/completions","input_file_id":"file-f2676ac5b0554102b8a3a60ab1c47218","object":"batch","status":"validating","cancelled_at":null,"cancelling_at":null,"completed_at":null,"error_file_id":"","errors":null,"expired_at":null,"expires_at":1741913593,"failed_at":null,"finalizing_at":null,"in_progress_at":null,"metadata":{"model_group":"gpt-4o-2024-11-20-batch-no-filter","model_group_size":"1","deployment":"azure/gpt-4o-2024-11-20-batch-no-filter","api_base":"https://us-w-0008.openai.azure.com"},"output_file_id":"","request_counts":{"completed":0,"failed":0,"total":0},"usage":null}
but then when I try to get the newly created batch, it fails with a strange error about bedrock, which is super weird because we're using Azure OpenAI:
$ curl -sSL 'http://localhost:4000/v1/batches/batch_990fe845-4ff9-49d8-a3da-a5dcedcc8526' \
-H "Authorization: Bearer ${LITELLM_MASTER_KEY}" \
-H "Content-Type: application/json" \
| jq '.'
{
  "error": {
    "message": "Internal Server Error, litellm.BadRequestError: LiteLLM doesn't support bedrock for 'create_batch'. Only 'openai' is supported.",
    "type": "internal_server_error",
    "param": null,
    "code": "500"
  }
}
hmm i have a similar script and don't hit it - what's in your config.yaml ?
Relevant config.yaml snippets:
model_list:
  - model_name: gpt-4o-2024-11-20-batch-no-filter
    litellm_params:
      model: azure/gpt-4o-2024-11-20-batch-no-filter
      api_base: os.environ/AOAI_BASE_US_W_0008
      api_key: os.environ/AOAI_KEY_US_W_0008
      api_version: 2024-12-01-preview
    model_info:
      mode: batch

files_settings:
  - custom_llm_provider: azure
    api_base: os.environ/AOAI_BASE_US_E_0008
    api_key: os.environ/AOAI_KEY_US_E_0008
    api_version: 2024-12-01-preview

litellm_settings:
  turn_off_message_logging: True
  drop_params: True
  enable_loadbalancing_on_batch_endpoints: true
IIRC, I got some other error if I didn't have enable_loadbalancing_on_batch_endpoints set to true.
Just checked. If I remove enable_loadbalancing_on_batch_endpoints, then all operations seem to fail with:
{
  "error": {
    "message": "Error code: 500 - {'error': {'message': 'The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable', 'type': 'None', 'param': 'None', 'code': '500'}}",
    "type": "None",
    "param": "None",
    "code": "500"
  }
}
How does routing of batch requests to Azure OpenAI deployments work without and with enable_loadbalancing_on_batch_endpoints?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.