Delete fields in metadata that break Azure OpenAI batch
If model_info and caching_groups are present in the metadata, then calls to /v1/batches fail with:
Traceback (most recent call last):
File "/Users/abramowi/Code/OpenSource/litellm/litellm/proxy/batches_endpoints/endpoints.py", line 120, in create_batch
response = await llm_router.acreate_batch(**_create_batch_data) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 2754, in acreate_batch
raise e
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 2742, in acreate_batch
response = await self.async_function_with_fallbacks(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 3248, in async_function_with_fallbacks
raise original_exception
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 3062, in async_function_with_fallbacks
response = await self.async_function_with_retries(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 3438, in async_function_with_retries
raise original_exception
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 3331, in async_function_with_retries
response = await self.make_call(original_function, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 3447, in make_call
response = await response
^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 2842, in _acreate_batch
raise e
File "/Users/abramowi/Code/OpenSource/litellm/litellm/router.py", line 2829, in _acreate_batch
response = await response # type: ignore
^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/utils.py", line 1441, in wrapper_async
raise e
File "/Users/abramowi/Code/OpenSource/litellm/litellm/utils.py", line 1300, in wrapper_async
result = await original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/batches/main.py", line 89, in acreate_batch
raise e
File "/Users/abramowi/Code/OpenSource/litellm/litellm/batches/main.py", line 83, in acreate_batch
response = await init_response
^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/litellm/llms/azure/batches/handler.py", line 39, in acreate_batch
response = await azure_client.batches.create(**create_batch_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/venv/lib/python3.11/site-packages/openai/resources/batches.py", line 309, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1768, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1461, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Code/OpenSource/litellm/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1563, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'code': 'UserError', 'severity': None, 'message': 'Error when parsing request; unable to deserialize request body', 'messageFormat': None, 'messageParameters': None, 'referenceCode': None, 'detailsUri': None, 'target': None, 'details': [], 'innerError': None, 'debugInfo': None, 'additionalInfo': None}, 'correlation': {'operation': '15b57b50944a622dcbe6ef6b1fe3b6da', 'request': '45f83bae4c62fc1e'}, 'environment': 'westus', 'location': 'westus', 'time': '2025-03-13T00:50:02.6689421+00:00', 'componentName': 'managed-batch-inference', 'statusCode': 400}
INFO: 127.0.0.1:57972 - "POST /v1/batches HTTP/1.1" 500 Internal Server Error
Possibly related to https://github.com/BerriAI/litellm/issues/5396, at least in the sense that I was trying to work through the steps in that issue and couldn't, because this problem was blocking me.
This might not be the best way to fix it, but at least it shows the problem we're hitting and what ultimately seems to cause it.
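For context, the change amounts to something like the sketch below (a minimal sketch, not the literal diff; the function name and call site are illustrative): drop the router-internal metadata keys that Azure's /v1/batches endpoint can't deserialize before the request body is handed to the Azure OpenAI client.

```python
# Minimal sketch (illustrative names, not the literal diff): remove router-internal
# metadata keys that Azure's /batches endpoint refuses to deserialize before the
# request body is passed to the Azure OpenAI client.
ROUTER_INTERNAL_METADATA_KEYS = {"model_info", "caching_groups"}


def strip_router_metadata(create_batch_data: dict) -> dict:
    metadata = create_batch_data.get("metadata")
    if isinstance(metadata, dict):
        create_batch_data["metadata"] = {
            key: value
            for key, value in metadata.items()
            if key not in ROUTER_INTERNAL_METADATA_KEYS
        }
    return create_batch_data
```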
Cc: @taralika, @krrishdholakia
hey @msabramo how are you triggering this?
cc: @ishaan-jaff - these might be good QA tests for the responses API as well
@krrishdholakia: Triggering this with:
$ cat file-68dbd7ff214c4f53b0059b6ff59838ed.jsonl
{"custom_id": "task-0", "method": "POST", "url": "/chat/completions", "body": {"model": "gpt-4o-2024-11-20-batch-no-filter", "messages": [{"role": "user", "content": "what llm are you"}]}}
$ jq < file-68dbd7ff214c4f53b0059b6ff59838ed.jsonl
{
  "custom_id": "task-0",
  "method": "POST",
  "url": "/chat/completions",
  "body": {
    "model": "gpt-4o-2024-11-20-batch-no-filter",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }
}
$ curl -i -sSL 'http://localhost:4000/v1/files' \
-H "Authorization: Bearer ${LITELLM_MASTER_KEY}" \
-F purpose="batch" \
-F file="@file-68dbd7ff214c4f53b0059b6ff59838ed.jsonl"
HTTP/1.1 200 OK
date: Thu, 13 Mar 2025 00:16:33 GMT
server: uvicorn
content-length: 210
content-type: application/json
x-litellm-version: 1.63.7
x-litellm-key-spend: 0.0
{"id":"file-f2676ac5b0554102b8a3a60ab1c47218","bytes":188,"created_at":1741824996,"filename":"modified_file.jsonl","object":"file","purpose":"batch","status":"processed","expires_at":null,"status_details":null}
$ curl -i -sSL 'http://localhost:4000/v1/batches' \
-H "Authorization: Bearer ${LITELLM_MASTER_KEY}" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "file-f2676ac5b0554102b8a3a60ab1c47218",
"endpoint": "/v1/chat/completions",
"completion_window": "24h",
"model": "gpt-4o-2024-11-20-batch-no-filter"
}'
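In case it helps with a QA test, here is roughly the same repro via the OpenAI Python SDK pointed at the proxy (a sketch: the file name, model name, and proxy URL are taken from the curl examples above, and passing model through extra_body is my assumption about how to send that non-standard field with the SDK):

```python
# Same repro as the curl commands above, driven through the OpenAI Python SDK
# pointed at the LiteLLM proxy (base_url/api_key assumed from the curl examples).
import os

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key=os.environ["LITELLM_MASTER_KEY"],
)

# Upload the JSONL batch input file.
batch_input = client.files.create(
    file=open("file-68dbd7ff214c4f53b0059b6ff59838ed.jsonl", "rb"),
    purpose="batch",
)

# Create the batch; this is the call that 400s against Azure without this PR.
# "model" is not a standard OpenAI batch param, so it goes in extra_body
# for the proxy to route on.
batch = client.batches.create(
    input_file_id=batch_input.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={"model": "gpt-4o-2024-11-20-batch-no-filter"},
)
print(batch.id, batch.status)
```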
Here's something else really weird and quite likely another bug. With the change in this PR, the batch-creation curl above succeeds:
$ curl -i -sSL 'http://localhost:4000/v1/batches' \
-H "Authorization: Bearer ${LITELLM_MASTER_KEY}" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "file-f2676ac5b0554102b8a3a60ab1c47218",
"endpoint": "/v1/chat/completions",
"completion_window": "24h",
"model": "gpt-4o-2024-11-20-batch-no-filter"
}'
HTTP/1.1 200 OK
date: Thu, 13 Mar 2025 00:53:09 GMT
server: uvicorn
content-length: 699
content-type: application/json
x-litellm-model-id: edfc4f8b120c218e12b7e50f1cba1506ec39c3b8b4db1c0b0c1bab02bb31fc1f
x-litellm-model-api-base: https://us-w-0008.openai.azure.com
x-litellm-version: 1.63.7
x-litellm-key-spend: 0.0
{"id":"batch_990fe845-4ff9-49d8-a3da-a5dcedcc8526","completion_window":"24h","created_at":1741827194,"endpoint":"/chat/completions","input_file_id":"file-f2676ac5b0554102b8a3a60ab1c47218","object":"batch","status":"validating","cancelled_at":null,"cancelling_at":null,"completed_at":null,"error_file_id":"","errors":null,"expired_at":null,"expires_at":1741913593,"failed_at":null,"finalizing_at":null,"in_progress_at":null,"metadata":{"model_group":"gpt-4o-2024-11-20-batch-no-filter","model_group_size":"1","deployment":"azure/gpt-4o-2024-11-20-batch-no-filter","api_base":"https://us-w-0008.openai.azure.com"},"output_file_id":"","request_counts":{"completed":0,"failed":0,"total":0},"usage":null}
but then when I try to get the newly created batch, it fails with a strange error about bedrock, which is super weird because we're using Azure OpenAI:
$ curl -sSL 'http://localhost:4000/v1/batches/batch_990fe845-4ff9-49d8-a3da-a5dcedcc8526' \
-H "Authorization: Bearer ${LITELLM_MASTER_KEY}" \
-H "Content-Type: application/json" \
| jq '.'
{
  "error": {
    "message": "Internal Server Error, litellm.BadRequestError: LiteLLM doesn't support bedrock for 'create_batch'. Only 'openai' is supported.",
    "type": "internal_server_error",
    "param": null,
    "code": "500"
  }
}
hmm i have a similar script and don't hit it - what's in your config.yaml ?
Relevant config.yaml snippets:
model_list:
  - model_name: gpt-4o-2024-11-20-batch-no-filter
    litellm_params:
      model: azure/gpt-4o-2024-11-20-batch-no-filter
      api_base: os.environ/AOAI_BASE_US_W_0008
      api_key: os.environ/AOAI_KEY_US_W_0008
      api_version: 2024-12-01-preview
    model_info:
      mode: batch

files_settings:
  - custom_llm_provider: azure
    api_base: os.environ/AOAI_BASE_US_E_0008
    api_key: os.environ/AOAI_KEY_US_E_0008
    api_version: 2024-12-01-preview

litellm_settings:
  turn_off_message_logging: True
  drop_params: True
  enable_loadbalancing_on_batch_endpoints: true
IIRC, I got some other error if I didn't have enable_loadbalancing_on_batch_endpoints set to true.
Just checked. If I remove enable_loadbalancing_on_batch_endpoints, then all operations seem to fail with:
{
  "error": {
    "message": "Error code: 500 - {'error': {'message': 'The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable', 'type': 'None', 'param': 'None', 'code': '500'}}",
    "type": "None",
    "param": "None",
    "code": "500"
  }
}
How does routing of batch requests to Azure OpenAI deployments work without and with enable_loadbalancing_on_batch_endpoints?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.