Error: Sagemaker error Too little data for declared Content-Length
I am trying to invoke a Qwen model hosted on a SageMaker endpoint using the 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.31.0-lmi13.0.0-cu124 LMI container, and I am getting this error:
completion(model="sagemaker/<endpoint>", messages=[{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant"}, {"role": "user", "content": prompt}], max_tokens=1000)
DEBUG:LiteLLM:
POST Request Sent from LiteLLM:
curl -X POST \
https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/djl-inference-2025-02-14-10-08-10-995/invocations \
-H 'Content-Type: *****' -H 'X-Amz-Date: *****' -H 'X-Amz-Security-Token: IQoJb3JpZ2luX2VjEAMaCXVzLWVhc3QtMSJIMEYCIQCf8O/rjgsIGA8AAd32YqP1hPAps8Lv1MO7wGmkbkSBugIhAN9rLXBe5b58Jyu382pwwU8ZIc9KjjkPPQ1h9W7/f9PKKqsDCCwQBBoMNjk5MzkxMDE5Njk4Igy18IhpcEdZbHMpCmwqiAMKjouElJPlgKX1Vp0rXuL+CotC78GxabBqQq4t2NP5cCLzXMMkr4Crf4/ALdVETud5xALtCyh0bXIQbeCtVlVxXaMxliaFWy1bz/UMqP2HPKb2dPKZ6D9C3HdPs9Y/XKG0fPnFCBSRzVBX2V49K9g897zakSw9RCkPw+w7MZgCPoIrbl0FOGI3B3+xC3fYXdDUOOWqHLsEDJXLDVnqRqxV9pVqD8Mi3Iw7VW/0iqUxRsmIuRXMtbspH3FYImowNfetC+99E1ReYqkaN0478ZByZTRGBTO9VTPhGJ9uDwPAmLVLWv/rKBHtlVQa8Ut5MWdytq+2oFWNnyszfIZ/XBhwsMPxQocUlFlKYPhkfu6sWdsgNZcv8BlzWrkgnC/CpBZVuOl4cj9lM+/GbH3KItlJDcyslPbYlPmdwRZfy+97pyUeG0WEmdeioGgLl0/YPVjFZyJtkJ6UUUW1Qtwd7OYWBbdby/yFB5ShXzc0eP5DqZQ3K0zXRmTYeSsiyDbuImtUpsHcrttmEjCnvry9BjqdAa5rMK1a62o/f+7ACESMrm7soTxCnQgHs3kdsyqqNTa5jOyhLupvgdg+ovrpERaD+ASXwg3cYNxyos0hiki5ktd9+mha5S46OXHh+RjxPmjPGN7qIqFAq3y3yFyDRjE4J1C7UrhT+zhf5uMQPtvXspKkRJ8ZMw1TjCA9G8VB********************************************' -H 'Authorization: AWS4-HMAC-SHA256 Credential=ASIA2FVXUBKZITRFXD7B/20250214/us-east-1/sagemaker/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=2956d2f01968c5a3571e********************************************' -H 'Content-Length: *****' \
-d '{'parameters': {'max_new_tokens': 1000}, 'inputs': 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant what is 2+2'}'
10:57:44 - LiteLLM:DEBUG: utils.py:301 - RAW RESPONSE:
Too little data for declared Content-Length
DEBUG:LiteLLM:RAW RESPONSE:
Too little data for declared Content-Length
10:57:44 - LiteLLM:ERROR: handler.py:330 - Sagemaker error Too little data for declared Content-Length
Also, why is litellm using the generate payload (which does not apply the chat template from the tokenizer) instead of the chat/completions payload?
For example this works:
import json
import boto3

sm = boto3.client("sagemaker-runtime")  # sm is a boto3 SageMaker runtime client
r = sm.invoke_endpoint(
    EndpointName="<endpoint>",
    Body=json.dumps({"messages": [{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant"}, {"role": "user", "content": prompt}], "max_tokens": 2048, "stop": ["<|im_end|>"]}),
    ContentType="application/json",
)
resp = json.loads(r.get("Body").read().decode("utf-8"))
print(resp["choices"][0]["message"]["content"])
Same issue
Same issue here..
same issue here
same issue here
Any fix available for this issue?
Also, why is litellm using the generate payload (which does not apply the chat template from the tokenizer) instead of the chat/completions payload?
Hey @massi-ang you can specify the chat format using sagemaker_chat/ - https://docs.litellm.ai/docs/providers/aws_sagemaker#sagemaker-messages-api
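For reference, a minimal sketch of that call (the endpoint name is a placeholder; AWS credentials are assumed to come from the environment):
import litellm

# sagemaker_chat/ sends an OpenAI-style messages payload to the endpoint,
# instead of the TGI-style {"inputs": ..., "parameters": ...} body used by sagemaker/.
response = litellm.completion(
    model="sagemaker_chat/<endpoint-name>",  # placeholder endpoint name
    messages=[
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant"},
        {"role": "user", "content": "what is 2+2"},
    ],
    max_tokens=1000,
)
print(response.choices[0].message.content)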
Testing this - it works
Please confirm you're on the latest version of litellm
I retried the same with LiteLLM 1.63.11 and the error seems to be gone.
I'm still encountering the same error even with the latest version of litellm
10:31:07 - LiteLLM:INFO: utils.py:2999 -
LiteLLM completion() model= deepseek-r1-distill-llama-8b-091017; provider = sagemaker
10:31:07 - LiteLLM:DEBUG: utils.py:3002 -
LiteLLM: Params passed to completion() {'model': 'deepseek-r1-distill-llama-8b-091017', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'sagemaker', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': None, 'parallel_tool_calls': None, 'drop_params': None, 'reasoning_effort': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'Hello, how are you?'}], 'thinking': None}
10:31:07 - LiteLLM:DEBUG: utils.py:3005 -
... ...
21:28:33 - LiteLLM:DEBUG: get_api_base.py:63 - Error occurred in getting api base - litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=jumpstart-dft-deepseek-llm-r1-disti-20250316-091017
Pass model as E.g. For 'Huggingface' inference endpoints pass in completion(model='huggingface/starcoder',..) Learn more: https://docs.litellm.ai/docs/providers
21:28:33 - LiteLLM:DEBUG: exception_mapping_utils.py:2243 - Logging Details: logger_fn - None | callable(logger_fn) - False
21:28:33 - LiteLLM:DEBUG: litellm_logging.py:1932 - Logging Details LiteLLM-Failure Call: []
Output is truncated.
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.
Same here, still encountered the error even with all the latest packages:
% pip show h11 httpx litellm botocore
Name: h11
Version: 0.14.0
Summary: A pure-Python, bring-your-own-I/O implementation of HTTP/1.1
Home-page: https://github.com/python-hyper/h11
Author: Nathaniel J. Smith
Author-email: [email protected]
License: MIT
Location: /chat-completion/lib/python3.11/site-packages
Requires:
Required-by: httpcore, uvicorn
---
Name: httpx
Version: 0.28.1
Summary: The next generation HTTP client.
Home-page:
Author:
Author-email: Tom Christie <[email protected]>
License: BSD-3-Clause
Location: /chat-completion/lib/python3.11/site-packages
Requires: anyio, certifi, httpcore, idna
Required-by: litellm, openai
---
Name: litellm
Version: 1.63.11
Summary: Library to easily interface with LLM API providers
Home-page:
Author: BerriAI
Author-email:
License: MIT
Location: /chat-completion/lib/python3.11/site-packages
Requires: aiohttp, click, httpx, importlib-metadata, jinja2, jsonschema, openai, pydantic, python-dotenv, tiktoken, tokenizers
Required-by:
---
Name: botocore
Version: 1.37.13
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /chat-completion/lib/python3.11/site-packages
Requires: jmespath, python-dateutil, urllib3
Required-by: boto3, s3transfer
How I deployed it in SageMaker:
import json
from sagemaker.jumpstart.model import JumpStartModel
model_id = "huggingface-llm-qwen2-1-5b"
role_arn = '' # Replace with your SageMaker execution role ARN
my_model = JumpStartModel(
model_id=model_id,
model_version="1.2.0",
role=role_arn
)
instance_type = 'ml.g5.2xlarge'
predictor = my_model.deploy(
initial_instance_count=1,
instance_type=instance_type,
accept_eula=True,
endpoint_name="jumpstart-model"
)
Hi all, I'm not able to repro this issue. I would appreciate any help debugging this.
This is the relevant code path - https://github.com/BerriAI/litellm/blob/main/litellm/llms/sagemaker/completion/handler.py
Hi Krish, I can help debug this if you are available to set up a call, and I can demonstrate it in my environment. sagemaker_chat works, as shown in this example:
model_list:
  - model_name: "sagemaker-model"
    litellm_params:
      model: "sagemaker_chat/jumpstart-dft-hf-textgeneration1-mp-20240815-185614"
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME
But normal client.chat.completions.create calls all hit the same error. Is it possible to set up a call and look into this together?
Doing more debugging, I think the problem is related to how litellm handles the input messages: https://github.com/BerriAI/litellm/blob/main/litellm/llms/sagemaker/completion/transformation.py#L176
from the debug log:
POST Request Sent from LiteLLM:
curl -X POST
https://runtime.sagemaker.us-west-2.amazonaws.com/endpoints/xxxx/invocations
-H 'Content-Type: apon' -H 'X-Amz-Date: 207Z' -H 'Authorization: AW****42' -H 'Content-Length: *****'
-d '{'parameters': {'temperature': 0.6, 'top_p': 0.95}, 'inputs': 'this is a test request, write a short poem'}'
As you can see, 'inputs' has been transformed in a way that wrongly flattens the message content sent to litellm.
In a successful case using sagemaker_chat, this is what the log says:
POST Request Sent from LiteLLM:
curl -X POST
https://runtime.sagemaker.us-west-2.amazonaws.com/endpoints/xxxx/invocations
-H 'Content-Type: ' -H 'X-Amz-Date: ' -H 'Authorization: AWS4-HMAC-SHA256 Credential=AKIAX2DZEJYYZHUPEN6X/20250316/us-west-2/sagemaker/aws4_request, SignedHeaders=content-type;host;x-amz-date, Signature=ea7f20d786368efcdef8**********************************' -H 'Content-Length: *****'
-d '{'model': 'xxxx', 'messages': [{'role': 'user', 'content': 'which llm are you?'}], 'stream': False}'
Even for embedding models, the input is forced into 'text_inputs': https://github.com/BerriAI/litellm/blob/main/litellm/llms/sagemaker/completion/handler.py#L621. But when I was using a JumpStart text embedding model, the expected way to invoke the endpoint was:
payload = {"inputs": ["The mitochondria is the powerhouse of the cell."]}
response = predictor.predict(payload)
Hey @melanie531, sagemaker/ maps to the TGI format, vs. sagemaker_chat/ which calls an OpenAI-compatible endpoint.
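To make that mapping concrete, this is roughly what the two providers put on the wire, based on the debug logs above (values are illustrative):
# sagemaker/ - TGI-style "generate" payload; the chat messages end up flattened into one prompt string
tgi_body = {
    "inputs": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant what is 2+2",
    "parameters": {"max_new_tokens": 1000},
}

# sagemaker_chat/ - OpenAI-compatible Messages API payload; the chat structure is preserved
messages_body = {
    "model": "<endpoint-name>",
    "messages": [
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant"},
        {"role": "user", "content": "what is 2+2"},
    ],
    "stream": False,
}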
If sagemaker_chat/ works, this error seems misleading - as I would expect it to complain about something in the request body when using sagemaker/
@kangks Can you confirm if sagemaker_chat/ works for you?
Hi Krish, I just checked the endpoint that I created from JumpStart; the container it defaults to is the LMI (djl-inference) container. I'll test with a TGI container then.
Hey @krrishdholakia @melanie531, sagemaker_chat/ works! Let me do more testing with other engines:
Input:
response = litellm.completion(
    model="sagemaker_chat/jumpstart-model",
    messages=[{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant"}, {"role": "user", "content": "what is 2+2"}],
    max_tokens=1000,
)
Output:
ModelResponse(id='chatcmpl-140230686622480', created=1742177590, model='sagemaker_chat/', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='length', index=0, message=Message(content='2 + 2 is equal to 4.\nshould you dress for a light hearted time in life style what does this mean?\nI think the phrase "do whatever you want"
[...]',
role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=1000, prompt_tokens=34, total_tokens=1034, completion_tokens_details=None, prompt_tokens_details=None))
On the other hand, how should I specify sagemaker-chat in proxy mode?
model_list:
- model_name: jumpstart-model
litellm_params:
model: sagemaker-chat/jumpstart-model
aws_profile_name: ml-sandbox
general_settings:
# OPTIONAL Best Practices
disable_spend_logs: False # turn off writing each transaction to the db. We recommend doing this if you don't need to see Usage on the LiteLLM UI and are tracking metrics via Prometheus
disable_error_logs: False
turn_off_message_logging: False
Encountered error:
litellm.exceptions.BadRequestError: litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=sagemaker-chat/jumpstart-model
Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers
Detailed debug:
% litellm --config config.yaml --detailed_debug
INFO: Started server process [50205]
INFO: Waiting for application startup.
13:23:14 - LiteLLM Proxy:DEBUG: proxy_server.py:454 - litellm.proxy.proxy_server.py::startup() - CHECKING PREMIUM USER - False
13:23:14 - LiteLLM Proxy:DEBUG: litellm_license.py:98 - litellm.proxy.auth.litellm_license.py::is_premium() - ENTERING 'IS_PREMIUM' - LiteLLM License=None
13:23:14 - LiteLLM Proxy:DEBUG: litellm_license.py:107 - litellm.proxy.auth.litellm_license.py::is_premium() - Updated 'self.license_str' - None
13:23:14 - LiteLLM Proxy:DEBUG: proxy_server.py:465 - worker_config: {"model": null, "alias": null, "api_base": null, "api_version": "2024-07-01-preview", "debug": false, "detailed_debug": true, "temperature": null, "max_tokens": null, "request_timeout": null, "max_budget": null, "telemetry": true, "drop_params": false, "add_function_to_prompt": false, "headers": null, "save": false, "config": "config.yaml", "use_queue": false}
#------------------------------------------------------------#
# #
# 'I don't like how this works...' #
# https://github.com/BerriAI/litellm/issues/new #
# #
#------------------------------------------------------------#
Thank you for using LiteLLM! - Krrish & Ishaan
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
13:23:14 - LiteLLM Proxy:DEBUG: proxy_server.py:1440 - loaded config={
"model_list": [
{
"model_name": "jumpstart-model",
"litellm_params": {
"model": "sagemaker-chat/jumpstart-model",
"aws_profile_name": "ml-sandbox"
}
}
],
"general_settings": {
"disable_spend_logs": false,
"disable_error_logs": false,
"turn_off_message_logging": false
}
}
13:23:14 - LiteLLM Proxy:DEBUG: proxy_server.py:2154 - _alerting_callbacks: {'disable_spend_logs': False, 'disable_error_logs': False, 'turn_off_message_logging': False}
LiteLLM: Proxy initialized with Config, Set models:
jumpstart-model
13:23:14 - LiteLLM:DEBUG: utils.py:4295 - checking potential_model_names in litellm.model_cost: {'split_model': 'sagemaker-chat/jumpstart-model', 'combined_model_name': 'sagemaker-chat/jumpstart-model', 'stripped_model_name': 'sagemaker-chat/jumpstart-model', 'combined_stripped_model_name': 'sagemaker-chat/jumpstart-model', 'custom_llm_provider': None}
13:23:14 - LiteLLM:DEBUG: utils.py:4492 - Error getting model info: This model isn't mapped yet. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
13:23:14 - LiteLLM:DEBUG: utils.py:2222 - added/updated model=sagemaker-chat/jumpstart-model in litellm.model_cost: sagemaker-chat/jumpstart-model
ERROR: Traceback (most recent call last):
File "/chat-completion/lib/python3.11/site-packages/starlette/routing.py", line 692, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/chat-completion/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 507, in proxy_startup_event
await initialize(**worker_config)
File "/chat-completion/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 2936, in initialize
) = await proxy_config.load_config(router=llm_router, config_file_path=config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/chat-completion/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 2119, in load_config
router = litellm.Router(
^^^^^^^^^^^^^^^
File "/chat-completion/lib/python3.11/site-packages/litellm/router.py", line 383, in __init__
self.set_model_list(model_list)
File "/chat-completion/lib/python3.11/site-packages/litellm/router.py", line 4396, in set_model_list
self._create_deployment(
File "/chat-completion/lib/python3.11/site-packages/litellm/router.py", line 4316, in _create_deployment
deployment = self._add_deployment(deployment=deployment)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/chat-completion/lib/python3.11/site-packages/litellm/router.py", line 4435, in _add_deployment
) = litellm.get_llm_provider(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/chat-completion/lib/python3.11/site-packages/litellm/litellm_core_utils/get_llm_provider_logic.py", line 356, in get_llm_provider
raise e
File "/chat-completion/lib/python3.11/site-packages/litellm/litellm_core_utils/get_llm_provider_logic.py", line 333, in get_llm_provider
raise litellm.exceptions.BadRequestError( # type: ignore
litellm.exceptions.BadRequestError: litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=sagemaker-chat/jumpstart-model
Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers
ERROR: Application startup failed. Exiting.
Litellm v1.63.11
% pip show litellm
Name: litellm
Version: 1.63.11
Summary: Library to easily interface with LLM API providers
Home-page:
Author: BerriAI
Author-email:
License: MIT
Location: /Users/richardkang/Documents/github/aws-ec2-llmperf/ai-code/chat-completion/lib/python3.11/site-packages
Requires: aiohttp, click, httpx, importlib-metadata, jinja2, jsonschema, openai, pydantic, python-dotenv, tiktoken, tokenizers
Required-by:
You misspelt it. It's sagemaker_chat, not sagemaker-chat
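With that fix, the proxy config from above would look like this (same model and profile, only the provider prefix changed):
model_list:
  - model_name: jumpstart-model
    litellm_params:
      model: sagemaker_chat/jumpstart-model   # underscore, not hyphen
      aws_profile_name: ml-sandbox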
Hey @krrishdholakia
I deployed Llama 3.2 1B Instruct from SageMaker JumpStart (steps similar to here, except not the vision model) and ran the following code to reproduce:
import os
import litellm
from litellm import completion
litellm._turn_on_debug()
os.environ['AWS_REGION'] = 'us-east-1'
response = completion(
model="sagemaker/<endpoint-name>",
messages=[{ "content": "Hello, how are you?","role": "user"}],
temperature=0.2,
max_tokens=80
)
print(response)
This was on Python 3.12.8 with the latest PyPI version of LiteLLM (installed today)
I was able to get it to work by changing the below snippet:
try:
sync_response = sync_handler.post(
url=prepared_request.url,
headers=prepared_request.headers, # type: ignore
json=_data,
timeout=timeout,
)
to:
try:
sync_response = sync_handler.post(
url=prepared_request.url,
headers=prepared_request.headers, # type: ignore
data=prepared_request.body,
timeout=timeout,
)
(Changing from json=_data to data=prepared_request.body, since the payload was prepared with a matching Content-Length.) I wasn't able to figure out exactly where the Content-Length mismatch was coming from, however. I can draft up a PR with this change and submit it.
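One plausible explanation (an assumption on my part, not confirmed from the litellm code): the Content-Length header comes from the SigV4-signed request that litellm prepares with botocore, while json=_data makes httpx re-serialize the dict itself. Any difference in JSON serialization (separators, whitespace, unicode escaping) changes the byte count, so the declared Content-Length no longer matches the body actually sent. A minimal stdlib-only illustration:
import json

payload = {"parameters": {"max_new_tokens": 1000}, "inputs": "what is 2+2"}

# Body the request could have been signed with (default json.dumps adds spaces after ':' and ',')
signed_body = json.dumps(payload).encode()

# Body produced by a second, compact serialization of the same dict
resent_body = json.dumps(payload, separators=(",", ":")).encode()

print(len(signed_body), len(resent_body))  # different byte counts for the "same" payload
# If Content-Length was computed on signed_body but resent_body is what gets sent,
# h11 raises "Too little data for declared Content-Length".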
@krrishdholakia as Andrew mentioned, the data passed in the payload seems to be causing the issue; can we get a fix for this soon? We have customers blocked by this issue and it needs some urgent attention. Thanks
Fixed since v1.63.14 - https://github.com/BerriAI/litellm/pull/9326
I'm still seeing this issue when calling Sagemaker using sagemaker_chat/<model> via the proxy version 1.65.0.rc.
This is the error log:
18:51:44 - LiteLLM Proxy:ERROR: common_request_processing.py:298 - litellm.proxy.proxy_server._handle_llm_api_exception(): Exception occured - litellm.ServiceUnavailableError: SagemakerException - Too little data for declared Content-Length. Received Model Group=sagemaker_chat
Available Model Group Fallbacks=None LiteLLM Retried: 2 times, LiteLLM Max Retries: 3
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/llms/openai_like/chat/handler.py", line 187, in acompletion_function
response = await client.post(
^^^^^^^^^^^^^^^^^^
api_base, headers=headers, data=json.dumps(data), timeout=timeout
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/logging_utils.py", line 135, in async_wrapper
result = await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/llms/custom_httpx/http_handler.py", line 259, in post
raise e
File "/usr/lib/python3.13/site-packages/litellm/llms/custom_httpx/http_handler.py", line 212, in post
response = await self.client.send(req, stream=stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/httpx/_client.py", line 1661, in send
response = await self._send_handling_auth(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<4 lines>...
)
^
File "/usr/lib/python3.13/site-packages/httpx/_client.py", line 1689, in _send_handling_auth
response = await self._send_handling_redirects(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<3 lines>...
)
^
File "/usr/lib/python3.13/site-packages/httpx/_client.py", line 1726, in _send_handling_redirects
response = await self._send_single_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/httpx/_client.py", line 1763, in _send_single_request
response = await transport.handle_async_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/httpx/_transports/default.py", line 373, in handle_async_request
resp = await self._pool.handle_async_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
raise exc from None
File "/usr/lib/python3.13/site-packages/httpcore/_async/connection_pool.py", line 236, in handle_async_request
response = await connection.handle_async_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pool_request.request
^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/lib/python3.13/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
return await self._connection.handle_async_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/httpcore/_async/http11.py", line 136, in handle_async_request
raise exc
File "/usr/lib/python3.13/site-packages/httpcore/_async/http11.py", line 88, in handle_async_request
await self._send_request_body(**kwargs)
File "/usr/lib/python3.13/site-packages/httpcore/_async/http11.py", line 161, in _send_request_body
await self._send_event(h11.EndOfMessage(), timeout=timeout)
File "/usr/lib/python3.13/site-packages/httpcore/_async/http11.py", line 164, in _send_event
bytes_to_send = self._h11_state.send(event)
File "/usr/lib/python3.13/site-packages/h11/_connection.py", line 512, in send
data_list = self.send_with_data_passthrough(event)
File "/usr/lib/python3.13/site-packages/h11/_connection.py", line 545, in send_with_data_passthrough
writer(event, data_list.append)
~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/h11/_writers.py", line 67, in __call__
self.send_eom(event.headers, write)
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/h11/_writers.py", line 96, in send_eom
raise LocalProtocolError("Too little data for declared Content-Length")
h11._util.LocalProtocolError: Too little data for declared Content-Length
api |
During handling of the above exception, another exception occurred:
api |
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/main.py", line 472, in acompletion
response = await init_response
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/llms/openai_like/chat/handler.py", line 199, in acompletion_function
raise OpenAILikeError(status_code=500, message=str(e))
litellm.llms.openai_like.common_utils.OpenAILikeError: Too little data for declared Content-Length
api |
During handling of the above exception, another exception occurred:
api |
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/proxy/proxy_server.py", line 3552, in chat_completion
return await base_llm_response_processor.base_process_llm_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<16 lines>...
)
^
File "/usr/lib/python3.13/site-packages/litellm/proxy/common_request_processing.py", line 210, in base_process_llm_request
responses = await llm_responses
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 938, in acompletion
raise e
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 914, in acompletion
response = await self.async_function_with_fallbacks(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3347, in async_function_with_fallbacks
raise original_exception
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3161, in async_function_with_fallbacks
response = await self.async_function_with_retries(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3537, in async_function_with_retries
raise original_exception
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3430, in async_function_with_retries
response = await self.make_call(original_function, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3546, in make_call
response = await response
^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 1077, in _acompletion
raise e
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 1036, in _acompletion
response = await _response
^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1441, in wrapper_async
raise e
File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1300, in wrapper_async
result = await original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/main.py", line 491, in acompletion
raise exception_type(
~~~~~~~~~~~~~~^
model=model,
^^^^^^^^^^^^
...<3 lines>...
extra_kwargs=kwargs,
^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2214, in exception_type
raise e
File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 1013, in exception_type
raise ServiceUnavailableError(
...<9 lines>...
)
litellm.exceptions.ServiceUnavailableError: litellm.ServiceUnavailableError: SagemakerException - Too little data for declared Content-Length. Received Model Group=sagemaker_chat
What is weird is that I am not able to reproduce this error when using the litellm client, version 1.65.0. In that case it also seems to go through a different route in the codebase. So maybe the PR that closed this issue fixed the problem when calling from the client, but not via the proxy.
There are no two routes for sagemaker_chat.
The error originates from the correct place - the sagemaker chat route is openai like, and uses the openai_like route - https://github.com/BerriAI/litellm/blob/aa2489d74fc5968c7be9add11d9e064170a8edde/litellm/llms/sagemaker/chat/handler.py#L157
That is also shown in your exception
I don't deny you're seeing an error here @Jacobh2 - it doesn't seem like it's because of a misrouting though. I will QA the sagemaker_chat call via the proxy to see what could be happening.
Thank you @krrishdholakia for looking into it. I think this issue, plus the tokeniser issue with sagemaker models, are the two big blockers for us being able to enable open source models 🙏
Just for context, the reason I thought there were two code paths is that when I test this via the client, after adding logs where it fails in the proxy, I don't see those logs, so it takes another path. Which is weird.
@krrishdholakia were you able to find anything while QA-ing sagemaker_chat via the proxy? 🙏
Fixed since v1.63.14 - #9326
@krrishdholakia has this fix been ported to later versions? I've tried v1.64.1 and v1.65.4 and encounter this exact same error for sagemaker_chat. We can't upgrade to later versions because that breaks our integration with our own Rasa models on Hugging Face, which do not have an inference provider. At a minimum we also need to upgrade to 1.64.1 because of the DoS vulnerability found in v1.52.16.