Error: Sagemaker error Too little data for declared Content-Length

Open massi-ang opened this issue 9 months ago • 21 comments

I am trying to invoke a Qwen model hosted via a SageMaker endpoint using the 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.31.0-lmi13.0.0-cu124 LMI container and getting this error:

completion(model="sagemaker/<endpoint>", messages=[{"role":"system", "content":"You are Qwen, created by Alibaba Cloud. You are a helpful assistant"}, {"role": "user", "content": prompt}], max_tokens=1000)
DEBUG:LiteLLM:

POST Request Sent from LiteLLM:
curl -X POST \
https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/djl-inference-2025-02-14-10-08-10-995/invocations \
-H 'Content-Type: *****' -H 'X-Amz-Date: *****' -H 'X-Amz-Security-Token: IQoJb3JpZ2luX2VjEAMaCXVzLWVhc3QtMSJIMEYCIQCf8O/rjgsIGA8AAd32YqP1hPAps8Lv1MO7wGmkbkSBugIhAN9rLXBe5b58Jyu382pwwU8ZIc9KjjkPPQ1h9W7/f9PKKqsDCCwQBBoMNjk5MzkxMDE5Njk4Igy18IhpcEdZbHMpCmwqiAMKjouElJPlgKX1Vp0rXuL+CotC78GxabBqQq4t2NP5cCLzXMMkr4Crf4/ALdVETud5xALtCyh0bXIQbeCtVlVxXaMxliaFWy1bz/UMqP2HPKb2dPKZ6D9C3HdPs9Y/XKG0fPnFCBSRzVBX2V49K9g897zakSw9RCkPw+w7MZgCPoIrbl0FOGI3B3+xC3fYXdDUOOWqHLsEDJXLDVnqRqxV9pVqD8Mi3Iw7VW/0iqUxRsmIuRXMtbspH3FYImowNfetC+99E1ReYqkaN0478ZByZTRGBTO9VTPhGJ9uDwPAmLVLWv/rKBHtlVQa8Ut5MWdytq+2oFWNnyszfIZ/XBhwsMPxQocUlFlKYPhkfu6sWdsgNZcv8BlzWrkgnC/CpBZVuOl4cj9lM+/GbH3KItlJDcyslPbYlPmdwRZfy+97pyUeG0WEmdeioGgLl0/YPVjFZyJtkJ6UUUW1Qtwd7OYWBbdby/yFB5ShXzc0eP5DqZQ3K0zXRmTYeSsiyDbuImtUpsHcrttmEjCnvry9BjqdAa5rMK1a62o/f+7ACESMrm7soTxCnQgHs3kdsyqqNTa5jOyhLupvgdg+ovrpERaD+ASXwg3cYNxyos0hiki5ktd9+mha5S46OXHh+RjxPmjPGN7qIqFAq3y3yFyDRjE4J1C7UrhT+zhf5uMQPtvXspKkRJ8ZMw1TjCA9G8VB********************************************' -H 'Authorization: AWS4-HMAC-SHA256 Credential=ASIA2FVXUBKZITRFXD7B/20250214/us-east-1/sagemaker/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=2956d2f01968c5a3571e********************************************' -H 'Content-Length: *****' \
-d '{'parameters': {'max_new_tokens': 1000}, 'inputs': 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant what is 2+2'}'


10:57:44 - LiteLLM:DEBUG: utils.py:301 - RAW RESPONSE:
Too little data for declared Content-Length


DEBUG:LiteLLM:RAW RESPONSE:
Too little data for declared Content-Length


10:57:44 - LiteLLM:ERROR: handler.py:330 - Sagemaker error Too little data for declared Content-Length

Also, why is litellm using the generate payload (which does not apply the chat template from the tokenizer) instead of the chat/completion payload?

For example this works:

import json
import boto3

sm = boto3.client("sagemaker-runtime")  # "prompt" holds the user question (defined earlier)

r = sm.invoke_endpoint(
    EndpointName = "<endpoint>",
    Body=json.dumps({"messages":[{"role":"system", "content":"You are Qwen, created by Alibaba Cloud. You are a helpful assistant"}, {"role": "user", "content": prompt}], "max_tokens":2048, "stop":["<|im_end|>"]}),
    ContentType='application/json'
)
resp = json.loads(r.get("Body").read().decode("utf-8"))
print(resp['choices'][0]['message']['content'])

massi-ang avatar Feb 14 '25 11:02 massi-ang

Same issue

esogas avatar Feb 18 '25 01:02 esogas

Same issue here..

gsjoy8888 avatar Feb 19 '25 07:02 gsjoy8888

same issue here

The-DarkMatter avatar Feb 20 '25 07:02 The-DarkMatter

same issue here

jnikhilreddy avatar Feb 23 '25 05:02 jnikhilreddy

Any fix available for this issue?

swagulkarni avatar Mar 11 '25 20:03 swagulkarni

Also, why is litellm using the generate payload (which does not apply the chat template from the tokenizer) instead of the chat/completion payload?

Hey @massi-ang you can specify the chat format using sagemaker_chat/ - https://docs.litellm.ai/docs/providers/aws_sagemaker#sagemaker-messages-api
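
For reference, a minimal call with that prefix would look roughly like this (a sketch; the endpoint name is a placeholder and it assumes the endpoint serves the OpenAI-style messages API, e.g. an LMI container):

from litellm import completion

# sagemaker_chat/ sends an OpenAI-style {"messages": [...]} payload,
# instead of the TGI-style {"inputs": "..."} payload used by plain sagemaker/
response = completion(
    model="sagemaker_chat/<endpoint-name>",
    messages=[
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant"},
        {"role": "user", "content": "what is 2+2"},
    ],
    max_tokens=1000,
)
print(response.choices[0].message.content)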

krrishdholakia avatar Mar 14 '25 16:03 krrishdholakia

Testing this - it works


krrishdholakia avatar Mar 14 '25 16:03 krrishdholakia

Please confirm you're on the latest version of litellm

krrishdholakia avatar Mar 14 '25 16:03 krrishdholakia

I retried the same with LiteLLM 1.63.11 and the error seems to be gone.

massi-ang avatar Mar 15 '25 18:03 massi-ang

I'm still encountering the same error even with the latest version of litellm

10:31:07 - LiteLLM:INFO: utils.py:2999 - LiteLLM completion() model= deepseek-r1-distill-llama-8b-091017; provider = sagemaker
10:31:07 - LiteLLM:DEBUG: utils.py:3002 - LiteLLM: Params passed to completion() {'model': 'deepseek-r1-distill-llama-8b-091017', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'sagemaker', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': None, 'parallel_tool_calls': None, 'drop_params': None, 'reasoning_effort': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'Hello, how are you?'}], 'thinking': None}
10:31:07 - LiteLLM:DEBUG: utils.py:3005 - ...
...
21:28:33 - LiteLLM:DEBUG: get_api_base.py:63 - Error occurred in getting api base - litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=jumpstart-dft-deepseek-llm-r1-disti-20250316-091017 Pass model as E.g. For 'Huggingface' inference endpoints pass in completion(model='huggingface/starcoder',..) Learn more: https://docs.litellm.ai/docs/providers
21:28:33 - LiteLLM:DEBUG: exception_mapping_utils.py:2243 - Logging Details: logger_fn - None | callable(logger_fn) - False
21:28:33 - LiteLLM:DEBUG: litellm_logging.py:1932 - Logging Details LiteLLM-Failure Call: []

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

melanie531 avatar Mar 16 '25 10:03 melanie531

Same here, still encountered the error even with all the latest packages:

% pip show h11 httpx litellm botocore 
Name: h11
Version: 0.14.0
Summary: A pure-Python, bring-your-own-I/O implementation of HTTP/1.1
Home-page: https://github.com/python-hyper/h11
Author: Nathaniel J. Smith
Author-email: [email protected]
License: MIT
Location: /chat-completion/lib/python3.11/site-packages
Requires: 
Required-by: httpcore, uvicorn
---
Name: httpx
Version: 0.28.1
Summary: The next generation HTTP client.
Home-page: 
Author: 
Author-email: Tom Christie <[email protected]>
License: BSD-3-Clause
Location: /chat-completion/lib/python3.11/site-packages
Requires: anyio, certifi, httpcore, idna
Required-by: litellm, openai
---
Name: litellm
Version: 1.63.11
Summary: Library to easily interface with LLM API providers
Home-page: 
Author: BerriAI
Author-email: 
License: MIT
Location: /chat-completion/lib/python3.11/site-packages
Requires: aiohttp, click, httpx, importlib-metadata, jinja2, jsonschema, openai, pydantic, python-dotenv, tiktoken, tokenizers
Required-by: 
---
Name: botocore
Version: 1.37.13
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: /chat-completion/lib/python3.11/site-packages
Requires: jmespath, python-dateutil, urllib3
Required-by: boto3, s3transfer

How I deployed in Sagemaker:

import json
from sagemaker.jumpstart.model import JumpStartModel

model_id = "huggingface-llm-qwen2-1-5b"
role_arn = ''  # Replace with your SageMaker execution role ARN
my_model = JumpStartModel(
    model_id=model_id,
    model_version="1.2.0",
    role=role_arn
)
instance_type = 'ml.g5.2xlarge' 

predictor = my_model.deploy(
    initial_instance_count=1, 
    instance_type=instance_type,
    accept_eula=True,
    endpoint_name="jumpstart-model"
)

kangks avatar Mar 16 '25 15:03 kangks

Hi all, I'm not able to repro this issue. I would appreciate any help debugging this.

krrishdholakia avatar Mar 16 '25 17:03 krrishdholakia

This is the relevant code path - https://github.com/BerriAI/litellm/blob/main/litellm/llms/sagemaker/completion/handler.py

krrishdholakia avatar Mar 16 '25 17:03 krrishdholakia

Hi Krish, I can help to debug this if you are available to set up a call, and I can demonstrate this in my environment. sagemaker_chat works, as shown in this example:

model_list:
  - model_name: "sagemaker-model"
    litellm_params:
      model: "sagemaker_chat/jumpstart-dft-hf-textgeneration1-mp-20240815-185614"
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME

But the normal client.chat.completions.create calls are all getting the same error. Is it possible to set up a call and look into this error together?

melanie531 avatar Mar 16 '25 22:03 melanie531

Doing more debugging, I think the potential issue is related to how litellm is handling the input message: https://github.com/BerriAI/litellm/blob/main/litellm/llms/sagemaker/completion/transformation.py#L176

from the debug log: POST Request Sent from LiteLLM: curl -X POST
https://runtime.sagemaker.us-west-2.amazonaws.com/endpoints/xxxx/invocations
-H 'Content-Type: apon' -H 'X-Amz-Date: 207Z' -H 'Authorization: AW****42' -H 'Content-Length: *****'
-d '{'parameters': {'temperature': 0.6, 'top_p': 0.95}, 'inputs': 'this is a test request, write a short poem'}'

As you can see, 'inputs' has been transformed in a way that wrongly extracts the message content that was sent to litellm.

In a successful case, using sagemaker_chat via the SDK, this is what the log says: POST Request Sent from LiteLLM: curl -X POST
https://runtime.sagemaker.us-west-2.amazonaws.com/endpoints/xxxx/invocations
-H 'Content-Type: ' -H 'X-Amz-Date: ' -H 'Authorization: AWS4-HMAC-SHA256 Credential=AKIAX2DZEJYYZHUPEN6X/20250316/us-west-2/sagemaker/aws4_request, SignedHeaders=content-type;host;x-amz-date, Signature=ea7f20d786368efcdef8**********************************' -H 'Content-Length: *****'
-d '{'model': 'xxxx', 'messages': [{'role': 'user', 'content': 'which llm are you?'}], 'stream': False}'

Even for an embedding model, the input is forced to 'text_inputs' (https://github.com/BerriAI/litellm/blob/main/litellm/llms/sagemaker/completion/handler.py#L621). But when I was using a JumpStart text embedding model, the expected way to invoke the endpoint is:

payload = {"inputs": ["The mitochondria is the powerhouse of the cell."]}
response = predictor.predict(payload)
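
As a side note for debugging, a quick way to check what payload shape an endpoint actually accepts is to bypass litellm and the sagemaker SDK entirely (a sketch; the endpoint name and region are placeholders):

import json
import boto3

smr = boto3.client("sagemaker-runtime", region_name="us-west-2")

resp = smr.invoke_endpoint(
    EndpointName="<embedding-endpoint>",  # placeholder JumpStart embedding endpoint
    ContentType="application/json",
    Body=json.dumps({"inputs": ["The mitochondria is the powerhouse of the cell."]}),
)
print(json.loads(resp["Body"].read().decode("utf-8")))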

melanie531 avatar Mar 17 '25 00:03 melanie531

Hey @melanie531, sagemaker/ maps to the TGI format, vs. sagemaker_chat/ which calls an OpenAI-compatible endpoint.
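
Roughly, the two prefixes correspond to these request bodies (a sketch based on the debug logs above; values are only illustrative):

# sagemaker/ -> TGI-style "generate" payload: a flat prompt string, no chat template applied
tgi_body = {
    "inputs": "this is a test request, write a short poem",
    "parameters": {"temperature": 0.6, "top_p": 0.95},
}

# sagemaker_chat/ -> OpenAI-compatible messages payload
chat_body = {
    "model": "<endpoint>",
    "messages": [{"role": "user", "content": "which llm are you?"}],
    "stream": False,
}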

If sagemaker_chat/ works, this error seems misleading - as I would expect it to complain about something in the request body when using sagemaker/

@kangks Can you confirm if sagemaker_chat/ works for you?

krrishdholakia avatar Mar 17 '25 01:03 krrishdholakia

Hi Krish, I just checked the endpoint that I created from JumpStart; the container it defaults to was the LMI (djl-inference) container. I'll test with a TGI container then.

melanie531 avatar Mar 17 '25 02:03 melanie531

Hey @krrishdholakia @melanie531 sagemaker_chat/ works! Let me do more testing with other engines:

Input:

response = litellm.completion(
    model="sagemaker_chat/jumpstart-model", 
    messages= [{"role":"system", "content":"You are Qwen, created by Alibaba Cloud. You are a helpful assistant"}, {"role": "user", "content": "what is 2+2"}],
    max_tokens=1000)

Output:

ModelResponse(id='chatcmpl-140230686622480', created=1742177590, model='sagemaker_chat/', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='length', index=0, message=Message(content='2 + 2 is equal to 4.\nshould you dress for a light hearted time in life style what does this mean?\nI think the phrase "do whatever you want" 
[...]', 
role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=1000, prompt_tokens=34, total_tokens=1034, completion_tokens_details=None, prompt_tokens_details=None))

kangks avatar Mar 17 '25 02:03 kangks

On the other hand, how should I specify sagemaker-chat in proxy mode?

model_list:
  - model_name: jumpstart-model
    litellm_params:
      model: sagemaker-chat/jumpstart-model
      aws_profile_name: ml-sandbox
general_settings:
  # OPTIONAL Best Practices
  disable_spend_logs: False # turn off writing each transaction to the db. We recommend doing this if you don't need to see Usage on the LiteLLM UI and are tracking metrics via Prometheus
  disable_error_logs: False
  turn_off_message_logging: False

Encountered error:

litellm.exceptions.BadRequestError: litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=sagemaker-chat/jumpstart-model
 Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers

Detailed debug:

% litellm --config config.yaml --detailed_debug
INFO:     Started server process [50205]
INFO:     Waiting for application startup.
13:23:14 - LiteLLM Proxy:DEBUG: proxy_server.py:454 - litellm.proxy.proxy_server.py::startup() - CHECKING PREMIUM USER - False
13:23:14 - LiteLLM Proxy:DEBUG: litellm_license.py:98 - litellm.proxy.auth.litellm_license.py::is_premium() - ENTERING 'IS_PREMIUM' - LiteLLM License=None
13:23:14 - LiteLLM Proxy:DEBUG: litellm_license.py:107 - litellm.proxy.auth.litellm_license.py::is_premium() - Updated 'self.license_str' - None
13:23:14 - LiteLLM Proxy:DEBUG: proxy_server.py:465 - worker_config: {"model": null, "alias": null, "api_base": null, "api_version": "2024-07-01-preview", "debug": false, "detailed_debug": true, "temperature": null, "max_tokens": null, "request_timeout": null, "max_budget": null, "telemetry": true, "drop_params": false, "add_function_to_prompt": false, "headers": null, "save": false, "config": "config.yaml", "use_queue": false}

#------------------------------------------------------------#
#                                                            #
#              'I don't like how this works...'               #
#        https://github.com/BerriAI/litellm/issues/new        #
#                                                            #
#------------------------------------------------------------#

 Thank you for using LiteLLM! - Krrish & Ishaan



Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new


13:23:14 - LiteLLM Proxy:DEBUG: proxy_server.py:1440 - loaded config={
    "model_list": [
        {
            "model_name": "jumpstart-model",
            "litellm_params": {
                "model": "sagemaker-chat/jumpstart-model",
                "aws_profile_name": "ml-sandbox"
            }
        }
    ],
    "general_settings": {
        "disable_spend_logs": false,
        "disable_error_logs": false,
        "turn_off_message_logging": false
    }
}
13:23:14 - LiteLLM Proxy:DEBUG: proxy_server.py:2154 - _alerting_callbacks: {'disable_spend_logs': False, 'disable_error_logs': False, 'turn_off_message_logging': False}
LiteLLM: Proxy initialized with Config, Set models:
    jumpstart-model
13:23:14 - LiteLLM:DEBUG: utils.py:4295 - checking potential_model_names in litellm.model_cost: {'split_model': 'sagemaker-chat/jumpstart-model', 'combined_model_name': 'sagemaker-chat/jumpstart-model', 'stripped_model_name': 'sagemaker-chat/jumpstart-model', 'combined_stripped_model_name': 'sagemaker-chat/jumpstart-model', 'custom_llm_provider': None}
13:23:14 - LiteLLM:DEBUG: utils.py:4492 - Error getting model info: This model isn't mapped yet. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
13:23:14 - LiteLLM:DEBUG: utils.py:2222 - added/updated model=sagemaker-chat/jumpstart-model in litellm.model_cost: sagemaker-chat/jumpstart-model
ERROR:    Traceback (most recent call last):
  File "/chat-completion/lib/python3.11/site-packages/starlette/routing.py", line 692, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/chat-completion/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 507, in proxy_startup_event
    await initialize(**worker_config)
  File "/chat-completion/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 2936, in initialize
    ) = await proxy_config.load_config(router=llm_router, config_file_path=config)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chat-completion/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 2119, in load_config
    router = litellm.Router(
             ^^^^^^^^^^^^^^^
  File "/chat-completion/lib/python3.11/site-packages/litellm/router.py", line 383, in __init__
    self.set_model_list(model_list)
  File "/chat-completion/lib/python3.11/site-packages/litellm/router.py", line 4396, in set_model_list
    self._create_deployment(
  File "/chat-completion/lib/python3.11/site-packages/litellm/router.py", line 4316, in _create_deployment
    deployment = self._add_deployment(deployment=deployment)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chat-completion/lib/python3.11/site-packages/litellm/router.py", line 4435, in _add_deployment
    ) = litellm.get_llm_provider(
        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chat-completion/lib/python3.11/site-packages/litellm/litellm_core_utils/get_llm_provider_logic.py", line 356, in get_llm_provider
    raise e
  File "/chat-completion/lib/python3.11/site-packages/litellm/litellm_core_utils/get_llm_provider_logic.py", line 333, in get_llm_provider
    raise litellm.exceptions.BadRequestError(  # type: ignore
litellm.exceptions.BadRequestError: litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=sagemaker-chat/jumpstart-model
 Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers

ERROR:    Application startup failed. Exiting.

Litellm v1.63.11

% pip show litellm
Name: litellm
Version: 1.63.11
Summary: Library to easily interface with LLM API providers
Home-page: 
Author: BerriAI
Author-email: 
License: MIT
Location: /Users/richardkang/Documents/github/aws-ec2-llmperf/ai-code/chat-completion/lib/python3.11/site-packages
Requires: aiohttp, click, httpx, importlib-metadata, jinja2, jsonschema, openai, pydantic, python-dotenv, tiktoken, tokenizers
Required-by: 

kangks avatar Mar 17 '25 05:03 kangks

You misspelt it. It's sagemaker_chat, not sagemaker-chat
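
For reference, the corrected entry from the config above (only the provider prefix changes):

model_list:
  - model_name: jumpstart-model
    litellm_params:
      model: sagemaker_chat/jumpstart-model
      aws_profile_name: ml-sandbox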

krrishdholakia avatar Mar 17 '25 13:03 krrishdholakia

Hey @krrishdholakia

I deployed Llama 3.2 1B Instruct from SageMaker Jumpstart (steps similar to here except not vision model) and ran the following code to replicate:

import os 
import litellm
from litellm import completion
litellm._turn_on_debug()

os.environ['AWS_REGION'] = 'us-east-1'

response = completion(
            model="sagemaker/<endpoint-name>", 
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            temperature=0.2,
            max_tokens=80
        )

print(response)

This was on Python 3.12.8 with the latest PyPI version of LiteLLM (installed today)

I was able to get it to work by changing the below snippet:

            try:
                sync_response = sync_handler.post(
                    url=prepared_request.url,
                    headers=prepared_request.headers,  # type: ignore
                    json=_data,
                    timeout=timeout,
                )

ref

to:

            try:
                sync_response = sync_handler.post(
                    url=prepared_request.url,
                    headers=prepared_request.headers,  # type: ignore
                    data=prepared_request.body,
                    timeout=timeout,
                )

(Changing from json=_data to data=prepared_request.body, as the payload was prepared with a matching Content-Length.) I wasn't able to figure out exactly where the mismatch in Content-Length was coming from, however. I can draft up a PR to change this and submit it.
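
For illustration, one plausible way such a mismatch can arise (a sketch, not a confirmed root cause): the Content-Length header is taken from the signed prepared_request.body, while passing json=_data lets the HTTP client re-serialize the dict, and the two serializations need not have the same byte length:

import json

_data = {"parameters": {"max_new_tokens": 1000}, "inputs": "what is 2+2"}

signed_body = json.dumps(_data)                         # default separators include spaces
resent_body = json.dumps(_data, separators=(",", ":"))  # compact separators, no spaces

# If Content-Length is declared from the longer serialization but the shorter one is
# actually sent, h11 raises "Too little data for declared Content-Length".
print(len(signed_body), len(resent_body))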

andjsmi avatar Mar 17 '25 23:03 andjsmi

@krrishdholakia as Andrew mentioned, the data passed in the payload seems to be causing the issue; can we get a fix to the code soon? We have customers blocked by this issue and it needs some urgent attention. Thanks

melanie531 avatar Mar 19 '25 03:03 melanie531

Fixed since v1.63.14 - https://github.com/BerriAI/litellm/pull/9326

krrishdholakia avatar Mar 22 '25 23:03 krrishdholakia

I'm still seeing this issue when calling Sagemaker using sagemaker_chat/<model> via the proxy version 1.65.0.rc.

This is the error log:

18:51:44 - LiteLLM Proxy:ERROR: common_request_processing.py:298 - litellm.proxy.proxy_server._handle_llm_api_exception(): Exception occured - litellm.ServiceUnavailableError: SagemakerException - Too little data for declared Content-Length. Received Model Group=sagemaker_chat
Available Model Group Fallbacks=None LiteLLM Retried: 2 times, LiteLLM Max Retries: 3
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/litellm/llms/openai_like/chat/handler.py", line 187, in acompletion_function
    response = await client.post(
               ^^^^^^^^^^^^^^^^^^
        api_base, headers=headers, data=json.dumps(data), timeout=timeout
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/logging_utils.py", line 135, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/llms/custom_httpx/http_handler.py", line 259, in post
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/llms/custom_httpx/http_handler.py", line 212, in post
    response = await self.client.send(req, stream=stream)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/httpx/_client.py", line 1661, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<4 lines>...
    )
    ^
  File "/usr/lib/python3.13/site-packages/httpx/_client.py", line 1689, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
    )
    ^
  File "/usr/lib/python3.13/site-packages/httpx/_client.py", line 1726, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/httpx/_client.py", line 1763, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/httpx/_transports/default.py", line 373, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
    raise exc from None
  File "/usr/lib/python3.13/site-packages/httpcore/_async/connection_pool.py", line 236, in handle_async_request
    response = await connection.handle_async_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        pool_request.request
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.13/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/httpcore/_async/http11.py", line 136, in handle_async_request
    raise exc
  File "/usr/lib/python3.13/site-packages/httpcore/_async/http11.py", line 88, in handle_async_request
    await self._send_request_body(**kwargs)
  File "/usr/lib/python3.13/site-packages/httpcore/_async/http11.py", line 161, in _send_request_body
    await self._send_event(h11.EndOfMessage(), timeout=timeout)
  File "/usr/lib/python3.13/site-packages/httpcore/_async/http11.py", line 164, in _send_event
    bytes_to_send = self._h11_state.send(event)
  File "/usr/lib/python3.13/site-packages/h11/_connection.py", line 512, in send
    data_list = self.send_with_data_passthrough(event)
  File "/usr/lib/python3.13/site-packages/h11/_connection.py", line 545, in send_with_data_passthrough
    writer(event, data_list.append)
    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/h11/_writers.py", line 67, in __call__
    self.send_eom(event.headers, write)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/h11/_writers.py", line 96, in send_eom
    raise LocalProtocolError("Too little data for declared Content-Length")
h11._util.LocalProtocolError: Too little data for declared Content-Length
api         |
During handling of the above exception, another exception occurred:
api         |
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/litellm/main.py", line 472, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/llms/openai_like/chat/handler.py", line 199, in acompletion_function
    raise OpenAILikeError(status_code=500, message=str(e))
litellm.llms.openai_like.common_utils.OpenAILikeError: Too little data for declared Content-Length
api         |
During handling of the above exception, another exception occurred:
api         |
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/litellm/proxy/proxy_server.py", line 3552, in chat_completion
    return await base_llm_response_processor.base_process_llm_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<16 lines>...
    )
    ^
  File "/usr/lib/python3.13/site-packages/litellm/proxy/common_request_processing.py", line 210, in base_process_llm_request
    responses = await llm_responses
                ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 938, in acompletion
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 914, in acompletion
    response = await self.async_function_with_fallbacks(**kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3347, in async_function_with_fallbacks
    raise original_exception
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3161, in async_function_with_fallbacks
    response = await self.async_function_with_retries(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3537, in async_function_with_retries
    raise original_exception
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3430, in async_function_with_retries
    response = await self.make_call(original_function, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3546, in make_call
    response = await response
               ^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 1077, in _acompletion
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 1036, in _acompletion
    response = await _response
               ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1441, in wrapper_async
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1300, in wrapper_async
    result = await original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/main.py", line 491, in acompletion
    raise exception_type(
          ~~~~~~~~~~~~~~^
        model=model,
        ^^^^^^^^^^^^
    ...<3 lines>...
        extra_kwargs=kwargs,
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2214, in exception_type
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 1013, in exception_type
    raise ServiceUnavailableError(
    ...<9 lines>...
    )
litellm.exceptions.ServiceUnavailableError: litellm.ServiceUnavailableError: SagemakerException - Too little data for declared Content-Length. Received Model Group=sagemaker_chat

What is weird is that I am not able to reproduce this error if I am using the litellm client version 1.65.0. Then it actually seems like it is going a different route in the code-base as well. So maybe the PR that closed this issue fixed the issue when calling from the client but not via proxy.

Jacobh2 avatar Mar 29 '25 19:03 Jacobh2

there are no 2 routes for sagemaker_chat.

The error originates from the correct place - the sagemaker chat route is openai like, and uses the openai_like route - https://github.com/BerriAI/litellm/blob/aa2489d74fc5968c7be9add11d9e064170a8edde/litellm/llms/sagemaker/chat/handler.py#L157

That is also shown in your exception

I don't deny you're seeing an error here @Jacobh2 - it doesn't seem like it's because of a misrouting though. I will qa the sagemaker_chat call via proxy to see what could be happening

krrishdholakia avatar Mar 29 '25 19:03 krrishdholakia

Thank you @krrishdholakia for looking into it. I think this issue, plus the tokeniser issue with sagemaker models, are the two big blockers for us to enable open source models 🙏

Just for context, the reason I thought there were two code paths is that when I test this via the client, after adding logs where it fails in the proxy, I don't see those logs, so it takes another path. Which is weird.

Jacobh2 avatar Mar 29 '25 21:03 Jacobh2

@krrishdholakia were you able to find anything while QA-ing the sagemaker_chat call via the proxy? 🙏

Jacobh2 avatar Apr 09 '25 19:04 Jacobh2

Fixed since v1.63.14 - #9326

@krrishdholakia has this fix been ported to later versions? I've tried with v1.64.1 and v1.65.4 and encounter this exact same error for sagemaker_chat. We can't upgrade to later versions because that breaks our integration with our own Rasa models on Huggingface, which do not have an inference provider. At a minimum, we also need to upgrade to 1.64.1 because of the DoS vulnerability found in v1.52.16.

ancalita avatar Apr 29 '25 09:04 ancalita