
[Bug]: "timeout" and "stream_timeout" set at the model level in config.yaml do not work

Open · jeromeroussin opened this issue 1 year ago

What happened?

I am setting both "timeout" and "stream_timeout" in my config.yaml like below.

  - model_name: "gpt-4o"
    litellm_params:
      model: "azure/gpt-4o"
      api_key: "os.environ/AZURE_API_KEY_EU2"
      api_base: "os.environ/AZURE_API_BASE_EU2"
      api_version: "os.environ/AZURE_API_VERSION"
      timeout: 300
      stream_timeout: 120
      tpm: 5000000
      tags: ["East US 2"]
    model_info:
      mode: "chat"
      base_model: "azure/gpt-4o"
      <truncated>

I do not set request_timeout under litellm_settings:

litellm_settings:
  num_retries: 0
  callbacks: callback.handler
  drop_params: true
  # request_timeout: 120
  # set_verbose: true

What I am observing is that some requests (likely hung for one reason or another) do not get timed out until they reach exactly 6000s, which is the default for request_timeout: https://github.com/BerriAI/litellm/blob/04238cd9a97e802b2637924b8eed46c9012878c6/litellm/init.py#L297

I therefore question whether timeout and stream_timeout really do what they are supposed to do: https://docs.litellm.ai/docs/proxy/reliability#custom-timeouts-stream-timeouts---per-model
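
A possible stopgap on our side (not a fix) is to lower the global request_timeout under litellm_settings, since that default is what eventually fires; the value here is illustrative:

litellm_settings:
  num_retries: 0
  callbacks: callback.handler
  drop_params: true
  request_timeout: 300  # illustrative global fallback, in seconds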

Relevant log output

No response

Are you a ML Ops Team?

Yes

What LiteLLM version are you on ?

v1.53.1

Twitter / LinkedIn details

https://www.linkedin.com/in/jeromeroussin/

jeromeroussin avatar Dec 03 '24 13:12 jeromeroussin

Here is the timeout stacktrace for one of those 6000s (non-streaming) timeouts if that helps:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
    yield
  File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 394, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
    raise exc from None
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 236, in handle_async_request
    response = await connection.handle_async_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 136, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 106, in handle_async_request
    ) = await self._receive_response_headers(**kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 177, in _receive_response_headers
    event = await self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 217, in _receive_event
    data = await self._network_stream.read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 32, in read
    with map_exceptions(exc_map):
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/usr/local/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1576, in _request
    response = await self._client.send(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1631, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1659, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1696, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1732, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 393, in handle_async_request
    with map_httpcore_exceptions():
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/litellm/llms/AzureOpenAI/azure.py", line 616, in acompletion
    headers, response = await self.make_azure_openai_chat_completion_request(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/litellm/llms/AzureOpenAI/azure.py", line 349, in make_azure_openai_chat_completion_request
    raise e
  File "/usr/local/lib/python3.12/site-packages/litellm/llms/AzureOpenAI/azure.py", line 341, in make_azure_openai_chat_completion_request
    raw_response = await azure_client.chat.completions.with_raw_response.create(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_legacy_response.py", line 373, in wrapped
    return cast(LegacyAPIResponse[R], await func(*args, **kwargs))
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/resources/chat/completions.py", line 1661, in create
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1843, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1537, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1595, in _request
    raise APITimeoutError(request=request) from err
openai.APITimeoutError: Request timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/litellm/main.py", line 481, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/litellm/llms/AzureOpenAI/azure.py", line 667, in acompletion
    raise AzureOpenAIError(status_code=500, message=str(e))
litellm.llms.AzureOpenAI.azure.AzureOpenAIError: Request timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/litellm/utils.py", line 1031, in wrapper_async
    result = await original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/litellm/main.py", line 503, in acompletion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2136, in exception_type
    raise e
  File "/usr/local/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 194, in exception_type
    raise Timeout(
litellm.exceptions.Timeout: litellm.Timeout: APITimeoutError - Request timed out. 
error_str: Request timed out.

jeromeroussin avatar Dec 03 '24 14:12 jeromeroussin

Thinking through this:

  • num_retries for the deployment would be passed into the .completion() function
  • at the router, for async_function_with_retries -> this is the num retries across the model group

so if a user is setting a num retry on a specific model in the list, it means one of two things:

either retry on this specific model X times (this would retry that specific model X times and, if that fails, retry across the model group Y times, where Y is the global litellm.num_retries setting) => that doesn't sound good, since as a user I wouldn't expect it to multiply my num retries (X * Y) for these requests

or retry requests which route to this model X times (this would pass the num retries for this deployment into the kwargs used to track how many retries we should run on this request)
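
(Putting hypothetical numbers on the first interpretation: with num_retries: 3 on a deployment and a group-level retry count of 2, a persistently failing request could end up making on the order of 3 * 2 retries in total rather than 3 - the X * Y blow-up described above.)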

krrishdholakia avatar Dec 14 '24 20:12 krrishdholakia

Ignore my comments - I was writing to address num retries, which has a similar issue.

krrishdholakia avatar Dec 14 '24 20:12 krrishdholakia

I can see the timeout being correctly passed for Anthropic. I wonder if this is Azure-specific. Testing now.

krrishdholakia avatar Dec 14 '24 20:12 krrishdholakia

I see the timeout being correctly passed to Azure as well.

krrishdholakia avatar Dec 14 '24 20:12 krrishdholakia

Logs

[screenshot: 2024-12-14 at 12.58.09 PM]

Print

[screenshot: 2024-12-14 at 12.58.38 PM]
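
For anyone who wants to double-check this outside the proxy, here's a rough sketch against the SDK directly (not the exact setup from the screenshots above; the deployment name, credentials, and the deliberately tiny timeout are placeholders):

import litellm

# assumes AZURE_API_KEY / AZURE_API_BASE / AZURE_API_VERSION are set in the environment
try:
    litellm.completion(
        model="azure/gpt-4o",  # placeholder deployment
        messages=[{"role": "user", "content": "ping"}],
        timeout=0.01,  # deliberately tiny; mirrors the per-model `timeout` in config.yaml
    )
except litellm.exceptions.Timeout as e:
    # if the per-deployment timeout is honored, this should fire almost immediately
    # instead of waiting for the 6000s request_timeout default
    print("timed out as configured:", e)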

krrishdholakia avatar Dec 14 '24 20:12 krrishdholakia

This is not solved for Aider v0.72.3 when accessing the llama-server endpoint.

aider ignores the --timeout flag, times out the requests, and then attempts to cancel them, which of course does not work:

aider --model openai/deepseek-r1 --timeout 60000

litellm.APIConnectionError: APIConnectionError: OpenAIException - timed out
Retrying in 0.2 seconds...
litellm.APIConnectionError: APIConnectionError: OpenAIException - peer closed connection without sending complete message
body (incomplete chunked read)
Retrying in 0.5 seconds...
litellm.APIError: APIError: OpenAIException - Connection error.
Retrying in 1.0 seconds...
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle
slot launch_slot_: id  0 | task 0 | processing task
slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 42016, n_keep = 0, n_prompt_tokens = 2074
slot update_slots: id  0 | task 0 | kv cache rm [0, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.987464
srv  cancel_tasks: cancel task, id_task = 0
request: POST /chat/completions 127.0.0.1 200
srv  cancel_tasks: cancel task, id_task = 2
request: POST /chat/completions 127.0.0.1 200
...

vmajor avatar Jan 29 '25 01:01 vmajor

Confirmed by user this is fixed in v1.63.12

krrishdholakia avatar Mar 20 '25 22:03 krrishdholakia

Any idea how this was fixed? Getting mysterious timeouts -> retries with an OAI-compatible server using aider as well.

The issue seemed to be that the timeout in aider's benchmarking was too long: 24 * 60 * 60. It's possible this caused some kind of overflow resulting in extremely short timeouts. https://github.com/Aider-AI/aider/blob/main/benchmark/benchmark.py#L348

Setting this to 60 * 60 resolved the short timeouts.
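
(For reference, in plain numbers: 24 * 60 * 60 is 86,400 s and 60 * 60 is 3,600 s, so the change is roughly a 24x reduction in the per-request timeout the benchmark forwards to litellm; whether the larger value actually overflows somewhere is just my guess.)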

justinchiu-cohere avatar Jun 13 '25 16:06 justinchiu-cohere

Hey @justinchiu-cohere what do your timeout messages say?

We emit the timeout value set, and how long the request ran for, in the error message

krrishdholakia avatar Jun 13 '25 17:06 krrishdholakia

Hey! The timeout messages interestingly did not include those values, so I couldn't find where they came from in the litellm codebase.

Messages:

litellm.Timeout: Timeout Error: OpenAIException - stream timeout
The API provider timed out without return a response. They may be down or overloaded.

With retries from .2 -> .5 -> ...

Outside of aider's polyglot benchmark, the requests took around 3 min to complete, and setting the timeout from 24 hours -> 1 hour resolved this timeout error.

justinchiu-cohere avatar Jun 13 '25 17:06 justinchiu-cohere

The timeout messages are here for openai exceptions - https://github.com/BerriAI/litellm/blob/5007ef868f0e6ef5e07bb5ded0e8a260c26a726e/litellm/llms/openai/openai.py#L433

Oh, is that a sync request? @justinchiu-cohere

I see we're missing it on the sync route - https://github.com/BerriAI/litellm/blob/5007ef868f0e6ef5e07bb5ded0e8a260c26a726e/litellm/llms/openai/openai.py#L439

Will push a fix with it if that's what aider is using in their test

krrishdholakia avatar Jun 13 '25 17:06 krrishdholakia

Yes, it looks like aider uses the sync completions in general: https://github.com/Aider-AI/aider/blob/main/aider/models.py#L982

justinchiu-cohere avatar Jun 13 '25 17:06 justinchiu-cohere