[Bug]: "timeout" and "stream_timeout" set at the model level in config.yaml do not work
What happened?
I am setting both "timeout" and "stream_timeout" in my config.yaml like below.
- model_name: "gpt-4o"
  litellm_params:
    model: "azure/gpt-4o"
    api_key: "os.environ/AZURE_API_KEY_EU2"
    api_base: "os.environ/AZURE_API_BASE_EU2"
    api_version: "os.environ/AZURE_API_VERSION"
    timeout: 300
    stream_timeout: 120
    tpm: 5000000
    tags: ["East US 2"]
  model_info:
    mode: "chat"
    base_model: "azure/gpt-4o"
<truncated>
I do not set request_timeout under litellm_settings:
litellm_settings:
  num_retries: 0
  callbacks: callback.handler
  drop_params: true
  # request_timeout: 120
  # set_verbose: true
What I am observing is that some requests (likely hung for one reason or another) do not get timed out until they reach exactly 6000s, which is the default for request_timeout: https://github.com/BerriAI/litellm/blob/04238cd9a97e802b2637924b8eed46c9012878c6/litellm/__init__.py#L297
I therefore question whether timeout and stream_timeout really do what they are supposed to do: https://docs.litellm.ai/docs/proxy/reliability#custom-timeouts-stream-timeouts---per-model
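To illustrate what I expect, here is a minimal sanity check at the SDK layer (a hedged sketch, assuming the standard AZURE_API_KEY / AZURE_API_BASE / AZURE_API_VERSION environment variables are set rather than my _EU2 ones). The library default is what the hung requests appear to fall back to, while an explicit per-call timeout should surface litellm.Timeout well before 6000s:

```python
import litellm

# Library-wide fallback the hung requests appear to hit
# (request_timeout = 6000 at the commit linked above).
print(litellm.request_timeout)

try:
    litellm.completion(
        model="azure/gpt-4o",
        messages=[{"role": "user", "content": "ping"}],
        timeout=1,  # deliberately tiny; should raise litellm.Timeout long before 6000s
    )
except litellm.Timeout as e:
    print("per-request timeout was enforced:", e)
```

The open question is whether the timeout / stream_timeout values set under litellm_params are forwarded the same way when the request goes through the proxy config.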
Relevant log output
No response
Are you a ML Ops Team?
Yes
What LiteLLM version are you on ?
v1.53.1
Twitter / LinkedIn details
https://www.linkedin.com/in/jeromeroussin/
Here is the timeout stacktrace for one of those 6000s (non-streaming) timeouts if that helps:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
yield
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 394, in handle_async_request
resp = await self._pool.handle_async_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
raise exc from None
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 236, in handle_async_request
response = await connection.handle_async_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
return await self._connection.handle_async_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 136, in handle_async_request
raise exc
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 106, in handle_async_request
) = await self._receive_response_headers(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 177, in _receive_response_headers
event = await self._receive_event(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 217, in _receive_event
data = await self._network_stream.read(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 32, in read
with map_exceptions(exc_map):
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/usr/local/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.ReadTimeout
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1576, in _request
response = await self._client.send(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1631, in send
response = await self._send_handling_auth(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1659, in _send_handling_auth
response = await self._send_handling_redirects(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1696, in _send_handling_redirects
response = await self._send_single_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1732, in _send_single_request
response = await transport.handle_async_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 393, in handle_async_request
with map_httpcore_exceptions():
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.ReadTimeout
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/litellm/llms/AzureOpenAI/azure.py", line 616, in acompletion
headers, response = await self.make_azure_openai_chat_completion_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/litellm/llms/AzureOpenAI/azure.py", line 349, in make_azure_openai_chat_completion_request
raise e
File "/usr/local/lib/python3.12/site-packages/litellm/llms/AzureOpenAI/azure.py", line 341, in make_azure_openai_chat_completion_request
raw_response = await azure_client.chat.completions.with_raw_response.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_legacy_response.py", line 373, in wrapped
return cast(LegacyAPIResponse[R], await func(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/resources/chat/completions.py", line 1661, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1843, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1537, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1595, in _request
raise APITimeoutError(request=request) from err
openai.APITimeoutError: Request timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/litellm/main.py", line 481, in acompletion
response = await init_response
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/litellm/llms/AzureOpenAI/azure.py", line 667, in acompletion
raise AzureOpenAIError(status_code=500, message=str(e))
litellm.llms.AzureOpenAI.azure.AzureOpenAIError: Request timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/litellm/utils.py", line 1031, in wrapper_async
result = await original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/litellm/main.py", line 503, in acompletion
raise exception_type(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2136, in exception_type
raise e
File "/usr/local/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 194, in exception_type
raise Timeout(
litellm.exceptions.Timeout: litellm.Timeout: APITimeoutError - Request timed out.
error_str: Request timed out.
Thinking through this:
- num_retries for the deployment would be passed into the .completion() function
- at the router, for async_function_with_retries -> this is the num_retries across the model group

So if a user is setting num_retries on a specific model in the list, it could mean either:
- retry on this specific model X times - this would retry that specific model X times and, if that fails, retry across the model group Y times, where Y is the global litellm.num_retries setting => that doesn't sound good, as I don't think a user expects their retries to multiply (X * Y) for these requests
- or retry requests which route to this model X times - this would pass the num_retries for this deployment into the kwargs used to track how many retries to run for this request

Ignore my comments - I was writing to address num_retries, which has a similar issue.
I can see the timeout being correctly passed for Anthropic. I wonder if this is Azure-specific. Testing now.
I see the timeout being correctly passed to Azure as well.
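For reference, a minimal way to check this outside the proxy (a hedged sketch, not the exact test): build a Router with the same deployment shape as the config above and a deliberately tiny per-deployment timeout; if the value is actually forwarded, the call should raise litellm.Timeout almost immediately instead of falling back to the 6000s default.

```python
import litellm
from litellm import Router

# Same deployment shape as the reporter's config; the os.environ/ values are
# placeholders that the Router resolves from the environment.
router = Router(
    model_list=[
        {
            "model_name": "gpt-4o",
            "litellm_params": {
                "model": "azure/gpt-4o",
                "api_key": "os.environ/AZURE_API_KEY_EU2",
                "api_base": "os.environ/AZURE_API_BASE_EU2",
                "api_version": "os.environ/AZURE_API_VERSION",
                "timeout": 0.001,  # intentionally unrealistic
            },
        }
    ]
)

try:
    router.completion(model="gpt-4o", messages=[{"role": "user", "content": "ping"}])
except litellm.Timeout as e:
    # If the per-deployment timeout is forwarded, we land here right away.
    print("per-model timeout applied:", e)
```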
Logs
This is not solved for Aider v0.72.3 when accessing the llama-server endpoint.
aider ignores the --timeout flag, times out the requests, and then attempts to cancel them, which of course does not work:
aider --model openai/deepseek-r1 --timeout 60000
litellm.APIConnectionError: APIConnectionError: OpenAIException - timed out
Retrying in 0.2 seconds...
litellm.APIConnectionError: APIConnectionError: OpenAIException - peer closed connection without sending complete message
body (incomplete chunked read)
Retrying in 0.5 seconds...
litellm.APIError: APIError: OpenAIException - Connection error.
Retrying in 1.0 seconds...
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv update_slots: all slots are idle
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 42016, n_keep = 0, n_prompt_tokens = 2074
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.987464
srv cancel_tasks: cancel task, id_task = 0
request: POST /chat/completions 127.0.0.1 200
srv cancel_tasks: cancel task, id_task = 2
request: POST /chat/completions 127.0.0.1 200
...
Confirmed by user this is fixed in v1.63.12
Any idea how this was fixed? I'm getting mysterious timeouts -> retries with an OAI-compatible server using aider as well.
The issue seemed to be that the timeout in aider's benchmarking was too long: 24 * 60 * 60 seconds. It's possible this caused some kind of overflow, resulting in extremely short timeouts. https://github.com/Aider-AI/aider/blob/main/benchmark/benchmark.py#L348
Setting this to 60 * 60 resolved the short timeouts.
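For anyone hitting the same thing outside the benchmark, here is a hedged sketch of the equivalent direct litellm call against a local llama-server endpoint; the api_base, api_key, and model name are placeholders for the setup in the logs above, and the only point is the timeout value, where 60 * 60 behaved while 24 * 60 * 60 did not:

```python
import litellm

# llama-server exposes an OpenAI-compatible endpoint on 127.0.0.1:8080 in the logs above;
# the /v1 suffix and the dummy key are assumptions about a typical local setup.
response = litellm.completion(
    model="openai/deepseek-r1",
    api_base="http://127.0.0.1:8080/v1",
    api_key="sk-local-placeholder",  # llama-server does not check the key
    messages=[{"role": "user", "content": "ping"}],
    timeout=60 * 60,  # 3600s; the 24 * 60 * 60 value triggered the early timeouts
)
```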
Hey @justinchiu-cohere what do your timeout messages say?
We emit the timeout value set, and how long the request ran for, in the error message
Hey! Interestingly, the timeout messages did not include those values, so I couldn't find where they were produced in the litellm codebase.
Messages:
litellm.Timeout: Timeout Error: OpenAIException - stream timeout
The API provider timed out without return a response. They may be down or overloaded.
With retries from .2 -> .5 -> ...
Outside of aider's polyglot benchmark, responses took around 3 minutes, and setting the timeout from 24 hours -> 1 hour resolved this timeout error.
The timeout messages are here for openai exceptions - https://github.com/BerriAI/litellm/blob/5007ef868f0e6ef5e07bb5ded0e8a260c26a726e/litellm/llms/openai/openai.py#L433
Oh, is that a sync request? @justinchiu-cohere
I see we're missing it on the sync route - https://github.com/BerriAI/litellm/blob/5007ef868f0e6ef5e07bb5ded0e8a260c26a726e/litellm/llms/openai/openai.py#L439
Will push a fix for it if that's what aider is using in their test.
Yes, it looks like aider uses the sync completions in general: https://github.com/Aider-AI/aider/blob/main/aider/models.py#L982
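For completeness, here is a rough illustration only (not litellm's actual code) of the kind of context the async error path already attaches and the sync route was missing; surfacing both the configured timeout and the elapsed wall-clock time is what makes cases like the aider one (a 24-hour setting that produced extremely short timeouts) diagnosable from the message alone:

```python
import time

def timeout_message(configured_timeout_s: float, started_at: float) -> str:
    # Hypothetical helper: report both the timeout that was set and how long
    # the request actually ran before it was cut off.
    elapsed = time.time() - started_at
    return (
        "Request timed out. "
        f"timeout value set: {configured_timeout_s}s, request ran for: {elapsed:.1f}s"
    )
```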