[Bug]: custom model timeout does not work
What happened?
We have these settings set:
```yaml
litellm_settings:
  callbacks: callback.handler
  drop_params: true
  request_timeout: 120
```
We are observing timeouts for one of our deployments, so I have added a timeout just for it:
```yaml
- model_name: "o1-mini"
  litellm_params:
    model: "azure/o1-mini"
    # ... Removed configs here ...
    timeout: 180
```
But we are still observing timeouts at 120s rather than 180s, which is unexpected and unwelcome; we are trying to avoid raising the timeout for all deployments.
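For reference, the precedence the config above expects is that a deployment-level `timeout` in `litellm_params` overrides the global `litellm_settings.request_timeout`. A minimal sketch of that lookup (illustrative only; the names below are hypothetical, not litellm internals):

```python
# Illustrative sketch of the expected timeout precedence.
# GLOBAL_SETTINGS / DEPLOYMENTS are stand-ins, not litellm's real structures.
GLOBAL_SETTINGS = {"request_timeout": 120}

DEPLOYMENTS = {
    "o1-mini": {"timeout": 180},  # per-deployment override
    "gpt-4o": {},                 # no override -> falls back to the global value
}

def effective_timeout(model_name: str) -> int:
    """Return the deployment's own timeout if set, else the global default."""
    params = DEPLOYMENTS.get(model_name, {})
    return params.get("timeout", GLOBAL_SETTINGS["request_timeout"])

print(effective_timeout("o1-mini"))  # 180
print(effective_timeout("gpt-4o"))   # 120
```

Under this precedence, only `o1-mini` gets the raised 180s timeout; every other deployment keeps the global 120s.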
Relevant log output
No response
Twitter / LinkedIn details
https://www.linkedin.com/in/jeromeroussin/
@Jerome Roussin unable to repro the issue - it looks like it's sending the correct timeout value for me. Are you on the latest litellm version? With this config, I could see 180 being set as the timeout for my request:
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key: os.environ/AZURE_API_KEY # The `os.environ/` prefix tells litellm to read this from the env. See https://docs.litellm.ai/docs/simple_proxy#load-api-keys-from-vault
      timeout: 180

litellm_settings:
  drop_params: true
  request_timeout: 120
```
Addressed in a later bug report: https://github.com/BerriAI/litellm/issues/7001
The issue still exists in 1.57.3. The `timeout` value of our models is not being respected, and we're seeing timeouts at the default 6000s value.
A stacktrace of a 6000s timeout:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 72, in map_httpcore_exceptions
    yield
  File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 377, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
    raise exc from None
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 236, in handle_async_request
    response = await connection.handle_async_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 136, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 106, in handle_async_request
    ) = await self._receive_response_headers(**kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 177, in _receive_response_headers
    event = await self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 217, in _receive_event
    data = await self._network_stream.read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 32, in read
    with map_exceptions(exc_map):
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/usr/local/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1576, in _request
    response = await self._client.send(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1674, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1702, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1739, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1776, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 376, in handle_async_request
    with map_httpcore_exceptions():
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 89, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/litellm/llms/azure/azure.py", line 589, in acompletion
    headers, response = await self.make_azure_openai_chat_completion_request(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/litellm/llms/azure/azure.py", line 317, in make_azure_openai_chat_completion_request
    raise e
  File "/usr/local/lib/python3.12/site-packages/litellm/llms/azure/azure.py", line 309, in make_azure_openai_chat_completion_request
    raw_response = await azure_client.chat.completions.with_raw_response.create(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_legacy_response.py", line 373, in wrapped
    return cast(LegacyAPIResponse[R], await func(*args, **kwargs))
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/resources/chat/completions.py", line 1720, in create
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1843, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1537, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1595, in _request
    raise APITimeoutError(request=request) from err
openai.APITimeoutError: Request timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/litellm/main.py", line 447, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/litellm/llms/azure/azure.py", line 640, in acompletion
    raise AzureOpenAIError(status_code=500, message=str(e))
litellm.llms.azure.common_utils.AzureOpenAIError: Request timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/litellm/utils.py", line 1085, in wrapper_async
    result = await original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/litellm/main.py", line 466, in acompletion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2189, in exception_type
    raise e
  File "/usr/local/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 227, in exception_type
    raise Timeout(
litellm.exceptions.Timeout: litellm.Timeout: APITimeoutError - Request timed out.
error_str: Request timed out.
```
Can you share your updated config for repro? @jeromeroussin
@jeromeroussin If that's the stacktrace while calling o1, then that looks incorrect to me. o1 calls should be routed through azure/o1_handler.py, not azure.py:
https://github.com/BerriAI/litellm/blob/8576ca8ccb12865eb52d1d6f4679085b7c10167c/litellm/llms/azure/chat/o1_handler.py#L4
https://github.com/BerriAI/litellm/blob/8576ca8ccb12865eb52d1d6f4679085b7c10167c/litellm/main.py#L1148
Could this be an older version of litellm?
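The routing distinction described above (o1-family models going through a dedicated handler rather than the generic azure.py path) can be pictured as a simple dispatch table. This is a sketch only; the function names are hypothetical and not litellm's real internals:

```python
# Hypothetical sketch of per-model handler dispatch; not litellm's actual code.
def generic_azure_handler(model: str) -> str:
    return f"azure.py handled {model}"

def o1_handler(model: str) -> str:
    return f"o1_handler.py handled {model}"

def dispatch(model: str) -> str:
    """Route o1-family models to the dedicated handler, everything else to the generic one."""
    base_model = model.split("/")[-1]  # e.g. "azure/o1-mini" -> "o1-mini"
    if base_model.startswith("o1"):
        return o1_handler(model)
    return generic_azure_handler(model)

print(dispatch("azure/o1-mini"))  # routed via the o1 handler
print(dispatch("azure/gpt-4o"))   # routed via the generic azure handler
```

If the reported stacktrace really came from an o1 call, an azure.py frame in it would suggest an older litellm version without this dispatch.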
Unable to repro this error. Here's the debug log showing the timeout being sent from the Router (which stores deployment information) to the litellm translation layer.
Note: for me this is correctly displaying the 1.0 timeout.
```
17:17:11 - LiteLLM:DEBUG: utils.py:285 - Request to litellm:
17:17:11 - LiteLLM:DEBUG: utils.py:285 - litellm.acompletion(api_key='redacted', api_base='redacted', timeout=1.0,..)
```
with this config:
```yaml
model_list:
  - model_name: "o1-mini"
    litellm_params:
      model: "azure/o1-mini"
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      timeout: 1

litellm_settings:
  drop_params: true
  request_timeout: 120
```
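Besides reading the debug log, another way to confirm which timeout actually fires is to time the failure: with a 1s per-deployment timeout, the request should fail after roughly 1 second, not after the global `request_timeout`. Below is a pure-stdlib simulation of that check; `fake_upstream` is a stand-in for the real model call, not a litellm API:

```python
import asyncio
import time

async def fake_upstream(delay_s: float) -> str:
    """Stand-in for a slow model call; pretends the upstream takes delay_s seconds."""
    await asyncio.sleep(delay_s)
    return "ok"

async def call_with_timeout(timeout_s: float, upstream_delay_s: float):
    """Run the fake call under a timeout and report (status, elapsed seconds)."""
    start = time.monotonic()
    try:
        result = await asyncio.wait_for(fake_upstream(upstream_delay_s), timeout=timeout_s)
        return result, time.monotonic() - start
    except asyncio.TimeoutError:
        return "timeout", time.monotonic() - start

# A 1s timeout against a slow (5s) upstream should fail after ~1s, not ~5s.
status, elapsed = asyncio.run(call_with_timeout(1.0, 5.0))
print(status, round(elapsed, 1))  # timeout 1.0
```

Applied to the real proxy, the same idea holds: if the observed elapsed time at failure matches `request_timeout` (or the 6000s default) instead of the deployment's `timeout`, the per-deployment value is not being forwarded.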
It is a stacktrace for gpt-4o.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.