[Bug]: "timeout" and "stream_timeout" set at the model level in config.yaml do not work
What happened?
I am setting both "timeout" and "stream_timeout" in my config.yaml like below.
- model_name: "gpt-4o"
  litellm_params:
    model: "azure/gpt-4o"
    api_key: "os.environ/AZURE_API_KEY_EU2"
    api_base: "os.environ/AZURE_API_BASE_EU2"
    api_version: "os.environ/AZURE_API_VERSION"
    timeout: 300
    stream_timeout: 120
    tpm: 5000000
    tags: ["East US 2"]
  model_info:
    mode: "chat"
    base_model: "azure/gpt-4o"
<truncated>
I do not set request_timeout under litellm_settings:
litellm_settings:
  num_retries: 0
  callbacks: callback.handler
  drop_params: true
  # request_timeout: 120
  # set_verbose: true
What I am observing is that some requests (likely hung for one reason or another) do not get timed out until they reach exactly 6000s, which is the default for request_timeout: https://github.com/BerriAI/litellm/blob/04238cd9a97e802b2637924b8eed46c9012878c6/litellm/__init__.py#L297
I therefore question whether timeout and stream_timeout really do what they are supposed to do: https://docs.litellm.ai/docs/proxy/reliability#custom-timeouts-stream-timeouts---per-model
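To illustrate what I expect, here is a minimal sanity check at the SDK layer (a hedged sketch, assuming the standard AZURE_API_KEY / AZURE_API_BASE / AZURE_API_VERSION environment variables are set rather than my _EU2 ones). The library default is what the hung requests appear to fall back to, while an explicit per-call timeout should surface litellm.Timeout well before 6000s:

```python
import litellm

# Library-wide fallback the hung requests appear to hit
# (request_timeout = 6000 at the commit linked above).
print(litellm.request_timeout)

try:
    litellm.completion(
        model="azure/gpt-4o",
        messages=[{"role": "user", "content": "ping"}],
        timeout=1,  # deliberately tiny; should raise litellm.Timeout long before 6000s
    )
except litellm.Timeout as e:
    print("per-request timeout was enforced:", e)
```

The open question is whether the timeout / stream_timeout values set under litellm_params are forwarded the same way when the request goes through the proxy config.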
Relevant log output
No response
Are you a ML Ops Team?
Yes
What LiteLLM version are you on ?
v1.53.1
Twitter / LinkedIn details
https://www.linkedin.com/in/jeromeroussin/
Here is the timeout stacktrace for one of those 6000s (non-streaming) timeouts if that helps:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
yield
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 394, in handle_async_request
resp = await self._pool.handle_async_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
raise exc from None
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 236, in handle_async_request
response = await connection.handle_async_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
return await self._connection.handle_async_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 136, in handle_async_request
raise exc
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 106, in handle_async_request
) = await self._receive_response_headers(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 177, in _receive_response_headers
event = await self._receive_event(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 217, in _receive_event
data = await self._network_stream.read(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 32, in read
with map_exceptions(exc_map):
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/usr/local/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.ReadTimeout
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1576, in _request
response = await self._client.send(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1631, in send
response = await self._send_handling_auth(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1659, in _send_handling_auth
response = await self._send_handling_redirects(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1696, in _send_handling_redirects
response = await self._send_single_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1732, in _send_single_request
response = await transport.handle_async_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 393, in handle_async_request
with map_httpcore_exceptions():
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.ReadTimeout
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/litellm/llms/AzureOpenAI/azure.py", line 616, in acompletion
headers, response = await self.make_azure_openai_chat_completion_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/litellm/llms/AzureOpenAI/azure.py", line 349, in make_azure_openai_chat_completion_request
raise e
File "/usr/local/lib/python3.12/site-packages/litellm/llms/AzureOpenAI/azure.py", line 341, in make_azure_openai_chat_completion_request
raw_response = await azure_client.chat.completions.with_raw_response.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_legacy_response.py", line 373, in wrapped
return cast(LegacyAPIResponse[R], await func(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/resources/chat/completions.py", line 1661, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1843, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1537, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1595, in _request
raise APITimeoutError(request=request) from err
openai.APITimeoutError: Request timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/litellm/main.py", line 481, in acompletion
response = await init_response
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/litellm/llms/AzureOpenAI/azure.py", line 667, in acompletion
raise AzureOpenAIError(status_code=500, message=str(e))
litellm.llms.AzureOpenAI.azure.AzureOpenAIError: Request timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/litellm/utils.py", line 1031, in wrapper_async
result = await original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/litellm/main.py", line 503, in acompletion
raise exception_type(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2136, in exception_type
raise e
File "/usr/local/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 194, in exception_type
raise Timeout(
litellm.exceptions.Timeout: litellm.Timeout: APITimeoutError - Request timed out.
error_str: Request timed out.
Thinking through this:
- num_retries for the deployment would be passed into the .completion() function
- at the router, for async_function_with_retries -> this is the num_retries across the model group

So if a user is setting num_retries on a specific model in the list, it could mean either:
- retry on this specific model X times - this would retry that specific model X times and, if that fails, retry across the model group Y times, where Y is the global litellm.num_retries setting => that doesn't sound good, as I don't think a user expects their retries to multiply (X * Y) for these requests
- or retry requests which route to this model X times - this would pass the num_retries for this deployment into the kwargs used to track how many retries to run for this request

Ignore my comments - I was writing to address num_retries, which has a similar issue.
I can see the timeout being correctly passed for Anthropic. I wonder if this is Azure-specific. Testing now.
I see the timeout being correctly passed to Azure as well.
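For reference, a minimal way to check this outside the proxy (a hedged sketch, not the exact test): build a Router with the same deployment shape as the config above and a deliberately tiny per-deployment timeout; if the value is actually forwarded, the call should raise litellm.Timeout almost immediately instead of falling back to the 6000s default.

```python
import litellm
from litellm import Router

# Same deployment shape as the reporter's config; the os.environ/ values are
# placeholders that the Router resolves from the environment.
router = Router(
    model_list=[
        {
            "model_name": "gpt-4o",
            "litellm_params": {
                "model": "azure/gpt-4o",
                "api_key": "os.environ/AZURE_API_KEY_EU2",
                "api_base": "os.environ/AZURE_API_BASE_EU2",
                "api_version": "os.environ/AZURE_API_VERSION",
                "timeout": 0.001,  # intentionally unrealistic
            },
        }
    ]
)

try:
    router.completion(model="gpt-4o", messages=[{"role": "user", "content": "ping"}])
except litellm.Timeout as e:
    # If the per-deployment timeout is forwarded, we land here right away.
    print("per-model timeout applied:", e)
```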
Logs
This is not solved for Aider v0.72.3 when accessing the llama-server endpoint.
aider ignores the --timeout flag, times out the requests, and then attempts to cancel them, which of course does not work:
aider --model openai/deepseek-r1 --timeout 60000
litellm.APIConnectionError: APIConnectionError: OpenAIException - timed out
Retrying in 0.2 seconds...
litellm.APIConnectionError: APIConnectionError: OpenAIException - peer closed connection without sending complete message
body (incomplete chunked read)
Retrying in 0.5 seconds...
litellm.APIError: APIError: OpenAIException - Connection error.
Retrying in 1.0 seconds...
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv update_slots: all slots are idle
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 42016, n_keep = 0, n_prompt_tokens = 2074
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.987464
srv cancel_tasks: cancel task, id_task = 0
request: POST /chat/completions 127.0.0.1 200
srv cancel_tasks: cancel task, id_task = 2
request: POST /chat/completions 127.0.0.1 200
...
Confirmed by user this is fixed in v1.63.12
Any idea how this was fixed? I'm getting mysterious timeouts -> retries with an OAI-compatible server using aider as well.
The issue seemed to be that the timeout in aider's benchmarking was too long: 24 * 60 * 60 seconds. It's possible this caused some kind of overflow, resulting in extremely short timeouts. https://github.com/Aider-AI/aider/blob/main/benchmark/benchmark.py#L348
Setting this to 60 * 60 resolved the short timeouts.
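For anyone hitting the same thing outside the benchmark, here is a hedged sketch of the equivalent direct litellm call against a local llama-server endpoint; the api_base, api_key, and model name are placeholders for the setup in the logs above, and the only point is the timeout value, where 60 * 60 behaved while 24 * 60 * 60 did not:

```python
import litellm

# llama-server exposes an OpenAI-compatible endpoint on 127.0.0.1:8080 in the logs above;
# the /v1 suffix and the dummy key are assumptions about a typical local setup.
response = litellm.completion(
    model="openai/deepseek-r1",
    api_base="http://127.0.0.1:8080/v1",
    api_key="sk-local-placeholder",  # llama-server does not check the key
    messages=[{"role": "user", "content": "ping"}],
    timeout=60 * 60,  # 3600s; the 24 * 60 * 60 value triggered the early timeouts
)
```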
Hey @justinchiu-cohere what do your timeout messages say?
We emit the timeout value set, and how long the request ran for, in the error message
Hey! Interestingly, the timeout messages did not include those values, so I couldn't find where they were produced in the litellm codebase.
Messages:
litellm.Timeout: Timeout Error: OpenAIException - stream timeout
The API provider timed out without return a response. They may be down or overloaded.
With retries from .2 -> .5 -> ...
Outside of aider's polyglot benchmark, responses took around 3 minutes, and setting the timeout from 24 hours -> 1 hour resolved this timeout error.
The timeout messages are here for openai exceptions - https://github.com/BerriAI/litellm/blob/5007ef868f0e6ef5e07bb5ded0e8a260c26a726e/litellm/llms/openai/openai.py#L433
Oh, is that a sync request? @justinchiu-cohere
I see we're missing it on the sync route - https://github.com/BerriAI/litellm/blob/5007ef868f0e6ef5e07bb5ded0e8a260c26a726e/litellm/llms/openai/openai.py#L439
Will push a fix for it if that's what aider is using in their test.
Yes, it looks like aider uses the sync completions in general: https://github.com/Aider-AI/aider/blob/main/aider/models.py#L982
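For completeness, here is a rough illustration only (not litellm's actual code) of the kind of context the async error path already attaches and the sync route was missing; surfacing both the configured timeout and the elapsed wall-clock time is what makes cases like the aider one (a 24-hour setting that produced extremely short timeouts) diagnosable from the message alone:

```python
import time

def timeout_message(configured_timeout_s: float, started_at: float) -> str:
    # Hypothetical helper: report both the timeout that was set and how long
    # the request actually ran before it was cut off.
    elapsed = time.time() - started_at
    return (
        "Request timed out. "
        f"timeout value set: {configured_timeout_s}s, request ran for: {elapsed:.1f}s"
    )
```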