ollama-python
504 Gateway Timeout - The server didn't respond in time
I don't know why, but I'm encountering this problem with the library. Here is my simple script:
import ollama

client = ollama.Client(host=llm_config["base_url"], timeout=600)
client.chat(model=config["ollama"]["model"], messages=[{
    "role": "user",
    "content": "Why is the sky blue?"
}])
Where llm_config["base_url"] is the Ollama server URL (it's a serverless GPU), which I can reach successfully from open-webui and even use to query the model without issues. The model I'm using is qwen2.5:32b-instruct-q4_K_M and the GPU is an RTX A6000.
The traceback (client-side) is the following:
Traceback (most recent call last):
File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 236, in chat
return self._request_stream(
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 99, in _request_stream
return self._stream(*args, **kwargs) if stream else self._request(*args, **kwargs).json()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 75, in _request
raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
and this is what I see on the server side:
[GIN] 2024/11/07 - 22:04:21 | 500 | 50.001124922s | xx.xx.xx.xx | POST "/api/chat"
It happens every time after 50 seconds, even though the timeout is set to 600 seconds. Am I missing something?
I have the same issue
Hey @devilteo911 - have you tried not setting a timeout to see whether there's an issue on the server side regardless? I'm trying to narrow down whether some information isn't passing all the way through to the server, or whether there's an error on the server side.
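Something like this, for example (the host here is a placeholder; omitting timeout means the client shouldn't be the one cancelling the request):

import ollama

# Same request as above, but with no client-side timeout, so any 504
# has to come from the server or an intermediate proxy rather than
# from the client giving up.
client = ollama.Client(host="http://my-ollama-host:11434")  # placeholder host
response = client.chat(
    model="qwen2.5:32b-instruct-q4_K_M",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response)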
Thanks!
Hey @ParthSareen,
The issue seems to occur only on the first call, which consistently results in a 504 error. Subsequent calls with the same input perform the generation without any problems.
I believe the problem is related to the time it takes to generate the first token, particularly during a cold start of my service. During a cold start, the model needs to be downloaded from Hugging Face, as my serverless GPU provider lacks permanent storage to keep the model locally.
I hope this clarifies the issue.
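For anyone hitting the same cold-start 504, a minimal sketch of a possible client-side workaround is to retry the first call while the model warms up (host and retry values here are placeholders, and this assumes the 504 really is just the cold start):

import time
import ollama

client = ollama.Client(host="http://my-ollama-host:11434", timeout=600)  # placeholder host

def chat_with_retry(messages, retries=3, backoff=30):
    # The first call after a cold start can 504 while the model is still
    # being downloaded and loaded, so retry a few times before giving up.
    for attempt in range(retries):
        try:
            return client.chat(model="qwen2.5:32b-instruct-q4_K_M", messages=messages)
        except ollama.ResponseError as e:
            if e.status_code != 504 or attempt == retries - 1:
                raise
            time.sleep(backoff)

print(chat_with_retry([{"role": "user", "content": "Why is the sky blue?"}]))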
I experienced the issue intermittently when I looped it like this:
$ while true ; do curl http://localhost:11434/api/chat -d '{ "model": "llama3.1:70b", "keep_alive": "0", "options": {"num_thread": 16}, "messages": [{"role": "user", "content": "Why is the sky blue?"}]}' ; sleep 1 ; done
...
...
<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
...
...
Is there a way to increase the timeout? I had tried starting it like this, but it still intermittently timed out 😕
# OLLAMA_LOAD_TIMEOUT=30m0s /bin/ollama serve
FYI: I tried to use "keep_alive": "0" but got an error. When I changed it to "keep_alive": 0 (note that I removed the quotes from around the 0), the immediate errors ended, but I still got the 504. I even tried increasing the value to 360000 (I assumed that these are milliseconds) and I still got a 504. Is there something we are missing here? 🤔
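For what it's worth, my reading of the Ollama docs is that keep_alive takes a duration string (like "10m"), a bare number of seconds (not milliseconds), or a negative number to keep the model loaded indefinitely. Through this library it can be passed straight to chat(); a minimal sketch (host and model are placeholders):

import ollama

client = ollama.Client(host="http://localhost:11434", timeout=600)  # placeholder host
client.chat(
    model="llama3.1:70b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    keep_alive=-1,  # negative value: keep the model loaded indefinitely
)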
🤗 For anybody using nginx as their reverse proxy: I found that the following solution from https://stackoverflow.com/questions/43832389/what-can-i-do-to-fix-a-504-gateway-timeout-error, combined with the above advice about including "keep_alive", made the 504 vanish for me. 🤗
The following is from the Stack Overflow post:
For those experiencing this error, that have access to their app / site's hosting environment, which is proxying through NGINX, this issue can be fixed by extending the timeout for API requests.
In your /etc/nginx/sites-available/default or /etc/nginx/nginx.conf, add the following variables:

proxy_connect_timeout 240;
proxy_send_timeout 240;
proxy_read_timeout 240;
send_timeout 240;

Run sudo nginx -t to check the syntax, and then sudo service nginx restart. This should effectively quadruple the time before NGINX will time out your API requests (the default being 60 seconds, our new timeout being 240 seconds).
I have arrived here after already changing nginx timeouts. Using a containerised app in k8s, I have increased the ingress timeouts, gunicorn worker timeouts, and nginx proxy timeouts, all to 600 seconds. I still see the 504 gateway timeout error, and ollama appears to be the only place left.
Is there any value controlling timeout on ollama serve API calls?
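One mitigation worth trying in a proxy chain like this is streaming the response, so tokens flow as they're generated and each proxy's read timeout sees regular traffic instead of one long silent request. A sketch (host and model are placeholders; note this won't help if the timeout fires before the first token, e.g. during a cold model load):

import ollama

client = ollama.Client(host="http://my-ollama-host:11434", timeout=600)  # placeholder host

for chunk in client.chat(
    model="llama3.1:70b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,  # tokens arrive incrementally, keeping the connection active
):
    print(chunk["message"]["content"], end="", flush=True)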