Client hangs when gevent worker uses long-running threads
I'm troubleshooting a deployment of a small Django app running Python 2.7 and Gunicorn v19.9.0 (config here) with a single gevent worker. The problem described below is not reproducible when I use the `sync` worker class.
The application implements background jobs within the worker - a compromise until we can afford a better solution. The application has a request handler that is used to run certain long-running tasks (namely, tools executed via `subprocess` that can take several minutes to complete). The handler basically creates a new thread via `threading.Thread` and returns the response as soon as possible. FWIW, when the worker is initialized, the app also starts a separate thread that watches all our background jobs in a loop (using `time.sleep` to yield control). This watcher maintains a list of the `Thread` instances created by the application.
My understanding is that the standard library is monkey-patched and the coroutines end up using cooperative code like `gevent.subprocess` or `gevent.sleep`, but always through the patched standard library. We are not patching explicitly; we just assume that `workers/ggevent.py` does that for us.
This setup has been working in most cases. However, in one deployment where Gunicorn sits behind Nginx, the requests to this particular endpoint described above hang until the HTTP client's read timeout is exceeded. Gunicorn does not log anything suspicious:
[2018-10-26 07:24:05 +0000] [16129] [DEBUG] POST /api/v2/location/a2cb908c-6642-4083-b1f6-58e7851ffd6b/async/
[2018-10-26 07:24:05 +0000] [16129] [DEBUG] Closing connection.
I can reproduce it using just cURL:
$ curl -v --request POST --header "Authorization: ApiKey test:test" --header "Content-Type: application/json" --data @request.json \
http://127.0.0.1:8000/api/v2/location/a2cb908c-6642-4083-b1f6-58e7851ffd6b/async/
* Connected to 127.0.0.1 (127.0.0.1) port 8000 (#0)
> POST /api/v2/location/a2cb908c-6642-4083-b1f6-58e7851ffd6b/async/ HTTP/1.1
> Host: 127.0.0.1:8000
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Type: application/json
> Authorization: ApiKey test:test
> Content-Length: 343
>
* upload completely sent off: 343 out of 343 bytes
< HTTP/1.1 202 ACCEPTED
< Server: nginx
< Date: Fri, 26 Oct 2018 02:00:45 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Vary: Accept, Accept-Language, Cookie
< X-Frame-Options: SAMEORIGIN
< Location: http://127.0.0.1:8000/api/v2/async/57/
< Content-Language: en
<
[******hangs********]
cURL hangs until Nginx's `proxy_read_timeout` is exceeded, at which point cURL exits with the following message: `curl: (18) transfer closed with outstanding read data remaining`.
If Nginx's `proxy_buffering` is enabled, cURL does not receive the response until the background job in my application completes!
However, I can't reproduce the problem when I connect cURL to Gunicorn's web server directly: the response from the application is received immediately and cURL exits, even when I know that the thread created by the handler is still running.
So I'm realizing that the threading solution the Django app implements may not be suitable, but I'd like to understand why; I don't fully grasp what's going on. I've confirmed that the problem I'm describing is not reproducible when the threads are not created by the worker, or when they complete quickly.
Thank you in advance!
The problem disappeared when `proxy_http_version 1.1` was set in Nginx.
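For reference, a minimal sketch of the relevant Nginx location block with this fix applied (the upstream address and path are assumptions for illustration, not taken from the actual deployment config):

```nginx
location / {
    proxy_pass http://127.0.0.1:8000;
    # Nginx speaks HTTP/1.0 to the upstream by default, which interacts
    # badly with the chunked response seen in the cURL transcript above.
    proxy_http_version 1.1;
}
```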
I looked at the traffic between Nginx and Gunicorn with Wireshark. I haven't found anything interesting, other than confirming that the response is sent right away but the final frame with no data is only sent after a delay. That delay is about the same amount of time it took the application thread to complete.
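One plausible reading of that capture (my interpretation, not confirmed in the thread): the response above uses `Transfer-Encoding: chunked`, and a chunked body is only complete once the zero-length terminating chunk arrives:

```
HTTP/1.1 202 ACCEPTED
Transfer-Encoding: chunked

<size-in-hex>\r\n
<chunk data>\r\n
...
0\r\n
\r\n        <- terminating zero-length chunk; until it is received, the
               client (or Nginx) treats the response as still in progress
```

If the worker only emits that final empty chunk once the background thread finishes, the client would appear to hang exactly as described.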
Thanks for reading!
It is better to run long-running jobs outside of the web server.
The problem is still not fixed!