Gunicorn freezing periodically
gunicorn version 19.9.0. Gunicorn is freezing periodically. The server keeps running, but no requests get routed. Has anyone faced this and resolved it?
Are you able to provide any other details, such as anything interesting in the logs, a reproducible setup, or problem requests?
Not much to go on here yet.
ping @Roskgp96
+1. I'm facing the issue too. I can add that setting --threads to 2 or more resolves it. There are no hints with --log-level debug; judging by the logs, it just stops processing for some time, then resumes.
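For reference, a minimal config sketch of that workaround (the worker count below is illustrative); per the gunicorn docs, setting threads above 1 with the default sync worker class makes gunicorn use the gthread worker instead:

# gunicorn.conf.py -- illustrative sketch of the --threads workaround
workers = 2   # placeholder worker count
threads = 2   # same as --threads 2; with worker_class 'sync' this switches to gthread
loglevel = "debug"  # same verbosity used while chasing the freeze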
@Sermus, what does "freezing" mean in your case? Do you have any logs from gunicorn? What does your application do?
A guess: the application or one of its dependencies is synchronously sending HTTP requests to itself.
If you can provide more information for debugging or a case to reproduce the problem, please open a ticket with details. Thanks!
The same issue is happening to me. Let me describe my scenario a bit.
I'm stress testing a Gunicorn server inside a Docker container with the load testing tool k6 (https://k6.io/), using 64 virtual users for 10 seconds. I verified that the app itself works fine (the only thing it returns is a hardcoded string of a specific length).
What I did was the following:
- Run the Gunicorn container with Django. Gunicorn config:
import os  # required for the os.environ lookups below

worker_class = os.environ['WORKER_CLASS']
workers = os.environ['NUM_WORKERS']
backlog = os.environ['BACKLOG']
timeout = os.environ['TIMEOUT']
accesslog = '/accesslog.log'
errorlog = '/errorlog.log'
Docker ENV variables:
- WORKER_CLASS=sync
- NUM_WORKERS=5
- BACKLOG=4096
- TIMEOUT=3600
- Open a new terminal and run "docker stats".
- Run "docker exec -it <containerID> /bin/sh" and access /accesslog.log and /errorlog.log.
- While stress testing the Gunicorn container, docker stats showed CPU usage at 100% (all working as expected), but then CPU usage suddenly dropped to 0%. About 6 seconds after the drop, CPU usage grew back up to 100%.
- While running this test, I did a tail -f /accesslog.log and tail -f /errorlog.log. In errorlog.log, the only thing logged was the following:
errorlog.log
[2019-01-25 18:48:47 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2019-01-25 18:48:47 +0000] [1] [INFO] Listening at: http://0.0.0.0:8888 (1)
[2019-01-25 18:48:47 +0000] [1] [INFO] Using worker: sync
[2019-01-25 18:48:47 +0000] [8] [INFO] Booting worker with pid: 8
[2019-01-25 18:48:47 +0000] [9] [INFO] Booting worker with pid: 9
[2019-01-25 18:48:47 +0000] [11] [INFO] Booting worker with pid: 11
[2019-01-25 18:48:47 +0000] [13] [INFO] Booting worker with pid: 13
[2019-01-25 18:48:47 +0000] [14] [INFO] Booting worker with pid: 14
- The interesting thing is that, when Gunicorn dropped, it didn't process any requests at all. I say this because there was a period during which Gunicorn logged nothing to /accesslog.log. Example from accesslog.log when Gunicorn dropped:
172.19.0.1 - - [25/Jan/2019:18:58:52 +0000] "GET /somewhere HTTP/1.1" 200 10 "-" "k6/0.23.1 (https://k6.io/)"
172.19.0.1 - - [25/Jan/2019:18:58:52 +0000] "GET /somewhere HTTP/1.1" 200 10 "-" "k6/0.23.1 (https://k6.io/)"
172.19.0.1 - - [25/Jan/2019:18:58:58 +0000] "GET /somewhere HTTP/1.1" 200 10 "-" "k6/0.23.1 (https://k6.io/)"
172.19.0.1 - - [25/Jan/2019:18:58:58 +0000] "GET /somewhere HTTP/1.1" 200 10 "-" "k6/0.23.1 (https://k6.io/)"
Look at the second and third log lines I pasted. This is the most interesting part: between those two timestamps Gunicorn stopped processing requests and showed no sign that workers restarted or anything similar.
Any clue of what is happening?
That is very strange. I'll re-open the issue. Please continue investigating if you can.
If you can publish a repository with setup instructions to reproduce this that would be the best.
Hi @tilgovi , thanks for your response!
This is the repository: https://github.com/joater/gunicorn-freeze
Inside README.md you can find the steps to reproduce the bug. I tested it on my notebook.
If you have any comments or questions, please let me know.
Do we have any clue yet about what was happening?
I have not yet had a chance to look.
I'm facing the same issue when stress testing my Flask stack (Flask apps served by gunicorn behind nginx, shipped with Docker). Any ideas?
Any update on this issue? I am using a Flask server with gunicorn as the WSGI server. Gunicorn stops accepting requests and becomes idle after some time; after a restart it works fine again.
Same here: it freezes for about 20 seconds once the number of requests equals the number of workers. --threads seems to do nothing.
I'm using the latest default CTFd, which runs on Python 2. Could that be related?
Weirdly, gunicorn throws a 500 after being idle for some time.
+1. I faced this issue too. Gunicorn freezes in the middle of processing a request, and when it tries to stop the worker, the request is processed successfully to the end and returns 200.
Configuration:
workers = 1
threads = 20
worker_class = 'sync'
worker_connections = 1000
timeout = 3000  # I need it because it can process long, heavy requests.
keepalive = 2
I am sure that this request is the only one on the server (except for the Prometheus metrics collection every 15 seconds, which, by the way, continues to succeed).
Sorry, my bad. The problem was using multiprocessing.Pool when processing the request. It seems that gunicorn is not compatible with this module.
import logging
import multiprocessing
...

def process_request():
    with multiprocessing.Pool(MAX_PROC_PER_CLIENT_NUM) as pool:
        for result in pool.imap(work, some_list):
            ...  # --- logic here ---
        logging.debug("All results received and processed")
    logging.debug("Pool successfully closed")
About 30 percent of the time, this code printed "All results received and processed" and then hung waiting, only continuing once the worker received the shutdown signal (and still returning code 200!).
Replacing "import multiprocessing" with "import multiprocessing.dummy as multiprocessing" solved the issue. However, this is a potential performance drop, since multiprocessing.dummy backs the Pool with threads rather than separate processes.
Related link: https://stackoverflow.com/questions/29175090/using-pythons-multiprocessing-makes-response-hang-on-gunicorn
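For reference, a self-contained sketch of that workaround (the work function, input list, and pool size are placeholders, not the original app's code); multiprocessing.dummy exposes the same Pool API but backs it with threads, so the gunicorn worker never forks:

import logging
import multiprocessing.dummy as multiprocessing  # thread-backed drop-in for multiprocessing

logging.basicConfig(level=logging.DEBUG)

def work(item):
    return item * item  # placeholder for the real per-item logic

def process_request(some_list, pool_size=4):
    with multiprocessing.Pool(pool_size) as pool:
        results = list(pool.imap(work, some_list))
        logging.debug("All results received and processed")
    logging.debug("Pool successfully closed")
    return results

print(process_request([1, 2, 3, 4]))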
For what it's worth, I've also observed what appears to be the same issue. It occurs for me with the combination of gevent.monkey.patch_all() and multiprocessing.ThreadPool.imap_unordered. Without either, it's fine, but with both, gunicorn workers will eventually get in a wedged state until all workers are unproductive and need to be restarted. Using --timeout set to a positive integer prevents the workers from being wedged indefinitely, but that doesn't help the requests they block on.
My work-around is to use gevent.threadpool.ThreadPool instead of multiprocessing.ThreadPool.
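A minimal sketch of that work-around (the work function, pool size, and input range are illustrative placeholders):

from gevent import monkey
monkey.patch_all()  # the scenario described above

from gevent.threadpool import ThreadPool

def work(item):
    return item * item  # placeholder for the real per-item task

pool = ThreadPool(4)  # real OS threads managed by gevent's hub
for result in pool.imap_unordered(work, range(10)):
    print(result)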
gevent has limited support for multiprocessing. Anything using multiprocessing.Queue, for example, will hang. Raw use of multiprocessing.Process is about all that's supported.
gevent has limited support for multiprocessing. Anything using multiprocessing.Queue, for example, will hang. Raw use of multiprocessing.Process is about all that's supported.
Agreed! Echoes my experience exactly.
Noting in case it helps someone: gevent's event loop needs to be reinitialized in new processes and gevent has a function for this: http://www.gevent.org/api/gevent.html#gevent.reinit
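A minimal sketch of where that call would go, assuming a hand-started multiprocessing.Process (gevent normally re-inits automatically after a fork, so the explicit call is mostly a sanity check):

import gevent
from multiprocessing import Process

def child_task():
    gevent.reinit()  # re-initialize gevent's event loop in the forked child (usually automatic)
    gevent.sleep(0)  # placeholder for child work that uses gevent

if __name__ == "__main__":
    proc = Process(target=child_task)
    proc.start()
    proc.join()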
gevent monkey-patches fork, and I thought (I'm the gevent maintainer) multiprocessing.Process as well, to automatically re-init in child processes; there are also loop-level functions that watch for forks (IIRC, it's been a while since I looked at that code). So it generally shouldn't be necessary to call that manually. I'd be interested in gevent bug reports for cases where it is.
i'm the gevent maintainer
Oh hey, thank you for your work!! Gevent is awesome, I don't know what I'd do without it. I'm in your debt.
So it generally shouldn't be necessary to call that manually. I'd be interested in gevent bug reports in cases where it is.
I've had no issue that required calling it manually (except as a sanity check while debugging).
+1 on this issue.
My application is quite simple. No multiprocessing whatsoever.
I'm running Flask in a Docker container on a Raspberry Pi Zero W. I have another container reading sensors on a loop and dumping data to a SQLite DB and the affected container responds to incoming requests by reading the DB and returning requested data.
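For context, a minimal sketch of the kind of endpoint described (the database path, table, and column names are made up, not the real app's):

import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)
DB_PATH = "/data/sensors.db"  # hypothetical path shared with the sensor container

@app.route("/readings")
def readings():
    # Open a fresh connection per request; the other container writes to the same file.
    conn = sqlite3.connect(DB_PATH)
    try:
        rows = conn.execute(
            "SELECT ts, value FROM readings ORDER BY ts DESC LIMIT 10"
        ).fetchall()
    finally:
        conn.close()
    return jsonify(rows)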
Logs provide no information whatsoever:
[2021-02-26 03:02:22 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:23 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:23 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:24 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:24 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:25 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:26 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:26 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:27 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:27 +0000] [1] [INFO] Handling signal: ttou
This eventually just stops. If I follow the logs with docker logs -f <my-container> and make a request, nothing happens in the logs, and the client just hangs indefinitely.
This seems to happen after ~12 hours of running.
EDIT: My issue has been solved (for now :) by using the --preload option. My application, while simple, loads slowly due to the constraints of the low-power device running multiple containers.
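For anyone else trying this, --preload corresponds to the preload_app setting in a gunicorn config file; a minimal sketch (the worker count is illustrative):

# Equivalent of passing --preload on the command line: the app is imported
# once in the master process before the workers are forked.
preload_app = True
workers = 2  # placeholder worker count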
Still seeing this issue. Same as above, we're using a combination of gunicorn and multiprocessing.Pool.
You should ensure that the multiprocessing code doesn't try to become the master of the process it is running in, which is the issue here.
@vjp23 please open a separate issue for your problem if you still need it. I will close this old one.