
Gunicorn freezing periodically

Open roshanpati opened this issue 6 years ago • 25 comments

gunicorn version 19.9.0. Gunicorn is freezing periodically. The server keeps running, but no requests get routed. Has anyone faced this and resolved it?

roshanpati avatar Nov 23 '18 14:11 roshanpati

Are you able to provide any other details such as anything interesting in logs, a reproducible setup or problem requests?

Not much to go on here yet.

javabrett avatar Nov 23 '18 22:11 javabrett

ping @Roskgp96

benoitc avatar Nov 27 '18 09:11 benoitc

+1. I faced the issue. I can add that setting --threads to 2 or more resolves it. No hints with --log-level debug. Judging by the logs, it just stops processing for some time, then resumes.
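
(For reference, a minimal sketch of that kind of config change, with placeholder values; note that when threads is set above 1, gunicorn uses the gthread worker instead of sync.)

# gunicorn.conf.py -- minimal sketch of the work-around above, placeholder values
workers = 2
threads = 2            # with threads > 1, gunicorn switches from the sync worker to gthread
loglevel = "debug"     # same level as mentioned above; it showed no extra hints here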

Sermus avatar Dec 24 '18 04:12 Sermus

@Sermus what does "freezing" mean in your case? Do you have any logs from gunicorn? What does your application do?

benoitc avatar Dec 24 '18 05:12 benoitc

A guess: the application or some dependent package is sending HTTP requests to itself synchronously.
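
(For illustration, a hypothetical minimal example of that pattern; none of this is from the reporter's app. With a single sync worker, the worker handling /outer is stuck waiting for /inner, which no free worker can serve, so everything stalls until the timeout.)

# app.py -- hypothetical sketch of a synchronous self-request
# run with: gunicorn -w 1 app:app   (default bind is 127.0.0.1:8000)
import requests
from flask import Flask

app = Flask(__name__)

@app.route("/inner")
def inner():
    return "ok"

@app.route("/outer")
def outer():
    # with one sync worker, this request can never be answered:
    # the only worker is busy right here waiting for it
    return requests.get("http://127.0.0.1:8000/inner").text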

ifduyue avatar Dec 24 '18 05:12 ifduyue

If you can provide more information for debugging or a case to reproduce the problem, please open a ticket with details. Thanks!

tilgovi avatar Jan 22 '19 08:01 tilgovi

The same issue is happening to me. I'll describe my scenario a bit.

I'm stress testing a Gunicorn server inside a Docker container with the load testing tool K6 (https://k6.io/), using 64 virtual users for 10 seconds. I verified that the app works correctly (the only thing it returns is a hardcoded string of a specific length).

What I did was the following:

  1. Run the Gunicorn container with Django. Gunicorn conf:

import os

worker_class = os.environ['WORKER_CLASS']
workers = os.environ['NUM_WORKERS']
backlog = os.environ['BACKLOG']
timeout = os.environ['TIMEOUT']
accesslog = '/accesslog.log'
errorlog = '/errorlog.log'

Docker ENV variables:

- WORKER_CLASS=sync
- NUM_WORKERS=5
- BACKLOG=4096
- TIMEOUT=3600
  2. Open a new terminal and run "docker stats".
  3. Run "docker exec -it <containerID> /bin/sh" and access /accesslog.log and /errorlog.log.
  4. While stress testing the Gunicorn container, docker stats showed CPU usage at 100% (all working as expected), but suddenly CPU usage dropped to 0%. About 6 seconds after the drop, CPU usage climbed back up to 100%.
  5. While running this test, I did a tail -f /accesslog.log and tail -f /errorlog.log. The only thing logged in errorlog.log was the following:

errorlog.log

[2019-01-25 18:48:47 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2019-01-25 18:48:47 +0000] [1] [INFO] Listening at: http://0.0.0.0:8888 (1)
[2019-01-25 18:48:47 +0000] [1] [INFO] Using worker: sync
[2019-01-25 18:48:47 +0000] [8] [INFO] Booting worker with pid: 8
[2019-01-25 18:48:47 +0000] [9] [INFO] Booting worker with pid: 9
[2019-01-25 18:48:47 +0000] [11] [INFO] Booting worker with pid: 11
[2019-01-25 18:48:47 +0000] [13] [INFO] Booting worker with pid: 13
[2019-01-25 18:48:47 +0000] [14] [INFO] Booting worker with pid: 14
  6. The interesting thing is that while Gunicorn was dropped, it didn't process any requests at all. I say this because there was a stretch during which Gunicorn logged nothing to /accesslog.log. Example of accesslog.log around the drop:
172.19.0.1 - - [25/Jan/2019:18:58:52 +0000] "GET /somewhere HTTP/1.1" 200 10 "-" "k6/0.23.1 (https://k6.io/)"
172.19.0.1 - - [25/Jan/2019:18:58:52 +0000] "GET /somewhere HTTP/1.1" 200 10 "-" "k6/0.23.1 (https://k6.io/)"
172.19.0.1 - - [25/Jan/2019:18:58:58 +0000] "GET /somewhere HTTP/1.1" 200 10 "-" "k6/0.23.1 (https://k6.io/)"
172.19.0.1 - - [25/Jan/2019:18:58:58 +0000] "GET /somewhere HTTP/1.1" 200 10 "-" "k6/0.23.1 (https://k6.io/)"

Check the second and third log lines I pasted. This is the most interesting part: Gunicorn stopped processing requests for several seconds and showed no sign that workers restarted or anything similar.

Any clue of what is happening?

joater avatar Jan 25 '19 19:01 joater

That is very strange. I'll re-open the issue. Please continue investigating if you can.

tilgovi avatar Jan 25 '19 20:01 tilgovi

If you can publish a repository with setup instructions to reproduce this that would be the best.

tilgovi avatar Jan 25 '19 20:01 tilgovi

Hi @tilgovi , thanks for your response!

This is the repository: https://github.com/joater/gunicorn-freeze

Inside README.md you can find the steps to reproduce the bug. I tested it on my laptop.

Any comments or doubts, please tell me.

joater avatar Jan 26 '19 14:01 joater

Do we have any clue about what was happening?

benoitc avatar Nov 22 '19 17:11 benoitc

I have not yet had a chance to look.

tilgovi avatar Nov 23 '19 23:11 tilgovi

I'm facing the same issue when stress testing my Flask stack (Flask apps served by gunicorn behind nginx, shipped with Docker). Any ideas?

AmineBenatmane avatar Feb 15 '20 15:02 AmineBenatmane

Any update on this issue? I am using a Flask server with gunicorn as the WSGI server. Gunicorn stops accepting requests and becomes idle after some time; after a restart it works fine again.

RaviTejaMaddhini avatar May 27 '20 06:05 RaviTejaMaddhini

Same here: it freezes for about 20 seconds once the number of requests equals the number of workers. "--threads" seems to do nothing.
I'm using the default, latest CTFd, which runs on Python 2. Could that be related?

tijldeneut avatar May 29 '20 10:05 tijldeneut

Weirdly, gunicorn throws a 500 after being idle for some time.

RaviTejaMaddhini avatar May 29 '20 11:05 RaviTejaMaddhini

+1. I faced this issue too. Gunicorn freezes in the middle of processing a request, and when I try to stop the worker, the request completes successfully and returns 200.

configuration:

workers = 1
threads = 20
worker_class = 'sync'
worker_connections = 1000
timeout = 3000  # I need it because it can process long, heavy requests.
keepalive = 2

I am sure that the request is the only one hitting the server (except for Prometheus metrics collection every 15 seconds, which, by the way, still succeeds).

Cactiw avatar Jun 04 '20 15:06 Cactiw

Sorry, my bad. The problem was using multiprocessing.Pool when processing the request. It seems that gunicorn is not compatible with this module.

import logging
import multiprocessing

...

def process_request():
    # MAX_PROC_PER_CLIENT_NUM, work and some_list are defined elsewhere in the app
    with multiprocessing.Pool(MAX_PROC_PER_CLIENT_NUM) as pool:
        for result in pool.imap(work, some_list):
            ...  # logic here

        logging.debug("All results received and processed")
    logging.debug("Pool successfully closed")

This code sometimes (about 30 percent of the time) printed the following: "All results received and processed"

It then hung waiting, and only returned when the worker received the shutdown signal (at which point it returned a 200!).

Replacing

import multiprocessing

with

import multiprocessing.dummy as multiprocessing

solved the issue. However, this is a potential performance hit, since multiprocessing.dummy backs the same Pool API with threads instead of processes.

Related link: https://stackoverflow.com/questions/29175090/using-pythons-multiprocessing-makes-response-hang-on-gunicorn
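
(For clarity, a minimal sketch of the swap described above; MAX_PROC_PER_CLIENT_NUM, work and some_list are placeholders standing in for the real application code.)

import multiprocessing.dummy as multiprocessing  # same Pool API, backed by threads

MAX_PROC_PER_CLIENT_NUM = 4   # placeholder value

def work(item):
    return item                # placeholder task

def process_request(some_list):
    with multiprocessing.Pool(MAX_PROC_PER_CLIENT_NUM) as pool:
        for result in pool.imap(work, some_list):
            pass               # real logic goes here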

Cactiw avatar Jun 04 '20 16:06 Cactiw

For what it's worth, I've also observed what appears to be the same issue.

It occurs for me with the combination of gevent.monkey.patch_all() and multiprocessing.ThreadPool.imap_unordered. Without either it's fine, but with both, gunicorn workers eventually get into a wedged state until all workers are unproductive and need to be restarted. Setting --timeout to a positive integer prevents the workers from being wedged indefinitely, but that doesn't help the requests they were blocked on.

My work-around is to use gevent.threadpool.ThreadPool instead of multiprocessing.ThreadPool.
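
(A minimal sketch of that work-around; the work function and inputs are placeholders, not from the actual application.)

from gevent import monkey
monkey.patch_all()

from gevent.threadpool import ThreadPool

def work(item):
    return item * 2                  # placeholder task

pool = ThreadPool(8)                 # number of native threads backing the pool
for result in pool.imap_unordered(work, range(100)):
    print(result)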

smilliken avatar Jul 14 '20 20:07 smilliken

gevent has limited support for multiprocessing. Anything using multiprocessing.Queue, for example, will hang. Raw use of multiprocessing.Process is about all that's supported.
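
(A minimal sketch of the "raw Process" pattern described as supported, assuming no Queue or Pool is used to pass results back.)

from gevent import monkey
monkey.patch_all()

import multiprocessing

def work():
    print("child doing work")        # no Queue/Pool involved

if __name__ == "__main__":
    p = multiprocessing.Process(target=work)
    p.start()
    p.join()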

jamadden avatar Jul 14 '20 21:07 jamadden

Agreed! Echoes my experience exactly.

Noting in case it helps someone: gevent's event loop needs to be reinitialized in new processes, and gevent has a function for this: http://www.gevent.org/api/gevent.html#gevent.reinit
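
(A minimal sketch of calling it in a child process; as the reply below notes, gevent generally does this automatically on fork, so treat this as a sanity check rather than a required step.)

import multiprocessing

import gevent

def child_main():
    gevent.reinit()      # re-initialize gevent's loop in the child (usually automatic)
    gevent.sleep(0)      # cooperative work proceeds as usual

if __name__ == "__main__":
    p = multiprocessing.Process(target=child_main)
    p.start()
    p.join()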

smilliken avatar Jul 14 '20 21:07 smilliken

gevent monkey-patches fork, and I thought (I'm the gevent maintainer) multiprocessing.Process automatically re-inits in child processes; there are also loop-level functions that watch for forks (IIRC, it's been a while since I looked at that code). So it generally shouldn't be necessary to call that manually. I'd be interested in gevent bug reports in cases where it is.

jamadden avatar Jul 14 '20 21:07 jamadden

I'm the gevent maintainer

Oh hey, thank you for your work!! Gevent is awesome, I don't know what I'd do without it. I'm in your debt.

So it generally shouldn't be necessary to call that manually. I'd be interested in gevent bug reports in cases where it is.

I've had no issue that required calling it manually (except as a sanity check while debugging).

smilliken avatar Jul 14 '20 22:07 smilliken

+1 on this issue.

My application is quite simple. No multiprocessing whatsoever.

I'm running Flask in a Docker container on a Raspberry Pi Zero W. I have another container reading sensors in a loop and dumping data to a SQLite DB, and the affected container responds to incoming requests by reading the DB and returning the requested data.

Logs provide no information whatsoever:

[2021-02-26 03:02:22 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:23 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:23 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:24 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:24 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:25 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:26 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:26 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:27 +0000] [1] [INFO] Handling signal: ttou
[2021-02-26 03:02:27 +0000] [1] [INFO] Handling signal: ttou

This eventually just stops. If I follow the logs with docker logs -f <my-container> and make a request, nothing happens in the logs, and the client just hangs indefinitely.

This seems to happen after ~12 hours of running.

EDIT: My issue has been solved (for now :) by using the --preload option. My application, while simple, loads slowly due to the constraints of the low-power device running multiple containers.
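
(For reference, a minimal sketch of the equivalent config-file setting; values are placeholders.)

# gunicorn.conf.py -- minimal sketch; values are placeholders
preload_app = True   # same as the --preload flag: load the app before forking workers
workers = 2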

vjp23 avatar Feb 26 '21 03:02 vjp23

Still seeing this issue. Same as above, we're using a combination of gunicorn and multiprocessing.Pool.

carlosdagos avatar Jul 19 '22 01:07 carlosdagos

You should ensure that the multiprocessing code doesn't try to become the master of the process it is running in; that is the issue.

@vjp23 please open a separate issue for your problem if you still need it. I will close this old one.

benoitc avatar Oct 14 '22 10:10 benoitc