
Workers silently exit when allocating more memory than the system allows.

Open jonathanlunt opened this issue 6 years ago • 5 comments

Problem Description

Gunicorn workers silently exit when allocating more memory than the system allows. This causes the master to enter an infinite loop where it keeps booting new workers unsuccessfully. No hook/logging exists to detect or handle this behavior.

[2018-12-16 23:29:45 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2018-12-16 23:29:45 +0000] [1] [INFO] Listening at: http://127.0.0.1:8080 (1)
[2018-12-16 23:29:45 +0000] [1] [INFO] Using worker: sync
[2018-12-16 23:29:45 +0000] [10] [INFO] Booting worker with pid: 10
post_fork 10
child_exit 10
[2018-12-16 23:29:46 +0000] [11] [INFO] Booting worker with pid: 11
post_fork 11
child_exit 11
[2018-12-16 23:29:47 +0000] [12] [INFO] Booting worker with pid: 12
post_fork 12
child_exit 12
[2018-12-16 23:29:47 +0000] [13] [INFO] Booting worker with pid: 13
post_fork 13
child_exit 13

Files

These files will allow you to reproduce the behavior (clone-able version here: https://github.com/jonathanlunt/gunicorn-memory-example). Docker is used to artificially constrain system resources.

app.py: Generic "hello world" from gunicorn documentation

def app(environ, start_response):
    data = b"Hello, World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data)))
    ])
    return iter([data])

config.py: This includes a post_fork function that allocates 100MB

bind = '%s:%s' % ('127.0.0.1', '8080')
workers = 1

def post_fork(server, worker):
    """Allocate ~100MB of integers"""
    print('post_fork', worker.pid)
    items = list(range(4300800))

def child_exit(server, worker):
    print('child_exit', worker.pid)

Dockerfile

FROM python:3.6-alpine3.8

RUN pip install gunicorn

COPY *.py /opt/
WORKDIR /opt

ENTRYPOINT ["gunicorn", "-c", "config.py", "app:app"]

Usage

The run command limits the container to 50MB of memory.

docker build -t gunicorn-example .
docker run -m 50mb -t gunicorn-example

Proposed Solutions

Arbiter Logging:

Since no error is logged, it is difficult to determine when this behavior is triggered, other than by tracking the number of times a worker is created. One possible way would be to log the exit status code in the Arbiter.

Update Arbiter.reap_workers (https://github.com/benoitc/gunicorn/blob/33025cf610bb7a6f1cb307644c1881863c2fddc4/gunicorn/arbiter.py#L524) to include:

WORKER_SUCCESS = 0
if status != WORKER_SUCCESS:
    self.log.error("Worker exited with status code: %s", status)
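
The status value in the snippet above is the raw wait status returned by os.waitpid(), which is why a worker killed by the kernel OOM killer shows up as 9 (SIGKILL) rather than as an ordinary exit code. A minimal sketch of how that raw status could be decoded into a friendlier message (the helper name is hypothetical, not part of gunicorn):

import os
import signal

def describe_wait_status(status):
    """Turn a raw os.waitpid() status into a human-readable string.

    Illustration only; not part of gunicorn.
    """
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        # An OOM-killed worker is terminated by SIGKILL, so the raw status
        # is 9 and this branch reports "killed by signal 9 (SIGKILL)".
        return "killed by signal %d (%s)" % (sig, signal.Signals(sig).name)
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "unknown wait status %d" % status

Inside reap_workers this could back a log line along the lines of self.log.error("Worker (pid:%s) %s", wpid, describe_wait_status(status)).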

Worker Exit Code Tracking:

Another way to handle the error is to allow the user to perform an action depending on the process exit code. However, currently exitcode doesn't appear to be tracked by the worker class.

Update Worker.__init__ (https://github.com/benoitc/gunicorn/blob/33025cf610bb7a6f1cb307644c1881863c2fddc4/gunicorn/workers/base.py#L36) to include:

self.exitcode = None

Update Arbiter.reap_workers (https://github.com/benoitc/gunicorn/blob/33025cf610bb7a6f1cb307644c1881863c2fddc4/gunicorn/arbiter.py#L534) to include:

worker.exitcode = status

The status will come up as 9 in this case. This would allow user-provided child_exit code to make a decision based on Worker.exitcode.
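
For illustration, a minimal sketch of what such a user-provided hook could look like if the proposed exitcode attribute (holding the raw wait status) were added; the SIGKILL check and log message are examples, not part of gunicorn:

import os
import signal

def child_exit(server, worker):
    """React to the (proposed) raw wait status stored on the worker."""
    status = worker.exitcode  # raw waitpid status under the proposal above
    if status is not None and os.WIFSIGNALED(status):
        if os.WTERMSIG(status) == signal.SIGKILL:
            # SIGKILL usually means the kernel OOM killer reaped the worker.
            server.log.error("worker %s was SIGKILLed (possible OOM)", worker.pid)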

Comments

If there are other solutions to this issue, I'd be happy to hear them, but for now I don't know if there's a good way to track/handle this situation with gunicorn by default.

I would be willing to submit a PR for the proposed solutions, but I wanted to raise this as an issue first in order to get feedback on the best way to handle this behavior.

jonathanlunt avatar Dec 17 '18 00:12 jonathanlunt

I like both proposals.

tilgovi avatar Jan 22 '19 07:01 tilgovi

I don't think we should try to respawn indefinitely in any case. We should probably track the number of times we have tried to respawn a worker within a time window and decide to stop at some point, shouldn't we?

Additionally, we should indeed log the error status code. IMO it's better to crash and let the user decide what to do, such as whether to restart the service.

benoitc avatar Jan 22 '19 10:01 benoitc
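
A rough sketch of the respawn throttling described above: count worker deaths in a sliding time window and stop respawning once a threshold is hit. The names, window, and limit are hypothetical illustrations, not gunicorn's actual behavior:

import time
from collections import deque

class RespawnThrottle:
    """Track recent worker deaths and decide when to give up respawning."""

    def __init__(self, max_deaths=5, window=10.0):
        self.max_deaths = max_deaths
        self.window = window  # seconds
        self.deaths = deque()

    def record_death(self):
        now = time.monotonic()
        self.deaths.append(now)
        # Drop deaths that fall outside the sliding window.
        while self.deaths and now - self.deaths[0] > self.window:
            self.deaths.popleft()

    def should_give_up(self):
        return len(self.deaths) >= self.max_deaths

The arbiter could record a death each time it reaps a worker and halt (or at least log loudly) once should_give_up() returns True, instead of booting replacement workers forever.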

Hi @benoitc, I would like to know if there is any mechanism by which we can prevent the worker from exiting silently. I would prefer to drop the request being processed by the worker rather than kill the worker (when the worker bootstraps things at startup, for instance, it is expensive to restart it).

adoukkali avatar Sep 15 '21 21:09 adoukkali

@adoukkali What do you mean by "silently"? If something is crashing, then it will crash. If you don't want the worker to crash, you should take measures in your application so it does not trigger an exception that will make it crash.

benoitc avatar Sep 16 '21 22:09 benoitc

Any update on this? @adoukkali @benoitc What about the 2nd proposal? Worker Exit Code Tracking?

sp1rs avatar Mar 29 '22 09:03 sp1rs