fd leak in thread worker causing periodic outages

Open fluffy-critter opened this issue 11 months ago • 5 comments

Hi, for the last few days I've been trying to track down an issue with a file descriptor leak in a Flask-based application and I believe I have tracked it down to gunicorn itself.

In my configuration, I have nginx as a fronting server talking to the gunicorn process over a socket file, e.g.:

server {
    listen 80;
    listen [::]:80;
    server_name example.com;

    location / {
        proxy_pass http://unix:/home/USERNAME/.vhosts/example.com:/;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

and then gunicorn is being launched with:

gunicorn --threads=12 -b unix:/home/USERNAME/.vhosts/example.com app:app

What's happening is that every now and then, my Flask application will get into a state where it's dying with a "too many open files" error, when the application itself never keeps any file descriptors open long-term. I finally managed to catch it during one of these outages, and lsof shows the gunicorn process as holding on to a bunch of open fds for the UNIX socket file that it uses to communicate with nginx (like, many hundreds of them, definitely enough to implicate this in the "too many open files" error).

I'm guessing that when gunicorn shuts down a worker thread for whatever reason, the file descriptor for the UNIX socket is not being closed. Over time this fills up the process's open descriptor table.
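For readers unfamiliar with the failure mode: "too many open files" is errno `EMFILE`, raised once a process reaches its `RLIMIT_NOFILE` soft limit. The sketch below is not gunicorn code. It deliberately leaks duplicates of a UNIX socket descriptor under a temporarily lowered limit to show how a slow leak ends in that error. The function name `leak_until_emfile` and the limit of 64 are made up for illustration, and it assumes a POSIX system.

```python
import errno
import os
import resource
import socket


def leak_until_emfile(limit=64):
    """Leak dup()'d socket fds under a lowered soft limit until the OS refuses.

    Returns (errno of the failure, number of descriptors leaked before it).
    Hypothetical helper for illustration only; POSIX-only.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY:
        limit = min(limit, soft)  # never try to raise the limit here

    a, b = socket.socketpair()  # stands in for the listening UNIX socket
    leaked = []
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    try:
        while True:
            # Each dup() that is never closed is one leaked descriptor.
            leaked.append(os.dup(a.fileno()))
    except OSError as exc:
        return exc.errno, len(leaked)
    finally:
        # Clean up and restore the original limit.
        for fd in leaked:
            os.close(fd)
        a.close()
        b.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    err, count = leak_until_emfile()
    print(errno.errorcode[err], "after leaking", count, "descriptors")
```

With a typical soft limit of 1024, a leak of even one descriptor per event is enough to cause periodic outages like the ones described here.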

I am running gunicorn 22.0.0, Python 3.12.3, and Flask 2.3.3. I will try updating to gunicorn 23.0.0 and see if that helps matters any.

fluffy-critter avatar Jan 08 '25 10:01 fluffy-critter

Update: Upgrading to gunicorn 23.0.0 didn't help, nor did switching to the gthread worker type.

fluffy-critter avatar Jan 08 '25 11:01 fluffy-critter

I am looking. Note that `--threads` implies the use of the gthread worker.

benoitc avatar Jan 19 '25 21:01 benoitc

Thanks. In the meantime I've switched to hypercorn as it wasn't having this issue. I was unable to track down the triggering event in gunicorn.

fluffy-critter avatar Jan 19 '25 21:01 fluffy-critter

I switched back to gunicorn to get a repro, and ran an external watchdog that collates the open file handles whenever this situation arises. I have now verified that the too many open files error is coming from the parent process holding on to file handles for the UNIX socket, and not something else in my application stack.

I still haven't figured out what causes this situation to occur, though.
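A watchdog along these lines can be sketched in a few lines of Python. This is not the script used above, only a minimal illustration that assumes Linux (it reads `/proc/<pid>/fd`) and permission to inspect the target process. The name `summarize_fds` is hypothetical.

```python
import os
from collections import Counter


def summarize_fds(pid):
    """Count a process's open descriptors by kind (Linux /proc only).

    Hypothetical helper: sockets show up as "socket:[inode]", pipes as
    "pipe:[inode]", and regular files as their path.
    """
    fd_dir = f"/proc/{pid}/fd"
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target.startswith(("socket:", "pipe:", "anon_inode:")):
            kinds[target.split(":", 1)[0]] += 1
        else:
            kinds["file"] += 1
    return kinds


if __name__ == "__main__":
    # Summarize this process; pass the gunicorn master's PID instead.
    print(summarize_fds(os.getpid()))
```

Logging this summary periodically for the gunicorn parent process would show whether the `socket` count climbs steadily or jumps at particular events such as worker restarts.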

fluffy-critter avatar Jan 21 '25 23:01 fluffy-critter

I do not see why socket.socket(family, type, ..., dup(fd)) (via socket.fromfd) is used over socket.socket(family, type, ..., fd) anyway; do we support any platform that requires duplicating (after fork and/or exec)?

`create_socket` leaks one fd per socket per invocation (per arbiter re-exec), even though Python has been able to attach and detach its socket objects just fine since 3.2. `BaseSocket.__init__()` should not leak in the general case, but could possibly be changed to not duplicate any fd in the first place.
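The difference between the two constructors can be shown with the standard library alone. The sketch below is not gunicorn's code. It uses a `socketpair()` end as a stand-in for an inherited listener fd, the names `fd_is_open` and `compare_fromfd_and_fileno` are hypothetical, and it assumes a POSIX system where `socketpair()` returns `AF_UNIX` sockets.

```python
import os
import socket


def fd_is_open(fd):
    """Return True if fd refers to an open descriptor in this process."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def compare_fromfd_and_fileno():
    a, b = socket.socketpair()
    inherited_fd = a.detach()  # raw fd, as if inherited across fork/exec

    # socket.fromfd() dup()s the descriptor: the new object owns a copy.
    dup_sock = socket.fromfd(inherited_fd, socket.AF_UNIX, socket.SOCK_STREAM)
    duplicated = dup_sock.fileno() != inherited_fd
    dup_sock.close()
    # Closing the copy leaves the original open; forgetting it is a leak.
    original_still_open = fd_is_open(inherited_fd)

    # socket.socket(fileno=...) adopts the descriptor without duplicating.
    adopted = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM,
                            fileno=inherited_fd)
    same_fd = adopted.fileno() == inherited_fd
    adopted.close()
    original_closed = not fd_is_open(inherited_fd)

    b.close()
    return duplicated, original_still_open, same_fd, original_closed


if __name__ == "__main__":
    print(compare_fromfd_and_fileno())
```

With `fromfd`, whoever holds the original descriptor still has to close it separately. With `fileno=`, the socket object owns the only descriptor, so closing the object releases it.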

pajod avatar Apr 09 '25 06:04 pajod