Gunicorn spiking memory consumption regularly after 3 weeks

I'm running several Gunicorn applications in Kubernetes with memory resource limits. I regularly get unexpected OOM failures: monitoring shows a very steep memory peak after 3 weeks of container uptime. Typically, the app uses less than 200MB and is limited to 250MB, but the peak exceeds the limit, triggering an OOM kill and a container restart. Gunicorn (20.1.0, all Python from Debian Bookworm) is configured with preload_app=True and workers=2.

This very much looks like Gunicorn starts two fresh workers after three weeks to replace "worn out" workers. I couldn't find any mention of whether this is the case and how to configure it, or of any other reason for that memory spike.
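One way to check this hypothesis is to have Gunicorn log every worker start and exit, so that a restart can be matched against the time of the spike. A minimal sketch of a `gunicorn.conf.py`, assuming the settings quoted above: `post_fork`, `worker_exit`, and `child_exit` are Gunicorn server hooks, while `_MASTER_START` and `_uptime` are hypothetical helper names. `max_requests` is the documented setting for recycling workers, but it counts requests rather than time and is disabled (0) by default.

```python
# gunicorn.conf.py: a diagnostic sketch, not a verified fix.
import time

# Settings as described in the report above.
preload_app = True
workers = 2

# Request-count based worker recycling; 0 (the default) disables it.
max_requests = 0

# Hypothetical helper: evaluated once, when the master loads this file.
_MASTER_START = time.time()


def _uptime():
    """Seconds elapsed since the config file was loaded."""
    return int(time.time() - _MASTER_START)


def post_fork(server, worker):
    # Runs in the new worker process right after it has been forked.
    server.log.info("worker %s forked at uptime %ss", worker.pid, _uptime())


def worker_exit(server, worker):
    # Runs in the worker process just before it exits.
    server.log.info("worker %s exiting at uptime %ss", worker.pid, _uptime())


def child_exit(server, worker):
    # Runs in the master process after a worker has exited.
    server.log.info("master saw worker %s exit at uptime %ss", worker.pid, _uptime())
```

If no fork or exit lines show up around the spike, the worker-replacement theory can probably be ruled out.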

Aug 15 '24 13:08 andreas-p

Try to correlate: what else are you doing with that regularity (underlying system updates, cleanup of old logs, database maintenance, ...)? Also monitor open files, connections, and response times. Your app may merely be consuming excessive resources in response to outside factors, e.g. if the database is slow to answer at the same time, more unfinished requests need memory simultaneously.
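The monitoring suggested here could be done with a small sampler along these lines. This is a sketch that assumes a Linux container with `/proc` mounted and permission to inspect the target process; `sample_process` is a hypothetical helper name, not part of Gunicorn.

```python
import os
import time


def sample_process(pid):
    """Return resident memory (kB) and open file descriptor count for a PID.

    Linux-only: reads /proc/<pid>/status and /proc/<pid>/fd.
    """
    rss_kb = None
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                rss_kb = int(line.split()[1])  # the kernel reports this in kB
                break
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return {"time": time.time(), "pid": pid, "rss_kb": rss_kb, "open_fds": open_fds}


if __name__ == "__main__":
    # Demo on the current process; pass a Gunicorn master or worker PID instead.
    print(sample_process(os.getpid()))
```

Logging such samples periodically for the master and each worker would show whether the peak comes with a jump in open descriptors or appears on its own.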

Also look at this related issue, which discusses a setup that prioritizes reducing memory footprint: https://github.com/benoitc/gunicorn/issues/3251

Aug 15 '24 16:08 pajod

I'm observing this "spike every n weeks" on different servers, with all spikes happening EXACTLY several weeks after start (same minute). Some of them definitely had no client access at that time (except for the /alive endpoint, which returns "ok" for Kubernetes). There's no system correlation, only the very exact timing.

Aug 15 '24 16:08 andreas-p

If the effect you are seeing was triggered by log rotation (which would be commonly configured to happen after some regular interval), you could confirm that from open file descriptors, or timestamps on the output files.
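A check like this could be scripted roughly as follows. This is a sketch assuming Linux `/proc` and sufficient permissions; `open_regular_files` is a hypothetical helper name. On Linux, a file that was rotated away and unlinked while still held open shows up with " (deleted)" appended to its path.

```python
import os
import stat


def open_regular_files(pid):
    """List (path, mtime) for regular files a process holds open (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    results = []
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)  # path the descriptor points to
            info = os.stat(link)        # follows the link to the open file
        except OSError:
            continue  # descriptor closed in the meantime, or not accessible
        if stat.S_ISREG(info.st_mode):
            results.append((target, info.st_mtime))
    return results


if __name__ == "__main__":
    # Demo on the current process; pass a Gunicorn PID instead.
    for path, mtime in open_regular_files(os.getpid()):
        print(path, mtime)
```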

Aug 15 '24 16:08 pajod

There's no log rotation. The containers log to stdout/stderr.

Aug 15 '24 17:08 andreas-p

I am facing a similar issue where a worker times out while booting, which leads to core dumps causing OOM and container restarts. I checked my system and there are no operations that take long to load. I also tested with an increased timeout. Using gunicorn==21.2.0, fastapi==0.68.0, uvicorn-worker==0.2.0, python:3.8.6, with preload_app=True, workers=3, and threads=3.
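For reference, the setup described here would look roughly like the following `gunicorn.conf.py`. This is a sketch under assumptions: the `timeout` value is a placeholder because the post does not state the one that was tested, the `worker_class` string assumes the uvicorn-worker package is used as the worker, and the thread count is written with Gunicorn's `threads` setting.

```python
# gunicorn.conf.py: sketch of the configuration described above.
preload_app = True
workers = 3
threads = 3

# Assumption: the uvicorn-worker package supplies this worker class.
worker_class = "uvicorn_worker.UvicornWorker"

# Workers silent for longer than this many seconds are killed and restarted.
# Gunicorn's default is 30; 120 is a placeholder, not the poster's value.
timeout = 120
```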

Sep 05 '24 10:09 AafreenAshrafi