High memory usage if `fs.nr_open` is very high and no `ulimit` set on Linux systems

Open RedRoserade opened this issue 3 years ago • 8 comments

While debugging https://github.com/kubernetes-sigs/kind/issues/2175, I tried to understand why uwsgi wasn't running well on a Kind cluster on Fedora 33.

I came to the conclusion that it is caused by a too-high value of fs.nr_open, which defaults to 1073741816 on Fedora 33 but only 1048576 on Ubuntu 20.10. On my machine, the very high limit causes the uWSGI --http process in a pod to consume >8Gi of memory, and if memory limits are set, the process gets OOM-killed by the kernel (please see the issue above for a test repo and logs).

The issue doesn't manifest when running uwsgi outside a container/pod because of a per-user ulimit of 1024. Also, the containerd.service unit seems to cap the limit at 1048576 by default, which helps avoid this issue when the uwsgi container is run via docker run.
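For anyone reproducing this, a quick way to compare the limits involved (a sketch; example-pod matches the pod used below, adjust names as needed):

sysctl fs.nr_open                               # kernel ceiling for per-process fd limits
ulimit -Sn; ulimit -Hn                          # soft/hard limit of the current shell
kubectl exec example-pod -- cat /proc/1/limits  # "Max open files" the container actually inherited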

Pod logs (high limit set deliberately via sysctl -w fs.nr_open=1073741816):

red@noctis:~/Development/kube-stuff$ kubectl logs example-pod
*** Starting uWSGI 2.0.19.1 (64bit) on [Fri Apr  2 18:05:18 2021] ***
compiled with version: 8.3.0 on 02 April 2021 17:51:03
os: Linux-5.8.0-48-generic #54-Ubuntu SMP Fri Mar 19 14:25:20 UTC 2021
nodename: example-pod
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 8
current working directory: /
detected binary path: /usr/local/bin/uwsgi
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** 
*** WARNING: you are running uWSGI without its master process manager ***
your memory page size is 4096 bytes
detected max file descriptor number: 1073741816
(continues, and then hangs. The `--http` process is OOM-killed)

Raising the limit on an Ubuntu 20.10 machine to 1073741816 and trying again, without a container:

(Note that I had to raise both limits, via sysctl -w and ulimit -n; Ubuntu seems to set a per-user limit of 1024.)

noctis# ulimit -n 1073741816
noctis# ulimit -n           
1073741816
noctis# source venv/bin/activate
(venv) noctis# ./docker-entrypoint.sh 
*** Starting uWSGI 2.0.19.1 (64bit) on [Fri Apr  2 19:17:18 2021] ***
compiled with version: 10.2.0 on 02 April 2021 18:03:34
os: Linux-5.8.0-48-generic #54-Ubuntu SMP Fri Mar 19 14:25:20 UTC 2021
nodename: noctis
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 8
current working directory: /home/red/Development/kube-stuff
detected binary path: /home/red/Development/kube-stuff/venv/bin/uwsgi
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** 
*** WARNING: you are running uWSGI without its master process manager ***
your processes number limit is 62286
your memory page size is 4096 bytes
detected max file descriptor number: 1073741816
(doesn't hang, but consumes 8193M of memory)

(Screenshot: system monitor showing the uwsgi process using ~8 GiB of memory.)

Lowering the value of fs.nr_open to 1048576 makes things work well on the pod. However, I wonder why the uwsgi process consumes so much memory when this limit is high.
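For reference, reverting the ceiling is just the inverse of the sysctl shown above (a sketch; the drop-in file name under /etc/sysctl.d is only an example):

sudo sysctl -w fs.nr_open=1048576
echo 'fs.nr_open = 1048576' | sudo tee /etc/sysctl.d/99-nr-open.conf   # persist across reboots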

Running the same uwsgi app without changing any limits, and without containers:

red@noctis:~/Development/kube-stuff$ sysctl fs.nr_open
fs.nr_open = 1048576
red@noctis:~/Development/kube-stuff$ ulimit -n
1024
red@noctis:~/Development/kube-stuff$ source venv/bin/activate
(venv) red@noctis:~/Development/kube-stuff$ ./docker-entrypoint.sh 
*** Starting uWSGI 2.0.19.1 (64bit) on [Fri Apr  2 19:48:49 2021] ***
compiled with version: 10.2.0 on 02 April 2021 18:03:34
os: Linux-5.8.0-48-generic #54-Ubuntu SMP Fri Mar 19 14:25:20 UTC 2021
nodename: noctis
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 8
current working directory: /home/red/Development/kube-stuff
detected binary path: /home/red/Development/kube-stuff/venv/bin/uwsgi
*** WARNING: you are running uWSGI without its master process manager ***
your processes number limit is 62286
your memory page size is 4096 bytes
detected max file descriptor number: 1024

Memory usage is normal in this case.

Finally, I noticed that if I use --http-socket instead of --http, memory usage is what I would consider "normal" (a few hundred MiB at most), but according to the documentation these options are not equivalent.
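For context, the two options differ in where HTTP is handled: --http starts a separate HTTP router process in front of the workers (the process that balloons here), while --http-socket has the workers speak HTTP directly and is generally meant for use behind a reverse proxy. A minimal sketch, with the port and app.py standing in for the real setup:

uwsgi --http :8080 --wsgi-file app.py          # spawns the extra --http router process
uwsgi --http-socket :8080 --wsgi-file app.py   # workers handle HTTP themselves; no router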

RedRoserade avatar Apr 02 '21 18:04 RedRoserade

Does it have the same effect if you pass a lower count with --max-fd?

xrmx avatar Apr 03 '21 08:04 xrmx

@xrmx It does not; I wasn't aware of that option. I tested it with --max-fd 1024 and it no longer consumes a huge amount of memory, even when running as non-root (despite https://uwsgi-docs.readthedocs.io/en/latest/Options.html#max-fd saying that it requires root privileges).

Probably the 1024 limit is a bit too low, but it works.
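For anyone landing here, the workaround looks like this (a sketch; the port and app.py are placeholders, and 1048576 is just a roomier cap than the 1024 used in the test above):

uwsgi --http :8080 --wsgi-file app.py --max-fd 1024      # what was tested above
uwsgi --http :8080 --wsgi-file app.py --max-fd 1048576   # a less restrictive cap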

RedRoserade avatar Apr 03 '21 09:04 RedRoserade

Yeah, the root problem is that some data structures are sized as big as the number of fds available, hence the ill effect you have seen.
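A rough way to see that effect (a sketch; app.py is a placeholder WSGI module, and pkill is used to clean up everything uwsgi spawned between runs):

uwsgi --http :8080 --wsgi-file app.py --max-fd 1024 &
sleep 2; ps -o pid,rss,cmd -C uwsgi   # small RSS: the fd-sized structures stay tiny
pkill uwsgi

uwsgi --http :8080 --wsgi-file app.py &
sleep 2; ps -o pid,rss,cmd -C uwsgi   # RSS balloons when the inherited fd limit is huge
pkill uwsgi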

xrmx avatar Apr 03 '21 09:04 xrmx

I see, thanks for the explanation. Assuming the data structures cannot be changed, I wonder whether a default limit of something like 1048576 would be sane. At the same time, setting such a default could break some use cases.

RedRoserade avatar Apr 03 '21 09:04 RedRoserade

Just chiming in to share here since it was a result on the first page of a search query. Should help with visibility :+1:

This will likely be due to a config on your system for the container runtime (dockerd.service, containerd.service, etc) that sets LimitNOFILE=infinity.

Typically infinity will be approx 2^30 (over 1 billion), while some distros like Debian (and Ubuntu, which derives from it) have a lower 2^20 limit (1024 times less), which is the default sysctl fs.nr_open value.

This was due to the systemd v240 release (2018 Q4), which raises fs.nr_open and fs.file-max to the highest possible value, with fs.nr_open then being used as infinity IIRC.

  • On some distros like Fedora, at least outside of containers this was a non-issue, as infinity was not used (systemd 240 kept the soft limit of 1024 but raised the hard limit to 512k, vs the kernel's default of 4096).
  • On others like Debian, pam_limits.so has been carrying a patch for something like two decades that set infinity as the hard limit IIRC (or it just took whatever the hard limit was on PID 1, which would be fs.nr_open; same outcome AFAIK). That caused the v240 release to not play well, since 2^20 was raised to 2^30, so they build systemd without the fs.nr_open bump (instead of fixing the pam_limits.so patch :man_shrugging: ).

Anyway... container runtimes running under systemd configure LimitNOFILE in their unit files and have bounced between 2^20 (1048576) and 2^30 (infinity) a few times, with infinity being present since somewhere in 2018-2021 depending on what you installed (and when your distro got the update). That is what raised the limits in the container, which may not appear to be the same on your host.

Often you can configure the ulimit per container (e.g. docker run has --ulimit; compose and k8s have similar ulimit config settings). Or you can set LimitNOFILE in the systemd service config to a sane value... or, if you're lucky, as in this case, the affected software has an option to impose a limit itself.
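As a concrete sketch of those two routes (paths, values, and the image name are illustrative; adjust for your distro and runtime):

docker run --ulimit nofile=1048576:1048576 my-uwsgi-image   # per-container soft:hard cap

sudo mkdir -p /etc/systemd/system/containerd.service.d      # or cap what the runtime's unit hands out
printf '[Service]\nLimitNOFILE=1048576\n' | sudo tee /etc/systemd/system/containerd.service.d/99-nofile.conf
sudo systemctl daemon-reload && sudo systemctl restart containerd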

Just to clarify, this typically only affects the soft limit value, although some software internally raises the soft limit to the hard limit (perfectly acceptable; it's just that 2^30 is not a sane hard limit, 2^19 is often plenty, and many can get away with 2^16 just fine).


As for the memory usage, from what I've read about other affected software (Java), an array is allocated sized to the soft limit, at 8 bytes per element; thus 2^30 uses approx 8.6GB of memory. The saner 2^20 limit you'd see on Debian would only use 8.4MB in comparison, and the default 1024 soft limit would need only 8.2KB.
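The arithmetic behind those numbers, for anyone checking (8 bytes per fd slot, sized to the limit in effect):

echo $(( 2**30 * 8 ))   # 8589934592 bytes, ~8.6 GB  (LimitNOFILE=infinity)
echo $(( 2**20 * 8 ))   # 8388608 bytes,    ~8.4 MB  (the 2^20 limit you'd see on Debian)
echo $(( 1024  * 8 ))   # 8192 bytes,       ~8.2 KB  (the classic 1024 soft limit)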


For dockerd and containerd, this problem is likely to be resolved this year, as there is a fair amount of discussion under way about no longer using infinity.

polarathene avatar Mar 08 '23 02:03 polarathene

Just adding some more visibility here. This still bit me on Fedora 39's docker. For people running into this issue:

  • If exposing uWSGI directly to the world isn't essential, not using the --http option removes the issue
  • As mentioned above, lowering the ulimit works, either in systemd or (for a faster fix) in your bootstrap script; ulimit -n 1048576 should do the trick (see the sketch below)!
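A minimal sketch of the bootstrap-script variant (the entrypoint name mirrors the docker-entrypoint.sh above; the port and app.py module are placeholders):

#!/bin/sh
# docker-entrypoint.sh: lower the soft fd limit before exec'ing uWSGI
ulimit -n 1048576
exec uwsgi --http :8080 --wsgi-file app.py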

jualvarez avatar Mar 30 '24 14:03 jualvarez

https://access.redhat.com/solutions/1479623

https://github.com/systemd/systemd/commit/a8b627aaed409a15260c25988970c795bf963812

mchtech avatar Jul 03 '24 04:07 mchtech

https://access.redhat.com/solutions/1479623 systemd/systemd@a8b627a

I already described the cause above with:

Typically infinity will be approx 2^30 (over 1 billion), while some distros like Debian (and Ubuntu, which derives from it) have a lower 2^20 limit (1024 times less), which is the default sysctl fs.nr_open value.

This was due to the systemd v240 release (2018 Q4), which raises fs.nr_open and fs.file-max to the highest possible value, with fs.nr_open then being used as infinity IIRC.

Is the intent of your links to provide additional reference / context?

polarathene avatar Jul 04 '24 04:07 polarathene