containerd-shim processes are leaking inotify instances with cgroups v2
This is a duplicate of https://github.com/containerd/containerd/issues/5670
But I wanted to raise an issue with Flatcar anyway:
- For visibility to other FC users who might run into it
- Perhaps other users don't see this issue or have found a workaround
Since 2983.2.0 defaults to cgroups v2, we saw this issue frequently enough that we had to roll back.
A client application process might log something like this:
`failed to create fsnotify watcher: too many open files`
You can lessen the issue by increasing the default (`fs.inotify.max_user_instances=8192`), but sooner or later nodes still run out.
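For reference, a persistent version of that workaround could be a sysctl drop-in (file name is illustrative, the limit value is the one from above):

```
# /etc/sysctl.d/90-inotify.conf (hypothetical file name)
# Raise the per-user inotify instance limit from the kernel default of 128.
# This only delays exhaustion if instances are being leaked.
fs.inotify.max_user_instances = 8192
```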
Hi,
thanks for raising this here. A first question I have regarding a workaround is whether the shim is really required, because I read a comment that it was only needed for live restore. It would be good to try setting no_shim = true following https://www.flatcar-linux.org/docs/latest/container-runtimes/customizing-docker/#use-a-custom-containerd-configuration
I'm going to try this now, but this is interesting: https://github.com/flatcar-linux/coreos-overlay/blob/main/app-emulation/containerd/files/config.toml#L26-L28
The comment suggests not running with a shim, but the setting defaults to false.
no_shim = false means it uses the shim ;) - but yes, the comment above is confusing.
I think you would change it to true under [plugins."containerd.runtime.v1.linux"], but maybe it's worth checking which sections are present in your config dump output (I don't have a PhD in containerd config.toml-ology, don't trust what I say).
I guess the comment wording (also in https://github.com/containerd/containerd/blob/main/docs/ops.md#linux-runtime-plugin) was written that way to match the config name and what it does when enabled, not the value false that is set…
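For reference, a drop-in along the lines discussed might look like this (a sketch only; the section name follows containerd's v2 config format, so verify it against your own `containerd config dump` output first):

```toml
# Disable the per-container shim for the v1 Linux runtime plugin.
# NOTE: without a shim, restarting containerd takes running containers
# down with it (no live restore).
[plugins."io.containerd.runtime.v1.linux"]
  no_shim = true
```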
I'm going to try removing the shim and report back 👍
Any update on this? We are experiencing the same issue and are curious to know whether removing the shim is a viable option.
Sorry - not yet.
I'm deploying a count metric for inotify fds on 2 clusters and no_shim on 1 cluster right now.
I will report tomorrow if I can see any difference.
Thanks for the upstream bug reference, this is very easily reproducible (start a pod with /bin/false as the command under k8s; every CrashLoop leaks an inotify instance and a goroutine blocked in inotify_read). I'm testing a fix and will submit an upstream bugfix once I've validated it.
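For anyone who wants to reproduce this, a minimal CrashLoop pod along those lines might look like the following (names are illustrative):

```yaml
# Pod whose container exits immediately; before the fix, each
# CrashLoop restart cycle leaked one inotify instance in the shim.
apiVersion: v1
kind: Pod
metadata:
  name: inotify-leak-repro
spec:
  restartPolicy: Always
  containers:
    - name: crasher
      image: busybox
      command: ["/bin/false"]
```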
@jepio Any update on the timeline for a bugfix?
The upstream PRs have been submitted; I'm waiting for reviews, then merge, release, and then we'll pick it into Flatcar. I don't know how long that might take overall.
Thanks, can you link to the upstream PRs?
https://github.com/containerd/cgroups/pull/212 is the initial one; after this, the changes will need to be vendored into containerd/containerd (a second PR).
The inotify leak fix has been merged and is part of containerd 1.6.0. This will be a part of the next alpha release (https://github.com/flatcar-linux/coreos-overlay/pull/1650).
We're still experiencing this using containerd 1.6.6
containerd --version
containerd github.com/containerd/containerd 1.6.6 d0d56c1a4ace8bae8c7c98d28ba98f0537ebe704
Client:
Context: default
Debug Mode: false
Server:
Containers: 469
Running: 74
Paused: 0
Stopped: 395
Images: 36
Server Version: 20.10.14
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: false
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d0d56c1a4ace8bae8c7c98d28ba98f0537ebe704
runc version: 886750b989c082700828ec1d3bbb1b397219bfac
init version:
Security Options:
seccomp
Profile: default
selinux
cgroupns
Kernel Version: 5.15.63-flatcar
Operating System: Flatcar Container Linux by Kinvolk 3227.2.2 (Oklo)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 125.8GiB
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
@kmmanto can you provide more details to back that up? One thing to note is that with cgroups v2 you will need at least one inotify instance per container, and two or more in the case of a Kubernetes pod. So together with systemd's internal inotify usage, the default fs.inotify.max_user_instances limit of 128 may need to be increased.
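A quick way to get the system-wide count is to scan /proc for inotify file descriptors (a sketch; run as root to see all processes, since unreadable fd directories are silently skipped here):

```shell
# Each inotify fd shows up as a symlink to "anon_inode:inotify" under
# /proc/<pid>/fd/, so counting those links counts inotify instances.
find /proc/*/fd -lname anon_inode:inotify 2>/dev/null | wc -l
```

If that number keeps growing while the set of running containers stays roughly constant, something is leaking instances.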
@jepio This is one of the logs of a pod running on a Flatcar node in OpenStack. Doing `kubectl logs -f <pod_name>` prints this and then exits.
I, [2022-09-30T11:16:52.004621 #1] INFO -- : Finished 'health_check.alive'
I, [2022-09-30T11:17:52.002410 #1] INFO -- : Triggering 'health_check.alive'
I, [2022-09-30T11:17:52.002940 #1] INFO -- : Finished 'health_check.alive' duration_ms=0 error=nil
I, [2022-09-30T11:17:52.003031 #1] INFO -- : Finished 'health_check.alive'
failed to create fsnotify watcher: too many open files
Increased fs.inotify.max_user_instances to 8192 as suggested by the OP. Will monitor whether the issue comes back.
When you hit this, try running this command and paste the output here: `sudo find /proc/*/fd -lname anon_inode:inotify | cut -d/ -f3 | xargs -I '{}' -- ps --no-headers -o '%p %U %c %a %P' -p '{}' | uniq -c | sort -nr`
As the main issue seems to be fixed since containerd 1.6.0, and the current version of containerd on Stable is 1.6.16, I'm going ahead and closing this issue.
Do not hesitate to reopen this issue or to create a new one if you have issues with containerd.