containerd-shim processes are leaking inotify instances with cgroups v2
This is a duplicate of https://github.com/containerd/containerd/issues/5670
But I wanted to raise an issue with Flatcar anyway:
- For visibility to other FC users who might run into it
- Perhaps other users don't see this issue or have found a workaround
Since 2983.2.0 defaults to cgroups v2, we saw this issue frequently enough that we had to roll back.
A client application process might log something like this:
`failed to create fsnotify watcher: too many open files`
You can lessen the issue by increasing the default (`fs.inotify.max_user_instances=8192`), but sooner or later nodes still run out.
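For reference, a persistent version of that workaround could be a sysctl drop-in (file name is illustrative, the limit value is the one from above):

```
# /etc/sysctl.d/90-inotify.conf (hypothetical file name)
# Raise the per-user inotify instance limit from the kernel default of 128.
# This only delays exhaustion if instances are being leaked.
fs.inotify.max_user_instances = 8192
```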
Hi,
thanks for raising this here. A first question I have regarding a workaround is whether the shim is really required, because I read a comment that it was only needed for live restore. It would be good to try setting no_shim = true following https://www.flatcar-linux.org/docs/latest/container-runtimes/customizing-docker/#use-a-custom-containerd-configuration
I'm going to try this now, but this is interesting: https://github.com/flatcar-linux/coreos-overlay/blob/main/app-emulation/containerd/files/config.toml#L26-L28
The comment suggests not running with a shim, but the setting defaults to false.
no_shim = false means it uses the shim ;) - but yes, the comment above is confusing.
I think you would change it to true under [plugins."containerd.runtime.v1.linux"], but maybe it's worth checking which sections are present in your config dump output (I don't have a PhD in containerd config.toml-ology, don't trust what I say).
I guess the comment wording (also in https://github.com/containerd/containerd/blob/main/docs/ops.md#linux-runtime-plugin) was written that way to match the config name and what it does when enabled, not the value false that is set…
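For reference, a drop-in along the lines discussed might look like this (a sketch only; the section name follows containerd's v2 config format, so verify it against your own `containerd config dump` output first):

```toml
# Disable the per-container shim for the v1 Linux runtime plugin.
# NOTE: without a shim, restarting containerd takes running containers
# down with it (no live restore).
[plugins."io.containerd.runtime.v1.linux"]
  no_shim = true
```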
I'm going to try removing the shim and report back 👍
Any update on this? We are experiencing the same issue and are curious to know whether removing the shim is a viable option.
Sorry - not yet.
I'm deploying a count metric for inotify fds on 2 clusters and no_shim on 1 cluster right now.
I will report tomorrow if I can see any difference.
Thanks for the upstream bug reference, this is very easily reproducible (start a pod with /bin/false as the command under k8s; every CrashLoop leaks an inotify instance and a goroutine blocked in inotify_read). I'm testing a fix and will submit an upstream bugfix once I've validated it.
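For anyone who wants to reproduce this, a minimal CrashLoop pod along those lines might look like the following (names are illustrative):

```yaml
# Pod whose container exits immediately; before the fix, each
# CrashLoop restart cycle leaked one inotify instance in the shim.
apiVersion: v1
kind: Pod
metadata:
  name: inotify-leak-repro
spec:
  restartPolicy: Always
  containers:
    - name: crasher
      image: busybox
      command: ["/bin/false"]
```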
@jepio Any update on the timeline for a bugfix?
The upstream PRs have been submitted; I'm waiting for reviews, then merge, release, and then we'll pick it into Flatcar. I don't know how long that might take overall.
Thanks, can you link to the upstream PRs?
https://github.com/containerd/cgroups/pull/212 is the initial one; after this, the changes will need to be vendored into containerd/containerd (a second PR).
The inotify leak fix has been merged and is part of containerd 1.6.0. This will be a part of the next alpha release (https://github.com/flatcar-linux/coreos-overlay/pull/1650).
We're still experiencing this using containerd 1.6.6
containerd --version
containerd github.com/containerd/containerd 1.6.6 d0d56c1a4ace8bae8c7c98d28ba98f0537ebe704
Client:
Context: default
Debug Mode: false
Server:
Containers: 469
Running: 74
Paused: 0
Stopped: 395
Images: 36
Server Version: 20.10.14
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: false
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d0d56c1a4ace8bae8c7c98d28ba98f0537ebe704
runc version: 886750b989c082700828ec1d3bbb1b397219bfac
init version:
Security Options:
seccomp
Profile: default
selinux
cgroupns
Kernel Version: 5.15.63-flatcar
Operating System: Flatcar Container Linux by Kinvolk 3227.2.2 (Oklo)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 125.8GiB
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
@kmmanto can you provide more details to back that up? One thing to note is that with cgroups v2 you will need at least one inotify instance per container, and two or more in the case of a Kubernetes pod. So together with systemd's internal inotify usage, the default fs.inotify.max_user_instances limit of 128 may need to be increased.
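A quick way to get the system-wide count is to scan /proc for inotify file descriptors (a sketch; run as root to see all processes, since unreadable fd directories are silently skipped here):

```shell
# Each inotify fd shows up as a symlink to "anon_inode:inotify" under
# /proc/<pid>/fd/, so counting those links counts inotify instances.
find /proc/*/fd -lname anon_inode:inotify 2>/dev/null | wc -l
```

If that number keeps growing while the set of running containers stays roughly constant, something is leaking instances.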
@jepio This is one of the logs of a pod running on a Flatcar node in OpenStack. Doing `kubectl logs -f <pod_name>` prints this and then exits.
I, [2022-09-30T11:16:52.004621 #1] INFO -- : Finished 'health_check.alive'
I, [2022-09-30T11:17:52.002410 #1] INFO -- : Triggering 'health_check.alive'
I, [2022-09-30T11:17:52.002940 #1] INFO -- : Finished 'health_check.alive' duration_ms=0 error=nil
I, [2022-09-30T11:17:52.003031 #1] INFO -- : Finished 'health_check.alive'
failed to create fsnotify watcher: too many open files
Increased fs.inotify.max_user_instances to 8192 as suggested by the OP. Will monitor whether the issue comes back.
When you hit this, try running this command and paste the output here: `sudo find /proc/*/fd -lname anon_inode:inotify | cut -d/ -f3 | xargs -I '{}' -- ps --no-headers -o '%p %U %c %a %P' -p '{}' | uniq -c | sort -nr`
As the main issue seems to be fixed since containerd 1.6.0, and the current version of containerd on Stable is 1.6.16, I'm going ahead and closing this issue.
Do not hesitate to reopen this issue or to create a new one if you have issues with containerd.