Samuel Karp
Some thoughts here:
* containerd shouldn't hang just because a specified volume is backed by a non-responsive remote filesystem; we should compensate for that somehow.
* Go does not seem...
Consider this:
```go
errs := make(chan error)
ctx, cancel := context.WithTimeout(context.TODO(), 2*time.Minute)
defer cancel()
go func() {
	_, err := os.Stat("/path/to/whatever")
	errs
```
I think the goroutine would go into its `default` case since `
I don't have an environment to verify this either; this was debugged based on logs from a customer's node. My thought on how to test this was to have a...
I don't think statx with AT_STATX_DONT_SYNC is really sufficient. It might decrease the likelihood of this happening, but I'd rather make sure that containerd is resilient to the stuck os.Stat.
/test all
```
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: runc:[2:INIT] invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=-998
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: CPU: 0 PID: 52432 Comm: runc:[2:INIT] Tainted: G O 6.6.44+ #1
Oct 08...
```
Yes, I think so. Pushed a new change to modify `CONTAINERD_SYSTEMD_CGROUP=true` and to validate the cgroup mount.
Opened https://github.com/opencontainers/runc/issues/4427 just to start getting a bit more attention
Confirmed cgroup v2, but `CONTAINERD_SYSTEMD_CGROUP=true` does not seem to be picked up.
```
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
```
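For reference, a rough sketch of what "validate the cgroup mount" could look like when checking a line of `mount` output like the one above. `isCgroup2Mount` is a hypothetical name, and this is not the check that was actually pushed; it only assumes the `<source> on <target> type <fstype> (<options>)` layout shown in the output.

```go
package main

import (
	"fmt"
	"strings"
)

// isCgroup2Mount is a hypothetical helper: it reports whether a single line
// of `mount` output describes a cgroup2 filesystem mounted at /sys/fs/cgroup.
// Assumed layout: "<source> on <target> type <fstype> (<options>)".
func isCgroup2Mount(line string) bool {
	f := strings.Fields(line)
	return len(f) >= 5 &&
		f[1] == "on" &&
		f[2] == "/sys/fs/cgroup" &&
		f[3] == "type" &&
		f[4] == "cgroup2"
}

func main() {
	line := "cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)"
	fmt.Println(isCgroup2Mount(line))
}
```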