Lucas Manning

11 comments by Lucas Manning

Yes, this should be fixed in release [20220905.0](https://github.com/google/gvisor/tree/release-20220905.0) by commit 23b21af6d631c6574901628ea12ec1e7f7e2324d. [unknowndevQwQ](https://github.com/unknowndevQwQ), please let us know if the problem persists after upgrading to that release.

Do you have a new stack trace similar to the one in your original post? There's definitely still an issue, but I think it's different from the original problem.

Hi, it seems this could be an issue with the runsc systemd-cgroup driver. Are you passing any custom cgroup settings, such as a memory limit, to the container?

Hi @jseba, are you able to get any runsc logs from the failing containers? That could help in diagnosing the exact issue. Also, does the issue happen more frequently with...

Thanks for the report. Would it be possible to get equivalent pcaps for runc/runsc on the A100 where you don't see the issue?

Also, just to be sure, could you confirm whether the A100 and H100 are running in the same region?

Thanks for the extra logs; we're still investigating on our end. Could you send what you get from running `ip link show` on the H100 and A100? Believe it or not...

For those interested, @avagin found the cause of this bug. It's a small issue with the GVE network driver that's used on some GCP hardware. The driver code can be...

From the logs you shared, it looks like you/containerd are specifying a systemd cgroup path (format `slice:cri-containerd:uid`) but not setting `systemd-cgroup=true` in `runsc_config`. Can you try adding that flag...
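As a rough illustration of the mismatch described above, the sketch below checks whether a cgroups path has the three-part colon-separated shape (`slice:prefix:name`) used with the systemd cgroup driver. The helper `looksLikeSystemdPath` and the sample paths are hypothetical and are not part of runsc, runc, or containerd; it is only a diagnostic aid for reading logs, not a description of how either runtime decides which driver to use.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeSystemdPath is a hypothetical helper (not a runsc or runc API).
// It reports whether p has the "slice:prefix:name" shape, i.e. exactly
// three colon-separated fields and no leading slash.
func looksLikeSystemdPath(p string) bool {
	if strings.HasPrefix(p, "/") {
		return false // filesystem-style cgroup path
	}
	return len(strings.Split(p, ":")) == 3
}

func main() {
	// Sample values are made up for illustration.
	paths := []string{
		"system.slice:cri-containerd:abc123",
		"/kubepods/besteffort/pod1/abc123",
	}
	for _, p := range paths {
		fmt.Printf("%q systemd-style: %v\n", p, looksLikeSystemdPath(p))
	}
}
```

If a path in the logs matches this shape while `systemd-cgroup=true` is absent from the runtime configuration, that would be consistent with the situation described in the comment.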

@EtiennePerot Maybe, but I think we should always try to stay in line with what runc does. Runc doesn't attempt to auto-detect systemd-based configuration; it just reads whatever the...