Kubernetes node e2e tests fail while deleting a container
I replaced the runc binary with youki to run the Kubernetes node e2e tests using youki. The delete container command seems to be returning invalid data.
I0223 05:59:31.852925 40220 kubelet.go:2138] "SyncLoop (housekeeping) end"
E0223 05:59:31.873594 40220 remote_runtime.go:510] "RemoveContainer from runtime service failed" err=<
rpc error: code = Unknown desc = failed to delete container f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b: `/usr/local/bin/runc --root /run/runc delete --force f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b` failed: [DEBUG crates/youki/src/main.rs:92] 2022-02-23T05:59:31.857998519+00:00 started by user 0 with ArgsOs { inner: ["/usr/local/bin/runc", "--root", "/run/runc", "delete", "--force", "f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b"] }
[DEBUG crates/youki/src/commands/delete.rs:8] 2022-02-23T05:59:31.858176980+00:00 start deleting f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b
Error: could not load state for container f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b
Caused by:
missing field `ociVersion` at line 1 column 14569
(exit status 1)
> containerID="f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b"
E0223 05:59:31.873957 40220 kuberuntime_gc.go:146] "Failed to remove container" err=<
rpc error: code = Unknown desc = failed to delete container f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b: `/usr/local/bin/runc --root /run/runc delete --force f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b` failed: [DEBUG crates/youki/src/main.rs:92] 2022-02-23T05:59:31.857998519+00:00 started by user 0 with ArgsOs { inner: ["/usr/local/bin/runc", "--root", "/run/runc", "delete", "--force", "f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b"] }
[DEBUG crates/youki/src/commands/delete.rs:8] 2022-02-23T05:59:31.858176980+00:00 start deleting f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b
Error: could not load state for container f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b
Caused by:
missing field `ociVersion` at line 1 column 14569
(exit status 1)
> containerID="f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b"
Let me know if you would like to see the complete journal logs.
@harche Thanks for your report. Could you tell me how to reproduce this with some commands?
After cloning kubernetes and bringing up crio, run:
sudo make test-e2e-node RUNTIME=remote CONTAINER_RUNTIME_ENDPOINT="unix:///var/run/crio/crio.sock" FOCUS="\[NodeConformance\]|\[NodeFeature:.+\]" SKIP="\[Flaky\]|\[Slow\]|\[Serial\]" TEST_ARGS='--kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/crio.service --kubelet-cgroups=/system.slice/kubelet.service" --extra-log="{\"name\": \"crio.log\", \"journalctl\": [\"-u\", \"crio\"]}"'
This works on Fedora CoreOS. But you can also run these tests with your choice of CRI implementation.
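The nested quoting in the command above is easy to get wrong when adapting it to another CRI endpoint. Here is a minimal sketch of how the arguments could be assembled programmatically. The helper names `build_test_args` and `build_command` are hypothetical, and the kubelet flags shown in the thread are specific to crio, so another runtime would likely need different values.

```python
import json


def build_test_args(kubelet_flags, extra_log):
    """Build the TEST_ARGS value, escaping the JSON quotes as in the command above."""
    # json.dumps produces the JSON object; its double quotes are then
    # backslash-escaped because the value sits inside --extra-log="...".
    extra_log_json = json.dumps(extra_log).replace('"', '\\"')
    return f'--kubelet-flags="{kubelet_flags}" --extra-log="{extra_log_json}"'


def build_command(endpoint, kubelet_flags, extra_log):
    """Return an argv list; passed as a list, it needs no further shell quoting."""
    return [
        "sudo", "make", "test-e2e-node",
        "RUNTIME=remote",
        f"CONTAINER_RUNTIME_ENDPOINT={endpoint}",
        r"FOCUS=\[NodeConformance\]|\[NodeFeature:.+\]",
        r"SKIP=\[Flaky\]|\[Slow\]|\[Serial\]",
        f"TEST_ARGS={build_test_args(kubelet_flags, extra_log)}",
    ]
```

The returned list could be handed to `subprocess.run` from the kubernetes checkout. The sketch only builds the arguments and does not run anything itself.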
We run these tests in upstream k8s CI (with runc and crio) - https://testgrid.k8s.io/sig-node-cri-o#ci-crio-cgroupv1-node-e2e-conformance
You can click on an individual green box to get the test report, then click Raw Build-log.txt to see how the test job gets initialized.
Another pointer - https://github.com/kubernetes/kubernetes/blob/master/hack/e2e-node-test.sh
But eventually it boils down to this command,
/usr/local/bin/runc --root /run/runc delete --force f6b72c56564b7cc16dfc7492cda08f1624035114212ab27b28668be0b052ea4b
So you may not actually have to deal with k8s node e2e to reproduce this. @utam0k
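To narrow this down without the e2e suite, it may help to check which state files under the runtime root lack the `ociVersion` field that the error complains about. The following is a minimal sketch, not part of youki or runc. It assumes each container has a directory under the root (here `/run/runc`, taken from the command above) containing a JSON file named `state.json`; that layout is an assumption and is not confirmed in this thread. The function name `states_missing_field` is hypothetical.

```python
import json
from pathlib import Path


def states_missing_field(root, field="ociVersion"):
    """Return the IDs of containers whose state file lacks `field` at the top level.

    Assumption: the state lives at <root>/<container-id>/state.json.
    Files that cannot be read or parsed as JSON are reported as well.
    """
    missing = []
    for state_file in sorted(Path(root).glob("*/state.json")):
        container_id = state_file.parent.name
        try:
            state = json.loads(state_file.read_text())
        except (OSError, ValueError):
            # Unreadable or malformed state: report it rather than abort.
            missing.append(container_id)
            continue
        if not isinstance(state, dict) or field not in state:
            missing.append(container_id)
    return missing
```

Calling `states_missing_field("/run/runc")` as root would list the container IDs that a parser requiring `ociVersion` would be expected to reject. If every failing container shows up there, the problem is likely in the state file contents rather than in the delete logic itself.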
Hello, I'm bumping this since I ended up in the same spot.
Since I saw that #968 got merged, I replaced runc with youki (cp youki /usr/sbin/runc) on the worker nodes.
But pods are not starting, and fail with:
Error: failed to create containerd task: failed to create shim: OCI runtime create failed: runc did not terminate successfully: exit status 1: unknown
And if I list the containers using runc -r /var/run/containerd/runc/k8s.io/ list, I get:
[ERROR crates/youki/src/main.rs:138] 2023-02-28T12:21:15.767166815+00:00 error in executing command: missing field `ociVersion` at line 1 column 9558
Error: missing field `ociVersion` at line 1 column 9558
I'm working on a fix in https://github.com/containers/youki/pull/1884