prestart_fail.t leaves a stale youki process running
Although all the runtime-tools test cases run by the integration_test.sh script pass, one of them, prestart_fail.t, leaves a stale youki process running that has to be removed manually with kill -9 (kill -15 has no effect, as shown below). I'm assuming this is some sort of failure in the cleanup / exit path.
$ ps afx |grep youki
199557 pts/6 S+ 0:00 | \_ grep youki
$ ./integration_test.sh prestart_fail
Running prestart_fail/prestart_fail.t
$ ps afx |grep youki
199599 pts/6 S+ 0:00 | \_ grep youki
199584 pts/6 S 0:00 /home/dwg/src/youki/youki create --bundle /tmp/ocitest617022754 8d19e193-20f6-49f4-87aa-f6e73afe57b6
$ sudo kill -15 199584
$ ps afx |grep youki
199638 pts/6 S+ 0:00 | \_ grep youki
199584 pts/6 S 0:00 /home/dwg/src/youki/youki create --bundle /tmp/ocitest617022754 8d19e193-20f6-49f4-87aa-f6e73afe57b6
$ sudo kill -9 199584
$ ps afx |grep youki
199653 pts/6 S+ 0:00 | \_ grep youki
The logfile contains the following:
$ cat ./integration_test/src/github.com/opencontainers/runtime-tools/log/prestart_fail/prestart_fail.t.log
TAP version 13
failed to start the container
[DEBUG crates/libcontainer/src/hooks.rs:38] 2021-11-24T14:37:06.147184946+11:00 run_hooks arg0: "false", args: []
[DEBUG crates/libcontainer/src/hooks.rs:49] 2021-11-24T14:37:06.147249845+11:00 run_hooks envs: {}
Error: failed to start container 8d19e193-20f6-49f4-87aa-f6e73afe57b6
Caused by:
0: failed to run pre start hooks
1: Failed to execute hook command. Non-zero return code. 1
---
{
"error": "if any prestart hook fails, the runtime MUST generate an error, stop the container, and continue the lifecycle at step 9\nRefer to: https://github.com/opencontainers/runtime-spec/blob/v1.0.2-dev/runtime.md#lifecycle"
}
...
1..0
@dgibson Thanks for your report. Could you post the output of ./youki info?
Sure
$ ./youki info
Version 0.0.1
Kernel-Release 5.14.18-300.fc35.x86_64
Kernel-Version #1 SMP Fri Nov 12 16:43:17 UTC 2021
Architecture x86_64
Operating System Fedora Linux 35 (Thirty Five)
Cores 8
Total Memory 31876
Cgroup setup hybrid
Cgroup mounts
blkio /sys/fs/cgroup/blkio
cpu /sys/fs/cgroup/cpu,cpuacct
cpuacct /sys/fs/cgroup/cpu,cpuacct
cpuset /sys/fs/cgroup/cpuset
devices /sys/fs/cgroup/devices
freezer /sys/fs/cgroup/freezer
hugetlb /sys/fs/cgroup/hugetlb
memory /sys/fs/cgroup/memory
net_cls /sys/fs/cgroup/net_cls,net_prio
net_prio /sys/fs/cgroup/net_cls,net_prio
perf_event /sys/fs/cgroup/perf_event
pids /sys/fs/cgroup/pids
unified /sys/fs/cgroup/unified
CGroup v2 controllers
cpu detached
cpuset detached
hugetlb detached
io detached
memory detached
pids detached
device attached
Namespaces enabled
mount enabled
uts enabled
ipc enabled
user enabled
pid enabled
network enabled
cgroup enabled
Let me help and take a look
Based on the runtime spec, when a prestart hook fails, the runtime should stop the container and kill the container process. The container should end up in the STOPPED state.
Reference:
The prestart hooks MUST be invoked by the runtime. If any prestart hook fails, the runtime MUST generate an error, stop the container, and continue the lifecycle at step 12.
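To make the expected behaviour concrete, here is a minimal sketch of the cleanup path using only the Rust standard library. It is not youki's actual code: `spawn_init`, `hook_succeeded`, and `create` are hypothetical names, and a plain `sleep` process stands in for the container init process. The point it illustrates is that when the hook fails, the already-spawned process has to be both killed and reaped before the error is reported, otherwise it is left behind as in the `ps` output above.

```rust
use std::io;
use std::process::{Child, Command};

/// Spawn a stand-in for the container init process.
/// Assumption: a real runtime would create a namespaced process here,
/// not `sleep`; this is only for illustration.
fn spawn_init() -> io::Result<Child> {
    Command::new("sleep").arg("30").spawn()
}

/// Run a single hook command and report whether it exited with status 0.
fn hook_succeeded(hook: &str) -> io::Result<bool> {
    Ok(Command::new(hook).status()?.success())
}

/// Hypothetical `create` step: returns `Some(child)` if the prestart hook
/// passed, or `None` if the hook failed and the init process was stopped.
fn create(hook: &str) -> io::Result<Option<Child>> {
    let mut init = spawn_init()?;

    // A hook that cannot even be executed is treated as a failed hook.
    let ok = hook_succeeded(hook).unwrap_or(false);
    if ok {
        return Ok(Some(init));
    }

    // Hook failed: stop the container process and reap it so that
    // nothing is left running after the error is reported.
    init.kill()?; // sends SIGKILL on Unix
    init.wait()?;
    Ok(None)
}

fn main() -> io::Result<()> {
    match create("false")? {
        None => println!("prestart hook failed: init process killed and reaped"),
        Some(mut init) => {
            println!("prestart hook passed: init process still running");
            // Clean up the stand-in process before exiting the demo.
            init.kill()?;
            init.wait()?;
        }
    }
    Ok(())
}
```

Using `false` as the hook mirrors the test log (`run_hooks arg0: "false"`). In a real runtime the failure branch would also need to update the container state and return the error to the caller; that part is left out here.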