podman
podman top: read /proc/226420/cmdline: no such process
Weird one-off seen in debian rootless:
$ podman [options] container run --name test -d quay.io/libpod/alpine:latest top -d 2
5399b1330f7b2db082d006ee6eb2f1f109f3350bacc8b3e369ba90df8da4476d
$ podman [options] container top test
USER PID PPID %CPU ELAPSED TTY TIME COMMAND
root 1 0 0.000 1.076338999s ? 0s top -d 2
$ podman [options] container top test groups hgroups
Error: read /proc/226420/cmdline: no such process
I fully expect this one to languish, then get closed in a month or two.
Actually, I think this is a real error, and if one tries, it should not be too hard to build a reproducer. It reminds me of https://github.com/containers/podman/issues/22103#issuecomment-2011848975, which shows that checking only for ENOENT is not enough when we parse /proc, as other errors can be returned too.
That, however, does not explain why we have a process in such a state here in the test: we only run top, and there should be no processes exiting in the meantime in the pid namespace.
The kernel returns ESRCH if the process terminates between the time the cmdline file is opened and the time it is read, so it would be good to treat unix.ESRCH like ENOENT all over the psgo codebase. But I don't understand how the process could have terminated between the container top test and the container top test groups hgroups commands; I can't spot anything weird in the logs.
Agreed, me neither. My only thought is that top got killed by a signal in the meantime, or alternatively that another process entered the cgroup (it looks like psgo collects the pids of all processes based on the cgroup) and then exited or got killed.
If it is the first one, the test will still fail, as top would return no results.
I've opened a PR for psgo: https://github.com/containers/psgo/pull/155
A friendly reminder that this issue had no activity for 30 days.
No new instances of this flake, just the one I opened with. I'm okay closing.
Should this recur: first thing to check is grep psgo go.mod:
- if you see v1.9.0 (old psgo), ignore the failure.
- if you see anything greater than v1.9.0, it means @giuseppe's fix did not address the root cause, and you can opt to reopen this or file a new issue.
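The check above can also be scripted. The following is a rough sketch, assuming the module path github.com/containers/psgo and a plain vMAJOR.MINOR.PATCH version in go.mod. The helper names are hypothetical and the version comparison is deliberately simplistic: any pre-release or pseudo-version suffix is ignored.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Assumed module path for psgo.
const psgoModule = "github.com/containers/psgo"

// psgoVersion returns the version that follows the psgo module path in the
// given go.mod contents.
func psgoVersion(goMod string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(goMod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		for i, f := range fields {
			if f == psgoModule && i+1 < len(fields) {
				return fields[i+1], true
			}
		}
	}
	return "", false
}

// parseVersion turns "v1.9.0" into [1 9 0], dropping any "-..." or "+..." suffix.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, err
		}
		out[i] = n
	}
	return out, nil
}

// newerThan reports whether version v is strictly greater than base.
func newerThan(v, base string) (bool, error) {
	a, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(base)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return false, nil
}

func main() {
	data, err := os.ReadFile("go.mod")
	if err != nil {
		fmt.Println("cannot read go.mod:", err)
		return
	}
	v, ok := psgoVersion(string(data))
	if !ok {
		fmt.Println("psgo not found in go.mod")
		return
	}
	newer, err := newerThan(v, "v1.9.0")
	fmt.Println("psgo version:", v, "newer than v1.9.0:", newer, "err:", err)
}
```

Run from the root of a podman checkout, this prints the psgo version found and whether it is past v1.9.0. Plain grep psgo go.mod remains the quicker manual check.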