podman top: read /proc/226420/cmdline: no such process

Weird one-off seen in debian rootless:

$ podman [options] container run --name test -d quay.io/libpod/alpine:latest top -d 2
5399b1330f7b2db082d006ee6eb2f1f109f3350bacc8b3e369ba90df8da4476d
$ podman [options] container top test
USER        PID         PPID        %CPU        ELAPSED       TTY         TIME        COMMAND
root        1           0           0.000       1.076338999s  ?           0s          top -d 2 
$ podman [options] container top test groups hgroups
Error: read /proc/226420/cmdline: no such process

I fully expect this one to languish, then get closed in a month or two.

May 06 '24 15:05 edsantiago

Actually I think this is a real error, and if one tries it will not be too hard to build a reproducer for it. This reminds me of https://github.com/containers/podman/issues/22103#issuecomment-2011848975, which shows that checking only for ENOENT is not enough when we parse /proc, as other errors can be returned too.

That however does not explain why we have a process in such a state here in the test, as we only run top and there should be no processes exiting in the meantime in the pid namespace.

May 06 '24 16:05 Luap99

The kernel returns ESRCH if the process terminates between the time the cmdline file is opened and the time it is read, so it would be good to treat unix.ESRCH like ENOENT all over the psgo codebase. But I don't understand how the process could have terminated between the "container top test" and "container top test groups hgroups" commands; I can't spot anything weird in the logs.

May 07 '24 10:05 giuseppe

Agreed, me neither. My only thought is that top got killed by a signal in the meantime, or alternatively that another process entered the cgroup (it looks like psgo collects the pids of all processes based on the cgroup) and then exited or got killed.

If it is the first one, the test will still fail, as top would return no results.

May 07 '24 11:05 Luap99

I've opened a PR for psgo: https://github.com/containers/psgo/pull/155

May 07 '24 12:05 giuseppe

A friendly reminder that this issue had no activity for 30 days.

Jun 07 '24 00:06 github-actions[bot]

No new instances of this flake, just the one I opened with. I'm okay closing.

Should this recur: first thing to check is grep psgo go.mod:

if you see v1.9.0 (old psgo), ignore the failure.
if you see anything greater than v1.9.0 it means @giuseppe's fix did not address the root cause, and you can opt to reopen this or file a new issue.

Jun 20 '24 20:06 edsantiago
