
The NotifReceive function is blocked and the notifHandler goroutine cannot exit.

Open neblen opened this issue 3 years ago • 4 comments

Description

The NotifReceive function blocks and the notifHandler goroutine cannot exit. When the container spawns a new process, the seccomp agent starts a notifHandler goroutine to monitor abnormal syscalls. After the process dies, the notifHandler goroutine remains, blocked in the NotifReceive function.

Impact

A large number of useless notifHandler goroutines accumulate in the seccomp agent container.

Environment and steps to reproduce

k8s version: v1.21.4+rke2r2; Linux system: Ubuntu 20.04.1 LTS; kernel version: 5.15.0-50-generic

neblen • Jan 11 '23 02:01

x-ref https://github.com/seccomp/libseccomp-golang/issues/104

alban • Jan 11 '23 10:01

Thanks for the report!

Given that libseccomp-golang bug, and that the proper fix might be in the kernel, there might be some workarounds we can use until a fix is written, merged and backported. For example, a switch statement with a case that executes this blocking call and a default case that sleeps and then checks somehow whether the process is still running (exit if the process is no longer running; otherwise loop and try to receive a notification again). Maybe something like this, or some other workaround, can be used in the seccomp agent in the meantime.

To know whether a process is still valid, without being fooled by PID recycling, we could use the pidfd of the process. But I'm not sure we can get that without a race, so I'm not sure we can use it...

And there doesn't seem to be any way to check if the seccomp fd is still valid either, so... yeah, maybe we can't work around this? It seems weird, I guess LXC/LXD handles this in some way, so maybe we can have a look at what they do to see if there is any way to detect this?

@neblen do you want to experiment with this and see whether we have any options to work around this issue?

rata • Jan 11 '23 14:01

Hi~ @rata Yes. I can do some experiments to verify whether the notifHandler goroutine can be terminated. I have an idea: after the process monitored by a notifHandler goroutine exits, that goroutine stays blocked in the NotifReceive function. At that point, another goroutine could write a message to the notifHandler goroutine's SeccompFd to wake up the NotifReceive call, so that the notifHandler goroutine can then exit by itself.

But for now I am not sure how to successfully write data to the SeccompFd from another goroutine.

neblen • Jan 11 '23 16:01

Talking with alban, he remembered you get a POLLHUP event on the fd: https://github.com/torvalds/linux/commit/99cdb8b9a57393b5978e7a6310a2cba511dd179b

Userspace is currently not polling on this, IIRC, but that can be a solution for the meantime. With that option, though, I'm not sure it is worth writing a patch to improve things for users who only call blocking functions.

rata • Jan 11 '23 16:01