csi-s3
defunct processes, the possible explanation
introduction
The following explanation focuses on using csi-s3 with goofys
as the backend. All components are at their latest version.
The issue I stumbled upon is that goofys zombie (defunct) processes accumulate.
The exact number of zombies does not matter for understanding the problem.
explanation
I looked in the csi-s3 code and more importantly at the FuseUnmount
function and then at waitForProcess
https://github.com/ctrox/csi-s3/blob/ddbd6fdaa1bd76754df3ea76fba0afe149ebe87d/pkg/mounter/mounter.go#L133-L156
Given the name of the function, I expected to see a wait4
syscall that reaps the child process, in our case goofys.
If we look at the outputs below:
- we have a goofys zombie process with pid=32767
$ ps aux | grep goofys
root 32767 0.0 0.0 0 0 ? Zs Jun14 0:00 [goofys] <defunct>
- its parent process is s3driver
$ pstree -s 32767
systemd───containerd-shim───s3driver───goofys
Since s3driver launches the goofys backend (I guess the same holds for the other backends 🤷🏼♂️), s3driver is the parent process. As a good parent 😃, it should wait4 its child to collect its exit status.
In other words, terminated children are never reaped, so they leak as zombies. The fix should be trivial: in waitForProcess, when cmdLine is empty, we have to call syscall.Wait4 on the given pid.
https://github.com/ctrox/csi-s3/blob/ddbd6fdaa1bd76754df3ea76fba0afe149ebe87d/pkg/mounter/mounter.go#L142-L148
wdyt @ctrox?