conmon
gVisor stderr "broken pipe" EPIPE upon container closure
hello, there's this issue: https://github.com/google/gvisor/issues/2233
The issue concerns cri-o and gVisor (runsc) containers using conmon. In the logs attached to the linked issue, some process appears to close the stderr pipe before the container kernel receives the SIGKILL, which results in an unclean exit of the container. cri-o has other runtimes (runc and kata-containers) registered, and they work fine in this case.
I opened this issue here so the conmon devs can consider whether this could be worked around from within conmon using an existing command-line argument, or whether a new one should be added.
greetings.
as a point of clarification: conmon is responsible for initializing the container and keeping track of it (conmon is the parent of the container process), but the container manager (CRI-O, in your case) is responsible for killing it. This makes it unclear whether it's conmon's fault for poorly handling a killed container process, or CRI-O's fault for poorly killing the container.
I am suspicious of this commit (as it has to do with SIGPIPE) and am wondering if you could try conmon 2.0.12 to see if you get the same results. I'm also trying today to reproduce
this topic is very interesting, and detailed documentation of this pipeline would be very helpful, since cri-o acts as a runtime broker and delegates everything to runtime binaries with arguments. This specific use case makes two different calls, one to runsc and one to conmon. In this scenario it seems to be conmon that takes the job of providing the pipes.
https://github.com/containers/conmon/blob/89b2478b507c6f285cd97ae8e55c85b9cafe6e81/src/utils.h#L19
EDIT: sorry, here it is. Using 2.0.12, the error persists:
I0327 17:23:20.679032 121968 strace.go:622] [ 2] creds-init X write(0x2 socket:[2], 0xc000100800 "{\"level\":\"info\",\"ts\":1585329800.6788728,\"caller\":\"creds-init/main.go:44\",\"msg\":\"Credentials initialized.\"}\n", 0x6b) = 0x0 errno=32 (broken pipe) (8.28µs)
using 2.0.12 there's no broken pipe error, but the container within the pod doesn't stop gracefully either. The container is an init container, so it needs to stop gracefully for the next one to start.
ah hah, that's probably because conmon is killed by the SIGPIPE, so it never reports the exit code for the container.
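To make the "killed by SIGPIPE, so no exit code is reported" point concrete, here is a minimal sketch of the general POSIX behaviour. It is not conmon's code: the child script and the function name `run_writer_with_default_sigpipe` are made up for illustration, and the child stands in for any process that keeps the default SIGPIPE disposition while its reader goes away.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a monitor process that does NOT ignore SIGPIPE.
CHILD = """
import os, signal, sys
signal.signal(signal.SIGPIPE, signal.SIG_DFL)  # default action: terminate
sys.stdin.read()                # block until the parent signals EOF
os.write(1, b"exit status\\n")   # nobody reads stdout any more -> SIGPIPE
"""

def run_writer_with_default_sigpipe() -> int:
    """Start the child, drop the read end of its stdout, return its status."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    proc.stdout.close()  # the reader goes away first
    proc.stdin.close()   # EOF on stdin lets the child attempt its write
    return proc.wait()   # a negative value means "terminated by that signal"

if __name__ == "__main__":
    status = run_writer_with_default_sigpipe()
    print("killed by SIGPIPE:", status == -signal.SIGPIPE)
```

The child never reaches any code after the write, which is the analogue of a monitor dying before it can record the container's exit code.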
@giuseppe advised to ignore SIGPIPE because it's his belief that it's the runtime's duty to handle it, and https://github.com/containers/conmon/issues/134 indicates gvisor handles SIGPIPEs in an unexpected way.
do you have any thoughts on this @giuseppe ?
my diagnosis is that CRI-O closes its end of the pipe, which would give SIGPIPE to conmon, but conmon is ignoring it. Then, when runsc writes, it gets the SIGPIPE and errors out in the way shown above. runc and kata seem to handle this case.
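The mechanism behind the `errno=32 (broken pipe)` line in the log can be reproduced in isolation. This is only a sketch of the kernel behaviour, not of what CRI-O, conmon, or runsc actually do: the function name `write_after_reader_closed` is made up, and the comments mapping each step to a component are an assumption based on the diagnosis above.

```python
import errno
import os

def write_after_reader_closed(payload: bytes = b"log line\n") -> int:
    """Return the errno seen when writing to a pipe whose read end is closed."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reading side goes away (CRI-O, per the diagnosis)
    try:
        os.write(write_fd, payload)  # the writing side tries to log
    except BrokenPipeError as exc:
        # CPython ignores SIGPIPE at startup, so instead of being killed
        # the writer gets the failure back as an EPIPE error from write().
        return exc.errno
    finally:
        os.close(write_fd)
    return 0  # only reached if the write unexpectedly succeeded

if __name__ == "__main__":
    err = write_after_reader_closed()
    print(err, errno.errorcode.get(err))
```

With SIGPIPE ignored, the writer survives and has to decide what to do with EPIPE itself, which is where the runtimes appear to differ.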