
gVisor stderr "broken pipe" EPIPE upon container closure

Open gattytto opened this issue 4 years ago • 5 comments

hello, there's this issue: https://github.com/google/gvisor/issues/2233

The issue concerns cri-o and gVisor (runsc) containers using conmon. In the logs attached to the mentioned issue, some process seems to close the stderr pipe before the container kernel receives the SIGKILL, which results in an unclean exit of the container. cri-o has other runtimes registered (runc and kata-containers), and they work fine in this case.

I opened this issue here so the conmon devs can consider whether this could be overcome from within conmon using existing command-line arguments, or whether a new one should be added.

greetings.

gattytto avatar Mar 27 '20 05:03 gattytto

as a point of clarification: conmon is responsible for initializing the container and keeping track of it (the process that is the parent of the container process). but the container manager (CRI-O, in your case) is responsible for killing it. This makes it unclear whether it's conmon's fault for poorly handling a killed container process, or CRI-O's fault for poorly killing the container

I am suspicious of this commit (as it has to do with SIGPIPE) and am wondering if you could try conmon 2.0.12 to see if you get the same results. I'm also trying today to reproduce

haircommander avatar Mar 27 '20 14:03 haircommander

as a point of clarification: conmon is responsible for initializing the container and keeping track of it (the process that is the parent of the container process). but the container manager (CRI-O, in your case) is responsible for killing it. This makes it unclear whether it's conmon's fault for poorly handling a killed container process, or CRI-O's fault for poorly killing the container

This topic is very interesting, and detailed documentation of this pipeline would be very helpful, since cri-o acts as a runtime broker and delegates everything to runtime binaries with arguments. This specific use case makes two different calls, one to runsc and one to conmon. In this scenario it seems that conmon is the one providing the pipes:

https://github.com/containers/conmon/blob/89b2478b507c6f285cd97ae8e55c85b9cafe6e81/src/utils.h#L19

gattytto avatar Mar 27 '20 16:03 gattytto

I am suspicious of this commit (as it has to do with SIGPIPE) and am wondering if you could try conmon 2.0.12 to see if you get the same results. I'm also trying today to reproduce

EDIT: sorry, here it is. Using 2.0.12, the error persists:

I0327 17:23:20.679032  121968 strace.go:622] [   2] creds-init X write(0x2 socket:[2], 0xc000100800 "{\"level\":\"info\",\"ts\":1585329800.6788728,\"caller\":\"creds-init/main.go:44\",\"msg\":\"Credentials initialized.\"}\n", 0x6b) = 0x0 errno=32 (broken pipe) (8.28µs)

gattytto avatar Mar 27 '20 17:03 gattytto

I am suspicious of this commit (as it has to do with SIGPIPE) and am wondering if you could try conmon 2.0.12 to see if you get the same results. I'm also trying today to reproduce

Using 2.0.12 there's no broken pipe error, but no graceful stop of the container within the pod either. The container is an init container, so it needs to stop gracefully for the next one to start.

ah hah, that's probably because conmon is killed from the SIGPIPE, so it never reports the exit code for the container.

@giuseppe advised ignoring SIGPIPE because he believes it's the runtime's duty to handle it, and https://github.com/containers/conmon/issues/134 indicates gVisor handles SIGPIPEs in an unexpected way.

do you have any thoughts on this @giuseppe ?

haircommander avatar Mar 27 '20 17:03 haircommander
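The point above about a process being killed by SIGPIPE before it can report anything can be illustrated with a minimal sketch. This is generic POSIX pipe behavior written in Python, not conmon's actual code; the function name and the payload are hypothetical. A child with the default SIGPIPE disposition writes to a pipe that has no reader, and the parent inspects how the child ended.

```python
import os
import signal


def child_status_after_write_without_reader() -> int:
    """Fork a child that writes to a pipe with no reader; return its raw wait status."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # close the only read end, so no reader remains

    pid = os.fork()
    if pid == 0:
        # Child: Python ignores SIGPIPE at startup, so restore the default action.
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)
        os.write(write_fd, b"log line\n")  # the kernel delivers SIGPIPE here
        os._exit(0)  # not reached when the signal terminates the child

    # Parent: drop our copy of the write end and collect the child's status.
    os.close(write_fd)
    _, status = os.waitpid(pid, 0)
    return status


status = child_status_after_write_without_reader()
killed_by_sigpipe = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGPIPE
print("child killed by SIGPIPE:", killed_by_sigpipe)
```

The child never reaches its own exit path, which is the general reason a monitor process terminated this way would have no chance to record an exit code.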

My diagnosis is that CRI-O closes its end of the pipe, which would send SIGPIPE to conmon, but conmon ignores it. Then, when runsc writes, it gets the SIGPIPE and errors in the way shown above. runc and kata seem to handle this case.

haircommander avatar Mar 27 '20 17:03 haircommander
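The "broken pipe" seen in the strace output (errno=32) is what a writer gets when SIGPIPE is ignored and the read end of the pipe is gone. Below is a minimal sketch of that general behavior, assuming a Linux host; it does not reproduce the CRI-O, conmon, or runsc code paths, and the function name is hypothetical.

```python
import errno
import os
import signal


def write_with_sigpipe_ignored() -> int:
    """Write to a pipe with no reader while SIGPIPE is ignored; return the errno seen."""
    # With SIGPIPE ignored, the failed write surfaces as an error instead of a signal.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)

    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate the other side closing its end of the pipe
    try:
        os.write(write_fd, b"log line\n")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return 0


err = write_with_sigpipe_ignored()
print("write failed with EPIPE:", err == errno.EPIPE)
```

On Linux, `errno.EPIPE` is 32, which matches the `errno=32 (broken pipe)` reported in the log line earlier in the thread.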