Commands run via runc exec do not get killed when remote client disconnects
This has grown to become an issue in our kubernetes clusters: when a user has an exec shell inside a container and their connection dies, the command being executed inside the container does not die. We noticed this because many teams frequently use the kubernetes debug shell in the browser, and it's typical to just close the tab when finished instead of killing the command first. It can also be an issue for kube exec probes: kubernetes implements timeouts by simply closing the exec session, so timed-out exec probe commands can easily pile up.
I understand that killing the process upon disconnection may not always be desirable, but it would be great if it were at least an option, since it's basically impossible to implement a workaround without help from runc. I looked into injecting a wrapper for exec commands that detects when the pty is closed and kills the child process, but I could not find any way to detect that the other end of the pty had hung up.
Off the top of my head: when a connection dies, runc receives SIGHUP, which it forwards to the container process. It's up to the process to react to it; the default action for SIGHUP is to terminate.
@kolyshkin Any chance you could point me to the code where it's handling that? I had looked before filing this issue, but couldn't find anything. Thanks!
@kolyshkin please correct me if this is misleading, but @PaulFurtado this might be related to what you're looking for:
https://github.com/opencontainers/runc/blob/eddf35e5462e2a9f24d8279874a84cfc8b8453c2/signals.go#L77-L106
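For anyone following along, here is a minimal sketch of what such a forwarding loop does. It is only an illustration of the idea, not the actual runc code: the wrapper starts a child process, forwards every signal it receives to that child, and exits with the child's status once SIGCHLD reports the child is gone.

package main

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	// Subscribe to all incoming signals before starting the child,
	// similar to what runc does before starting the container process.
	sigs := make(chan os.Signal, 2048)
	signal.Notify(sigs)

	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	childPid := cmd.Process.Pid

	for sig := range sigs {
		if sig == syscall.SIGCHLD {
			// Reap exited children; if it was our child, exit with its status.
			// (The real code also translates "killed by signal N" into 128+N.)
			var ws syscall.WaitStatus
			for {
				pid, err := syscall.Wait4(-1, &ws, syscall.WNOHANG, nil)
				if err != nil || pid <= 0 {
					break
				}
				if pid == childPid {
					os.Exit(ws.ExitStatus())
				}
			}
			continue
		}
		// Forward everything else (SIGTERM, SIGHUP, SIGINT, ...) to the child.
		if s, ok := sig.(syscall.Signal); ok {
			_ = syscall.Kill(childPid, s)
		}
	}
}

Built as a standalone binary (say ./forwarder, a hypothetical name), running ./forwarder sleep 99m and then sending kill -TERM to the wrapper makes the sleep process receive SIGTERM, which is the behaviour demonstrated below.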
Okay so to confirm, that code is forwarding the signals that the runc exec command receives to the child process, right? In that case, the issue may be that docker/containerd need to send a signal when the remote connection fails?
Here's how to reproduce with docker:
- Run a container
$ docker run -d busybox sleep infinity
e5f680b2c6a6faaf3bb7b9bab022cb15413ac40070356ee04fca40cb3153c116
- Exec another sleep infinity process inside it:
$ docker exec -it e5f680b2c6a6faaf3bb7b9bab022cb15413ac40070356ee04fca40cb3153c116 sleep infinity
- Ungracefully kill the docker exec command that's in the foreground:
$ ps aux | grep 'docker exec'
pfurtado 589002 0.1 0.1 1494968 48164 pts/87 Sl+ 18:36 0:00 docker exec -it e5f680b2c6a6faaf3bb7b9bab022cb15413ac40070356ee04fca40cb3153c116 sleep infinity
$ kill -9 589002
- Observe that the second sleep infinity process still remains running in the container:
$ ps aux | grep 'sleep infinity'
root 588954 0.0 0.0 1316 4 ? Ss 18:35 0:00 sleep infinity
root 589022 0.0 0.0 1316 4 pts/0 Ss+ 18:36 0:00 sleep infinity
I initially thought runc was to blame here because I could reproduce with runc exec directly. Ex:
sudo runc --root=/run/docker/runtime-runc/moby exec e5f680b2c6a6faaf3bb7b9bab022cb15413ac40070356ee04fca40cb3153c116 sleep infinity
killing the runc exec command leaves the sleep process running inside the container.
But I'm realizing now that runc exec is actually directly invoking the process? I had thought that runc exec was communicating with the runc process that owns the container over a unix socket. Since that is not the case, it definitely makes sense that issuing kill -9 to runc exec wouldn't result in the child being signaled, and runc cannot do anything about that.
So in the docker exec case, is it actually containerd here that should be calling something in librunc to signal the process?
The command that you invoke, either as part of the Dockerfile or from the command line when you start a container, becomes PID 1, so in your case it's the sleep process. The init process is supposed to handle signals, reap zombie processes, and take on the other responsibilities that come with being an init process within the container namespace. runc, or any other container runtime to the best of my knowledge, just makes that process PID 1 and doesn't interfere further in that regard.
This is actually quite a common use case, and I suggest you look into an init system like dumb-init or tini to run as the init process, which in turn should handle your (other) child processes.
@danishprakash We actually do run all containers with tini injected by dockerd. But this issue isn't about the main process of the container; it's about processes executed via docker exec inside an existing container. The processes being run via docker exec aren't zombies waiting to be reaped; they continue actively running when the client loses its connection to dockerd.
killing the runc exec command leaves the sleep process running inside the container.
If you kill a process with -9, you can't expect it to do anything, since this signal cannot be handled by the process itself. OTOH if you use SIGTERM, it works as intended:
# ./runc --debug exec xx45 sleep 99m
...
DEBU[0000] child process in init()
DEBU[0000] setns_init: about to exec
(in a second terminal)
$ ps ax | grep 'sleep 99m'
811482 pts/1 Sl+ 0:00 ./runc --debug exec xx45 sleep 99m
811495 ? Ss 0:00 sleep 99m
811720 pts/2 S+ 0:00 grep --color=auto sleep 99m
$ sudo kill -TERM 811482
$ ps ax | grep 'sleep 99m'
812028 pts/2 S+ 0:00 grep --color=auto sleep 99m
(back to the first terminal)
DEBU[0130]signals.go:102 main.(*signalHandler).forward() forwarding signal "SIGTERM"
DEBU[0130]signals.go:92 main.(*signalHandler).forward() process exited pid=811495 status=143
[root@kir-rhat runc-tst]#
Now, there are two ways to execute runc exec.
(1) Without -d (like I did above, and like you did in your example earlier). In this case, runc waits for the process to exit and forwards signals to it, so it is enough to send SIGTERM to the runc process (note, though, that the process being executed can choose to ignore SIGTERM).
(2) With -d. In this case runc spawns the process and exits, so there is no longer a runc process we can rely on to forward a signal. If you take a look at runc exec --help, you can see there is a --pid-file option, which writes the host PID of the executed process to a file. You can use it like this:
# ps ax | grep 'sleep 99m'
812885 ? Ss 0:00 sleep 99m
812921 pts/1 S+ 0:00 grep --color=auto sleep 99m
# cat exec.pid
812885
# kill -TERM `cat exec.pid`
# ps ax | grep 'sleep 99m'
812967 pts/1 S+ 0:00 grep --color=auto sleep 99m
All of the above applies to cases without a terminal for the process being exec'ed. If there is a terminal (as in the case of runc exec -t), then what I described above in https://github.com/opencontainers/runc/issues/3359#issuecomment-1024631174 takes care of the problem. In the following example I'm using ssh localhost to reproduce the situation from this issue's title, i.e. "remote client disconnects":
$ ssh localhost
Last login: Mon Jan 31 11:13:28 2022 from ::1
$ sudo runc exec -t xx67 sleep 99m
(here I use ssh escape sequence (~ . Enter) to disconnect)
Connection to localhost closed.
[kir@kir-rhat runc-tst]$ ps ax | grep 'sleep 99m'
814615 pts/1 S+ 0:00 grep --color=auto sleep 99m
With all that, I don't see that there is an issue with runc.
@kolyshkin yeah, at the end of my previous comment I was acknowledging that I had realized runc exec directly spawns the process, and given that, I don't expect it to handle kill -9. Prior to that comment, I had thought that runc exec was actually talking to the main runc process for the container over a unix socket, and I was expecting that a client disconnecting from that unix socket would result in the process being signaled. Since that is not the case, this issue is somewhere in containerd or dockerd.
This is not a problem of runc. From https://github.com/opencontainers/runc/issues/3359#issuecomment-1026071698:
The processes being run via docker exec aren't zombies waiting to be reaped, they continue actively running when the client loses its connection to dockerd.
Your actual problem is that the client does not handle the process in the container when it (the client) exits unexpectedly.
Take kubectl as an example: it uses remotecommand.NewSPDYExecutor for exec communication. So if you want to handle processes in containers when kubectl exits unexpectedly, you would need to add a communication link check (something like a ping-pong) in the apiserver or kubelet and send \u0004 to the process in the container when the client exits. Currently, k8s does not seem to implement such a thing.
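To make the \u0004 idea concrete, here is a rough sketch of a client built on client-go's remotecommand.NewSPDYExecutor that writes the EOT byte (0x04, Ctrl-D) into the exec session's stdin before the client goes away, so that an interactive shell started via exec exits instead of lingering. This is not something kubectl does today; the namespace, pod name, command, and KUBECONFIG handling below are placeholder assumptions.

package main

import (
	"context"
	"io"
	"os"
	"os/signal"
	"syscall"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/remotecommand"
)

func main() {
	// Placeholder kubeconfig handling; in-cluster config would also work.
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Build the exec request the same way kubectl does.
	req := clientset.CoreV1().RESTClient().Post().
		Resource("pods").
		Namespace("default"). // placeholder namespace
		Name("debug-pod").    // placeholder pod name
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Command: []string{"sh"}, // placeholder command
			Stdin:   true,
			Stdout:  true,
			TTY:     true,
		}, scheme.ParameterCodec)

	executor, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
	if err != nil {
		panic(err)
	}

	// Feed stdin through a pipe so extra bytes can be injected into the stream.
	stdinReader, stdinWriter := io.Pipe()
	go func() { _, _ = io.Copy(stdinWriter, os.Stdin) }()

	// When the client is about to go away (here: SIGINT/SIGTERM), send EOT
	// (\u0004) so the remote shell exits instead of lingering after the
	// connection is dropped.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		<-sigs
		_, _ = stdinWriter.Write([]byte{0x04})
		_ = stdinWriter.Close()
	}()

	_ = executor.StreamWithContext(context.Background(), remotecommand.StreamOptions{
		Stdin:  stdinReader,
		Stdout: os.Stdout,
		Tty:    true,
	})
}

Note that this only helps when the exec session has a TTY and the process actually reads its stdin (a shell, for instance); a process that ignores EOF would still need to be signaled explicitly.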