Commands run via runc exec do not get killed when remote client disconnects
This has grown to become an issue in our kubernetes clusters: when a user has an exec shell inside a container and their connection dies, the command being executed inside the container does not die. We noticed this because many teams frequently use the kubernetes debug shell in the browser, and it's typical to just close the tab when finished instead of killing the command first. It can also be an issue for kube exec probes: kubernetes implements timeouts by simply closing the exec session, so timed-out exec probe commands can easily pile up.
I understand that killing the process upon disconnection may not always be desirable, but it would be great if it were at least an option, since it's basically impossible to implement a workaround without help from runc. I looked into injecting a wrapper for exec commands that detects when the pty is closed and kills the child process, but I could not find any way to detect that the other end of the pty had hung up.
Off the top of my head: when a connection dies, runc receives SIGHUP, which it forwards to the container process. It's up to the process to react to it; the default action for SIGHUP is to terminate.
@kolyshkin Any chance you could point me to the code where it's handling that? I had looked before filing this issue, but couldn't find anything. Thanks!
@kolyshkin please correct me if this is misleading, but @PaulFurtado this might be related to what you're looking for:
https://github.com/opencontainers/runc/blob/eddf35e5462e2a9f24d8279874a84cfc8b8453c2/signals.go#L77-L106
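For anyone following along, here is a minimal sketch of what such a forwarding loop does. It is only an illustration of the idea, not the actual runc code: the wrapper starts a child process, forwards every signal it receives to that child, and exits with the child's status once SIGCHLD reports the child is gone.

package main

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	// Subscribe to all incoming signals before starting the child,
	// similar to what runc does before starting the container process.
	sigs := make(chan os.Signal, 2048)
	signal.Notify(sigs)

	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	childPid := cmd.Process.Pid

	for sig := range sigs {
		if sig == syscall.SIGCHLD {
			// Reap exited children; if it was our child, exit with its status.
			// (The real code also translates "killed by signal N" into 128+N.)
			var ws syscall.WaitStatus
			for {
				pid, err := syscall.Wait4(-1, &ws, syscall.WNOHANG, nil)
				if err != nil || pid <= 0 {
					break
				}
				if pid == childPid {
					os.Exit(ws.ExitStatus())
				}
			}
			continue
		}
		// Forward everything else (SIGTERM, SIGHUP, SIGINT, ...) to the child.
		if s, ok := sig.(syscall.Signal); ok {
			_ = syscall.Kill(childPid, s)
		}
	}
}

Built as a standalone binary (say ./forwarder, a hypothetical name), running ./forwarder sleep 99m and then sending kill -TERM to the wrapper makes the sleep process receive SIGTERM, which is the behaviour demonstrated below.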
Okay so to confirm, that code is forwarding the signals that the runc exec command receives to the child process, right? In that case, the issue may be that docker/containerd need to send a signal when the remote connection fails?
Here's how to reproduce with docker:
- Run a container
$ docker run -d busybox sleep infinity
e5f680b2c6a6faaf3bb7b9bab022cb15413ac40070356ee04fca40cb3153c116
- Exec another sleep infinity process inside it:
$ docker exec -it e5f680b2c6a6faaf3bb7b9bab022cb15413ac40070356ee04fca40cb3153c116 sleep infinity
- Ungracefully kill the docker exec command that's in the foreground:
$ ps aux | grep 'docker exec'
pfurtado 589002 0.1 0.1 1494968 48164 pts/87 Sl+ 18:36 0:00 docker exec -it e5f680b2c6a6faaf3bb7b9bab022cb15413ac40070356ee04fca40cb3153c116 sleep infinity
$ kill -9 589002
- Observe that the second sleep infinity process still remains running in the container:
$ ps aux | grep 'sleep infinity'
root 588954 0.0 0.0 1316 4 ? Ss 18:35 0:00 sleep infinity
root 589022 0.0 0.0 1316 4 pts/0 Ss+ 18:36 0:00 sleep infinity
I initially thought runc was to blame here because I could reproduce with runc exec directly. Ex:
sudo runc --root=/run/docker/runtime-runc/moby exec e5f680b2c6a6faaf3bb7b9bab022cb15413ac40070356ee04fca40cb3153c116 sleep infinity
killing the runc exec command leaves the sleep process running inside the container.
But I'm realizing now that runc exec is actually directly invoking the process? I had thought that runc exec was communicating with the runc process that owns the container over a unix socket. Since that is not the case, it definitely makes sense that issuing kill -9 to runc exec wouldn't result in the child being signaled, and runc cannot do anything about that.
So in the docker exec case, is it actually containerd here that should be calling something in librunc to signal the process?
The command that you invoke, either as part of the Dockerfile or from the command line when you start a container, becomes PID 1, so in your case it's the sleep process. The init process is supposed to handle signals, reap zombie processes, and take on the other responsibilities that come with being an init process within the container namespace. runc, or any other container runtime to the best of my knowledge, just makes that process PID 1 and doesn't interfere further in that regard.
This is actually quite a common use case, and I suggest you look into an init system like dumb-init or tini to run as the init process, which in turn should handle your (other) child processes.
@danishprakash We actually do run all containers with tini injected by dockerd. But this issue isn't about the main process of the container; it's about processes executed via docker exec inside an existing container. The processes being run via docker exec aren't zombies waiting to be reaped; they continue actively running when the client loses its connection to dockerd.
killing the runc exec command leaves the sleep process running inside the container.
If you kill a process with -9, you can't expect it to do anything, since this signal cannot be handled by the process itself. OTOH if you use SIGTERM, it works as intended:
# ./runc --debug exec xx45 sleep 99m
...
DEBU[0000] child process in init()
DEBU[0000] setns_init: about to exec
(in a second terminal)
$ ps ax | grep 'sleep 99m'
811482 pts/1 Sl+ 0:00 ./runc --debug exec xx45 sleep 99m
811495 ? Ss 0:00 sleep 99m
811720 pts/2 S+ 0:00 grep --color=auto sleep 99m
$ sudo kill -TERM 811482
$ ps ax | grep 'sleep 99m'
812028 pts/2 S+ 0:00 grep --color=auto sleep 99m
(back to the first terminal)
DEBU[0130]signals.go:102 main.(*signalHandler).forward() forwarding signal "SIGTERM"
DEBU[0130]signals.go:92 main.(*signalHandler).forward() process exited pid=811495 status=143
[root@kir-rhat runc-tst]#
Now, there are two ways to execute runc exec.
(1) Without -d (like I did above, and like you did in your example earlier). In this case, runc waits for the process to exit and forwards signals to it, so it is enough to send SIGTERM to the runc process (note, though, that the process being executed can choose to ignore SIGTERM).
(2) With -d. In this case runc spawns the process and exits, so there is no longer a runc process we can rely on to forward a signal. If you take a look at runc exec --help, you can see there is a --pid-file option, which writes the host PID of the executed process to a file. You can use it like this:
# ps ax | grep 'sleep 99m'
812885 ? Ss 0:00 sleep 99m
812921 pts/1 S+ 0:00 grep --color=auto sleep 99m
# cat exec.pid
812885
# kill -TERM `cat exec.pid`
# ps ax | grep 'sleep 99m'
812967 pts/1 S+ 0:00 grep --color=auto sleep 99m
All of the above applies to cases without a terminal for the process being exec'ed. If there is a terminal (as in the case of runc exec -t), then what I described above in https://github.com/opencontainers/runc/issues/3359#issuecomment-1024631174 takes care of the problem. In the following example I'm using ssh localhost to reproduce the situation from this issue's title, i.e. "remote client disconnects":
$ ssh localhost
Last login: Mon Jan 31 11:13:28 2022 from ::1
$ sudo runc exec -t xx67 sleep 99m
(here I use ssh escape sequence (~ . Enter) to disconnect)
Connection to localhost closed.
[kir@kir-rhat runc-tst]$ ps ax | grep 'sleep 99m'
814615 pts/1 S+ 0:00 grep --color=auto sleep 99m
With all that, I don't see that there is an issue with runc.
@kolyshkin yeah, at the end of my previous comment I was acknowledging that I had realized runc exec directly spawns the process, and given that, I don't expect it to handle kill -9. Prior to that comment, I had thought that runc exec was actually talking to the main runc process for the container over a unix socket, and I was expecting that a client disconnecting from that unix socket would result in the process being signaled. Since that is not the case, this issue is somewhere in containerd or dockerd.
This is not a problem of runc. From https://github.com/opencontainers/runc/issues/3359#issuecomment-1026071698:
The processes being run via docker exec aren't zombies waiting to be reaped, they continue actively running when the client loses its connection to dockerd.
Your actual problem is that the client does not handle the process in the container when it (the client) exits unexpectedly.
Take kubectl as an example: it uses remotecommand.NewSPDYExecutor for exec communication. So if you want to handle processes in containers when kubectl exits unexpectedly, you would need to add a communication link check (something like a ping-pong) in the apiserver or kubelet and send \u0004 to the process in the container when the client exits. Currently, k8s does not seem to implement such a thing.
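To make the \u0004 idea concrete, here is a rough sketch of a client built on client-go's remotecommand.NewSPDYExecutor that writes the EOT byte (0x04, Ctrl-D) into the exec session's stdin before the client goes away, so that an interactive shell started via exec exits instead of lingering. This is not something kubectl does today; the namespace, pod name, command, and KUBECONFIG handling below are placeholder assumptions.

package main

import (
	"context"
	"io"
	"os"
	"os/signal"
	"syscall"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/remotecommand"
)

func main() {
	// Placeholder kubeconfig handling; in-cluster config would also work.
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Build the exec request the same way kubectl does.
	req := clientset.CoreV1().RESTClient().Post().
		Resource("pods").
		Namespace("default"). // placeholder namespace
		Name("debug-pod").    // placeholder pod name
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Command: []string{"sh"}, // placeholder command
			Stdin:   true,
			Stdout:  true,
			TTY:     true,
		}, scheme.ParameterCodec)

	executor, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
	if err != nil {
		panic(err)
	}

	// Feed stdin through a pipe so extra bytes can be injected into the stream.
	stdinReader, stdinWriter := io.Pipe()
	go func() { _, _ = io.Copy(stdinWriter, os.Stdin) }()

	// When the client is about to go away (here: SIGINT/SIGTERM), send EOT
	// (\u0004) so the remote shell exits instead of lingering after the
	// connection is dropped.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		<-sigs
		_, _ = stdinWriter.Write([]byte{0x04})
		_ = stdinWriter.Close()
	}()

	_ = executor.StreamWithContext(context.Background(), remotecommand.StreamOptions{
		Stdin:  stdinReader,
		Stdout: os.Stdout,
		Tty:    true,
	})
}

Note that this only helps when the exec session has a TTY and the process actually reads its stdin (a shell, for instance); a process that ignores EOF would still need to be signaled explicitly.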