
cmd/run: Ensure underlying container is stopped when toolbox is killed

Open halfline opened this issue 3 years ago • 12 comments

Right now "toolbox enter" creates a container on the fly, but then lets it linger after the foreground toolbox process is killed (for instance, from a terminal hangup).

Not killing the underlying container has the negative side effect of stalling shutdown if a toolbox shell is running.

This commit addresses that problem by detecting when the toolbox process is signaled, and then in response, kills off the entire cgroup associated with the underlying container.

Closes https://github.com/containers/toolbox/issues/1157

halfline avatar Dec 22 '22 01:12 halfline

@matthiasclasen i think this is probably a fix for the shutdown issue you told me about yesterday

halfline avatar Dec 22 '22 01:12 halfline

Build failed.

:x: unit-test FAILURE in 24m 23s
:x: unit-test-migration-path-for-coreos-toolbox FAILURE in 3m 12s
:heavy_check_mark: system-test-fedora-rawhide SUCCESS in 34m 09s
:heavy_check_mark: system-test-fedora-36 SUCCESS in 12m 45s
:heavy_check_mark: system-test-fedora-35 SUCCESS in 14m 19s

Build succeeded.

:heavy_check_mark: unit-test SUCCESS in 27m 38s
:heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 09s
:heavy_check_mark: system-test-fedora-rawhide SUCCESS in 38m 38s
:heavy_check_mark: system-test-fedora-36 SUCCESS in 12m 23s
:heavy_check_mark: system-test-fedora-35 SUCCESS in 13m 53s

Just for the sake of posterity ...

@matthiasclasen i think this is probably a fix for the shutdown issue you told me about yesterday

I think the blocked shutdown problem was reported as:

  • https://github.com/containers/podman/issues/14531
  • https://bugzilla.redhat.com/show_bug.cgi?id=2081664
  • https://bugzilla.redhat.com/show_bug.cgi?id=2084498

... and might have been fixed by: https://github.com/containers/podman/pull/17025

Note that while https://github.com/containers/podman/issues/14531 points at https://github.com/containers/podman/pull/16785 , I don't think it had anything to do with fixing blocked shutdowns caused by lingering podman exec sessions.

debarshiray avatar Sep 30 '23 12:09 debarshiray

Sorry, it's been a while since I looked at this. Even when I was working on it, I only put it together in a few hours after Matthias visited and mentioned the problem offhand, so I may be off base here. I haven't done much deep diving and I may have made wrong assumptions.

Having said that, my impression from your message is you believe I'm sending the hangup signal to the terminal's cgroup, but that's not the intent. The idea was to send the signal to the cgroup of the bash running in the container.

I got (at least) two things wrong I think:

  • I had assumed each conmon would create a new cgroup for each podman-exec invocation. It seems that assumption was perhaps wrong. Indeed, every shell seems to be using a shared cgroup. I get:

╎❯ cat /proc/self/cgroup
0::/user.slice/user-4153.slice/[email protected]/user.slice/libpod-4e141e468f1a756b43407f64f0273b5c18fcf5b784ea2a13050d0a9f4e1528ab.scope/container

in all my terminals. I find this really surprising. I do see crun has a --cgroup option, but indeed looking in ps output, it's not getting used.

  • I was looking at the wrong pid to find the cgroup. I was erroneously using the pid of the toolbox init-container process to find the cgroup.

This is https://github.com/containers/toolbox/issues/1204. I suspect this happens because the signal sent by the terminal only reaches up to the wrapper podman exec and doesn't get to the corresponding conmon(8) process.

Right, when a terminal is closed, a SIGHUP is sent to all processes in the terminal "session" (in the setsid() sense of the word). These sessions are inherited from their parents, unless a child creates a fresh session. There's no way in unix afaik to move a child into a pre-existing session; it either uses the one it started with, or it starts its own. conmon isn't part of the child hierarchy of podman-exec, so it won't inherit the terminal session. This means the hangup signal needs to be ferried either from podman-exec or toolbox to the process started by conmon (or if children had their own cgroup like I thought, we could just send the signal to the whole cgroup)

halfline avatar Oct 01 '23 14:10 halfline

Having said that, my impression from your message is you believe I'm sending the hangup signal to the terminal's cgroup, but that's not the intent. The idea was to send the signal to the cgroup of the bash running in the container.

I managed to confuse myself as well.

GNOME Terminal puts every VteTerminal into a separate systemd scope or cgroup. That's what I was showing here, because these process IDs are of the toolbox enter processes, and not of conmon(8) or the processes inside the container:

$ cat /proc/56151/cgroup 
0::/user.slice/user-1000.slice/[email protected]/app.slice/app-org.gnome.Terminal.slice/vte-spawn-2b4111a9-f4a9-4f2b-8acd-4adc9e673bac.scope
$ 
$ cat /proc/55919/cgroup 
0::/user.slice/user-1000.slice/[email protected]/app.slice/app-org.gnome.Terminal.slice/vte-spawn-9bf0516d-bd6f-4094-9504-0be37b1ce6bf.scope
$ 
$ cat /proc/56363/cgroup 
0::/user.slice/user-1000.slice/[email protected]/app.slice/app-org.gnome.Terminal.slice/vte-spawn-6ecb9e7e-aa06-42d6-b9b6-bb9d6cdb44f4.scope

That's why I didn't realize that all the conmon(8) processes for the same container share the same cgroup. However, they don't share it across different containers.

I had assumed each conmon would create a new cgroup for each podman-exec invocation. It seems that assumption was perhaps wrong. Indeed, every shell seems to be using a shared cgroup. I get:

╎❯ cat /proc/self/cgroup
0::/user.slice/user-4153.slice/[email protected]/user.slice/libpod-4e141e468f1a756b43407f64f0273b5c18fcf5b784ea2a13050d0a9f4e1528ab.scope/container

in all my terminals. I find this really surprising. I do see crun has a --cgroup option, but indeed looking in ps output, it's not getting used.

Right. Thanks for the hint, because I had missed it before.

I was looking at the wrong pid to find the cgroup. I was erroneously using the pid of the toolbox init-container process to find the cgroup.

nod

This is #1204. I suspect this happens because the signal sent by the terminal only reaches up to the wrapper podman exec and doesn't get to the corresponding conmon(8) process.

Right, when a terminal is closed, a SIGHUP is sent to all processes in the terminal "session" (in the setsid() sense of the word). These sessions are inherited from their parents, unless a child creates a fresh session. There's no way in unix afaik to move a child into a pre-existing session; it either uses the one it started with, or it starts its own. conmon isn't part of the child hierarchy of podman-exec, so it won't inherit the terminal session. This means the hangup signal needs to be ferried either from podman-exec or toolbox to the process started by conmon (or if children had their own cgroup like I thought, we could just send the signal to the whole cgroup)

nod

I don't see an obvious way to get the PID of the conmon(8) process or the process inside the container started by it. That was my question in https://github.com/containers/toolbox/issues/1204

Otherwise, we can insert a process that we control between conmon(8) and the user-requested process inside the container that will relay the PID inside the container back to the toolbox enter process. Then we can ferry the SIGHUP from the terminal to the container process. I don't know what would be the best way to do it. Maybe we can send a file descriptor?

This intermediate process under our control sounds a lot like the toolbox shell idea that I had before. We don't have to call it toolbox shell, though. It's just that I am really bad at naming. So, suggestions welcome.

debarshiray avatar Oct 03 '23 16:10 debarshiray

I think it probably is possible to get the right pid. just as an experiment:

╎❯ bundle_path=$(podman inspect 4e141e468f1a756b43407f64f0273b5c18fcf5b784ea2a13050d0a9f4e1528a | jq -r '.[0].StaticDir')
╎❯ exec_id=$(podman exec -d 4e141e468f1a756b43407f64f0273b5c18fcf5b784ea2a13050d0a9f4e1528a sleep 300)
╎❯ pid=$(podman exec 4e141e468f1a756b43407f64f0273b5c18fcf5b784ea2a13050d0a9f4e1528a cat $bundle_path/$exec_id/exec_pid)
╎❯ echo $pid
193097

There may be a more direct command for it; not sure. libpod has an API for it:

// Get PID file path for a container's exec session
func (c *Container) execPidPath(sessionID string) string {
        return filepath.Join(c.execBundlePath(sessionID), "exec_pid")
}

though it looks like toolbx doesn't use libpod ?

halfline avatar Oct 03 '23 17:10 halfline

I guess the complication is that the exec_id is only printed for background exec invocations. You can fish it out of podman inspect output, but you would have to diff the list of exec IDs before and after to find the right one, which is a little racy.

halfline avatar Oct 03 '23 17:10 halfline

I think it probably is possible to get the right pid. just as an experiment:

Cool. I didn't know that exec IDs are a thing. I think I never tried podman exec --detach until now.

though it looks like toolbx doesn't use libpod ?

When we rewrote Toolbx in Go, we wanted to use libpod and the other backend Go packages directly instead of going through podman(1). However, the Podman folks discouraged us from doing that, so we didn't.

These days, for other reasons, I am considering implementing some small things directly inside toolbox(1) to avoid going through podman(1), where we can rely on a stable interface. I have no idea if this is such a case.

If we can't get the PID inside the container through a Podman interface, do you think the alternative of inserting a shim is sane?

debarshiray avatar Oct 03 '23 18:10 debarshiray

podman exec --interactive --tty, which is what toolbox enter does, creates a nested pseudo-terminal device and wires it up to the terminal device that the user is interacting with. If the outer terminal goes away, then shouldn't it break the connection and somehow trigger the inner terminal?

debarshiray avatar Oct 03 '23 19:10 debarshiray

podman exec --interactive --tty, which is what toolbox enter does, creates a nested pseudo-terminal device and wires it up to the terminal device that the user is interacting with. If the outer terminal goes away, then shouldn't it break the connection and somehow trigger the inner terminal?

Interesting. Right, when the terminal is closed, podman exec would be receiving a hang up signal (SIGHUP) except:

╎❯ echo $$
203855

╎❯ ps -eocomm,pid,ppid,tpgid,sess,pgrp,tty |grep -E 'podman|bash|toolbox|conmon|COMM'
COMMAND             PID    PPID   TPGID    SESS    PGRP TT
bash             203855    3069  207753  203855  203855 pts/18


╎❯ toolbox enter f39
╎❯ ps -eocomm,pid,ppid,tpgid,sess,pgrp,tty |grep -E 'podman|bash|toolbox|conmon|COMMAND|pts.18'
COMMAND             PID    PPID   TPGID    SESS    PGRP TT
bash             203855    3069  207768  203855  203855 ?
toolbox          207768  203855  207768  203855  207768 ?
podman           207888  207768  207768  203855  207768 ?
conmon           207903    1899      -1  207903  207903 ?
bash             207909  207903  207947  207909  207909 pts/3

You can see podman exec is in the toolbox process group, and toolbox doesn't have a controlling tty. Neither does conmon. In fact, nothing has /dev/pts/18 as its controlling tty anymore. This means that when the terminal is closed, the hangup signal doesn't get sent. I guess maybe something became the foreground process group and then exited? Or maybe there was a TIOCNOTTY ioctl?

Regardless, whatever is splicing input between the two ttys (podman? conmon?) should be getting a hangup in poll, so it should know to tear things down, I guess.

halfline avatar Oct 03 '23 20:10 halfline

Just browsing around a little, I think the splicing happens here:

func setupStdioChannels(streams *define.AttachStreams, conn *net.UnixConn, detachKeys []byte) (chan error, chan error) {
        receiveStdoutError := make(chan error)
        go func() {
                receiveStdoutError <- redirectResponseToOutputStreams(streams.OutputStream, streams.ErrorStream, streams.AttachOutput, streams.AttachError, conn)
        }()

        stdinDone := make(chan error)
        go func() {
                var err error
                if streams.AttachInput {
                        _, err = util.CopyDetachable(conn, streams.InputStream, detachKeys)
                }
                stdinDone <- err
        }()

        return receiveStdoutError, stdinDone
}

So maybe something like this would fix it (untested, I don't know go, etc etc) (EDIT: changed idea a bit to not slurp up the stdinDone before it can be processed by readStdio) (EDIT2: changed readStdio to be a method so it has access to the container object, still completely untested)

diff --git a/libpod/oci_conmon_attach_common.go b/libpod/oci_conmon_attach_common.go
index 8d5a7f8..fa1dbad 100644
--- a/libpod/oci_conmon_attach_common.go
+++ b/libpod/oci_conmon_attach_common.go
@@ -101,5 +101,5 @@ func (r *ConmonOCIRuntime) Attach(c *Container, params *AttachOptions) error {
                params.AttachReady <- true
        }
-       return readStdio(conn, params.Streams, receiveStdoutError, stdinDone)
+       return c.readStdio(conn, params.Streams, receiveStdoutError, stdinDone)
 }
 
@@ -193,5 +193,5 @@ func (c *Container) attachToExec(streams *define.AttachStreams, keys *string, se
        }
 
-       return readStdio(conn, streams, receiveStdoutError, stdinDone)
+       return c.readStdio(conn, streams, receiveStdoutError, stdinDone)
 }
 
@@ -288,5 +288,5 @@ func redirectResponseToOutputStreams(outputStream, errorStream io.Writer, writeO
 }
 
-func readStdio(conn *net.UnixConn, streams *define.AttachStreams, receiveStdoutError, stdinDone chan error) error {
+func (c *Container) readStdio(conn *net.UnixConn, streams *define.AttachStreams, receiveStdoutError, stdinDone chan error) error {
        var err error
        select {
@@ -309,4 +309,7 @@ func readStdio(conn *net.UnixConn, streams *define.AttachStreams, receiveStdoutE
                        }
                }
+               if streams.AttachInput {
+                       c.Kill(uint(syscall.SIGHUP))
+               }
                if streams.AttachOutput || streams.AttachError {
                        return <-receiveStdoutError

halfline avatar Oct 05 '23 00:10 halfline