Containers remain running after exiting

In particular, this means it's impossible to remove a toolbox-created container without first stopping/killing it with podman:

~/toolbox $ ./toolbox create -c test
Created container: test
Enter with: toolbox enter --container test
~/toolbox $ ./toolbox enter -c test
🔹[exalm@toolbox toolbox]$ logout
~/toolbox $ ./toolbox rm test
toolbox: failed to remove container test
~/toolbox $ podman ps
CONTAINER ID  IMAGE                              COMMAND     CREATED         STATUS             PORTS  NAMES
1a08f09ea797  localhost/fedora-toolbox-exalm:30  sleep +Inf  16 seconds ago  Up 10 seconds ago         test
~/toolbox $ podman stop test
1a08f09ea79710801859bea8dc6a5a85d2031ce1a73dd7d284c3e1fa51a67be0
~/toolbox $ ./toolbox rm test
~/toolbox $

Apr 13 '19 10:04 alice-mkh

toolbox rm --force should also work.

But yes, I'd like to make this properly reference counted, but sadly, I don't know of a way to implement that using the existing Podman command line interface.

Apr 16 '19 13:04 debarshiray

Using toolbox rm --force is not able to delete the container when it is running on my machine.

Perhaps what could be done is to detect whether the container is running when deleting it, and then give the user a prompt such as "This toolbox is currently running, are you sure you wish to delete it [y/N]:". If they choose to continue, call podman stop before calling the delete command.
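A minimal sketch of that prompt-then-stop flow, written as a standalone shell helper rather than as a change to toolbox itself. The function names (container_is_running, confirm, remove_toolbox) and the PODMAN override variable are made up for this example; the only Podman calls assumed are podman container inspect with a --format template, podman stop and podman rm.

```shell
#!/bin/sh
# Hypothetical helper, not part of toolbox.
# PODMAN can be overridden, e.g. to point at a wrapper.
PODMAN="${PODMAN:-podman}"

# Succeeds if Podman reports the container as running.
container_is_running() {
    [ "$("$PODMAN" container inspect --format '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Ask a yes/no question on stdin; anything but y/yes counts as "no".
confirm() {
    printf '%s [y/N]: ' "$1"
    read -r answer || return 1
    case "$answer" in
        [yY]|[yY][eE][sS]) return 0 ;;
        *) return 1 ;;
    esac
}

# Stop the container first (after asking) if it is running, then remove it.
remove_toolbox() {
    name="$1"
    if container_is_running "$name"; then
        confirm "This toolbox is currently running, are you sure you wish to delete it?" || return 1
        "$PODMAN" stop "$name" || return 1
    fi
    "$PODMAN" rm "$name"
}
```

After sourcing the file, remove_toolbox test would behave like the manual podman stop followed by toolbox rm shown in the report, with the extra confirmation in between.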

May 22 '19 01:05 imciner2

I am unable to delete a container created by toolbox even after using toolbox rm --force and stopping it with podman. I am having to reboot and then toolbox rm works.

Aug 11 '19 23:08 paul8046

I am unable to delete a container created by toolbox even after using toolbox rm --force and stopping it with podman. I am having to reboot and then toolbox rm works.

It will fail if you have currently active toolbox enter sessions. We need to improve the error handling there.

Otherwise, if that's not the case and you can reproduce at will, then I suggest trying podman rm --force <container> to delete the container. If that also fails, then we might have a Podman bug. In any case, let's use a different issue to discuss this.

Thanks for stopping by!

Aug 13 '19 00:08 debarshiray

Considering it was reported almost 1.5 years ago, I'm wondering if there was any progress since then.

Aug 26 '20 14:08 bam80

It seems I can't stop the containers left behind by Toolbox:

[bam@host ~]$ toolbox list
IMAGE ID      IMAGE NAME                                            CREATED
a198bc8c3cda  registry.fedoraproject.org/f31/fedora-toolbox:31      9 months ago
fe7b8c2393f9  registry.fedoraproject.org/f32/fedora-toolbox:32      4 months ago
3864bc58ab7b  registry.fedoraproject.org/f33/fedora-toolbox:33      4 months ago
b390f0663e2a  registry.fedoraproject.org/f33/fedora-toolbox:latest  2 weeks ago

CONTAINER ID  CONTAINER NAME     CREATED       STATUS      IMAGE NAME
c27048bea726  fedora-toolbox-31  6 months ago  configured  registry.fedoraproject.org/f31/fedora-toolbox:31
f48d171dc79e  fedora-toolbox-32  3 months ago  running     registry.fedoraproject.org/f32/fedora-toolbox:32
...
f429c215fa02  toolbox            3 hours ago   running     registry.fedoraproject.org/f32/fedora-toolbox:32


[bam@host ~]$ podman stop fedora-toolbox-32 
2020-08-26T14:33:24.000938739Z: kill process 3459: Operation not permitted
Error: operation not permitted
[bam@host ~]$ podman stop toolbox 
2020-08-26T14:33:32.000453021Z: kill process 3318: Operation not permitted
Error: operation not permitted

[bam@host ~]$ sudo podman stop toolbox 
[sudo] password for bam: 
Error: no container with name or ID toolbox found: no such container

Aug 26 '20 14:08 bam80

[bam@host ~]$ podman stop fedora-toolbox-32 
2020-08-26T14:33:24.000938739Z: kill process 3459: Operation not permitted
Error: operation not permitted

The reason for the error seems to be that the conmon subprocesses run with a weird PID 100000:

bam         3315    1332  0 16:32 ?        Ssl    0:00  \_ /usr/bin/conmon --api-version 1 -c f429c215fa02f617362a0b17c4045eb32d4d8c461c38248fb9e1e0a3d9f1220b -u f429c215fa02f617362a0b17c4045eb32d4d8c461c38248fb
100000      3318    3315  0 16:32 ?        Ss     0:00  |   \_ sleep +Inf
...
bam         3456    1332  0 16:33 ?        Ssl    0:00  \_ /usr/bin/conmon --api-version 1 -c f48d171dc79e0db510fa334827fa5a4693b1f952221bc916666f408d845d5b92 -u f48d171dc79e0db510fa334827fa5a4693b1f952221bc9166
100000      3459    3456  0 16:33 ?        Ss     0:00  |   \_ sleep +Inf

[bam@host ~]$ ll /var/home/bam/.local/share/containers/storage/overlay-containers/f429c215fa02f617362a0b17c4045eb32d4d8c461c38248fb9e1e0a3d9f1220b/
total 4
drwx------. 3 100000 100000 4096 Aug 26 16:38 userdata

What is it? Is it an error, or is it by design? If it's an error, how could I fix my containers?

Aug 26 '20 15:08 bam80

podman stop <container> should definitely work, unless you have active toolbox enter or podman run sessions.

Aug 26 '20 15:08 debarshiray

So far, I can think of two different ways to make Toolbox containers reference-counted so that they automatically stop once the last toolbox enter or toolbox run session has terminated.

(Note that stopping the container is the same thing as terminating the container's entry point process.)

The enter and run commands use POSIX signals to tell the container's entry point that a new session is about to start, or has just ended. For example, they could send SIGUSR1 for one and SIGUSR2 for the other. The entry point handles these signals and keeps a reference count of the number of active sessions. Once the counter hits zero, it terminates.

This can be implemented with Go channels, os/signal and such. Here is an example.

The downside of this is that it's not resilient against crashes in the enter and run commands. If they crash, then the second signal indicating the end of the session might not get sent.
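To make the signal idea concrete, here is a rough shell sketch of such an entry point. It is not Toolbox code and it is not the Go implementation mentioned above. The mapping of SIGUSR1 to "session started" and SIGUSR2 to "session ended", and all function and variable names, are assumptions made for the example.

```shell
#!/bin/sh
# Sketch of a reference-counting entry point (hypothetical, not Toolbox code).
# Assumed convention: SIGUSR1 = a session started, SIGUSR2 = a session ended.

sessions=0       # number of sessions believed to be active
seen_session=0   # becomes 1 once the first session has announced itself

on_session_start() {
    sessions=$((sessions + 1))
    seen_session=1
}

on_session_end() {
    sessions=$((sessions - 1))
}

trap on_session_start USR1
trap on_session_end USR2

# Block until at least one session has started and all of them have ended.
wait_for_last_session() {
    while [ "$seen_session" -eq 0 ] || [ "$sessions" -gt 0 ]; do
        sleep 1 &
        wait "$!"    # returns early when a trapped signal arrives
    done
}

# Only run the loop when asked to, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
    wait_for_last_session
fi
```

Besides the crash problem described above, standard POSIX signals are not queued, so two identical signals sent in quick succession could be merged into one and throw the counter off. That is another reason to treat this as an illustration only.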

The enter and run sessions acquire shared file locks (ie., flock --shared ...) and the entry point blocks trying to acquire an exclusive lock (ie., flock --exclusive ...) on a common file. The entry point will be unblocked once all shared locks have been released by the active sessions, and then it can terminate.

The nice thing about this is that locks are automatically released by the kernel when a process terminates. So, even if the enter and run commands crash, the locks would get released.
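The locking variant can be sketched with the flock utility from util-linux. The lock file path and the function names below are invented for the example; the sketch only relies on flock's --shared, --exclusive and --nonblock options.

```shell
#!/bin/sh
# Sketch of lock-based reference counting (hypothetical, not Toolbox code).
# LOCK_FILE is an assumed path chosen for this example.
LOCK_FILE="${LOCK_FILE:-${TMPDIR:-/tmp}/toolbox-refcount-example.lock}"

# Session side: run a command while holding a shared lock on LOCK_FILE.
# The kernel drops the lock when the process exits, even after a crash.
with_session_lock() {
    flock --shared "$LOCK_FILE" "$@"
}

# Entry point side: block until no session holds a shared lock any more.
wait_until_unused() {
    flock --exclusive "$LOCK_FILE" true
}

# Non-blocking probe: succeeds only if no session currently holds the lock.
is_unused() {
    flock --exclusive --nonblock "$LOCK_FILE" true
}
```

A session would be started as with_session_lock some-command, and the entry point would call wait_until_unused before exiting. One thing this sketch does not solve: the entry point must not try to take the exclusive lock before the first session has acquired its shared lock, otherwise it would terminate immediately.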

Aug 26 '20 15:08 debarshiray

podman stop <container> should definitely work, unless you have active toolbox enter or podman run sessions.

Still it doesn't, and I have no running sessions.

This sounds like a Podman bug.

If you can repeatedly reproduce this, then I'd suggest filing a Podman bug. It would be even better if you can reproduce this with just Podman commands, e.g., create a container with podman create ... sleep +Inf, then podman start ... and so on.

However, I can kill those sleep processes with 100000 PIDs as a regular user, and then the session stops. Do you have any idea where those 100000 PIDs come from?

I think those are UIDs, not PIDs.

Those UIDs look big because they are inside a user namespace.

Aug 26 '20 16:08 bam80

If you can repeatedly reproduce this

Seems I can't. Not sure if that's good or bad :) Anyway, I have already filed the podman issue and closed it as non-reproducible. I'll reopen it if I run into this again: https://github.com/containers/podman/issues/7463

I think those are UIDs, not PIDs.

Those UIDs look big because they are inside a user namespace.

Of course they are UIDs, sorry.

Thanks!

Aug 26 '20 17:08 bam80

podman stop <container> should definitely work, unless you have active toolbox enter or podman run sessions.

I believe podman stop <container> will force those sessions to exit.

Sep 10 '20 09:09 martymichal

So far, I can think of two different ways to make Toolbox containers reference-counted so that they automatically stop once the last toolbox enter or toolbox run session has terminated.

Another option, used by coreos/toolbox, is to call podman stop after every invocation of podman exec in toolbox enter. podman stop will keep failing as long as there's any active podman exec session, but once the last one finishes, the container will get stopped.

It's less sophisticated than the other alternatives, but simpler to implement.

Dec 18 '20 16:12 debarshiray

@debarshiray I would vote for the simple approach (just always stop after exec) as long as you suppress the spurious "Error: container ... has active exec sessions, refusing to clean up: container state improper" output when the container cannot be stopped.

Feb 25 '21 19:02 nanonyme

Here's a simple fix that I'm trying out in https://github.com/castedo/cnest. So far seems to be working well. https://github.com/castedo/cnest/blob/c76c5b0dfb08f9f5db6ddcc2b7ec66c8b84a5335/bin/cnest#L43 If there are zero of those IDs, then call podman stop otherwise don't.

Dec 31 '21 16:12 castedo

This is working; it finally kills all toolboxes running in the background

[gnumdk@xps13 ~]$ cat .config/systemd/user/logout.service 
[Unit]
Description=Logout script
DefaultDependencies=no
Conflicts=shutdown.target
Before=basic.target shutdown.target

[Service]
Type=oneshot
ExecStop=%h/.config/systemd/user/logout.sh
RemainAfterExit=yes
TimeoutStopSec=5m

[Install]
WantedBy=basic.target

[gnumdk@xps13 ~]$ cat .config/systemd/user/logout.sh
#!/bin/bash

# Force container to the exit state
podman container stop fedora-toolbox-36

# Failed, container in stopped state
if (( $? != 0 ))
then
	# Force it to run again
	toolbox run true
	# And stop it
	podman container stop fedora-toolbox-36
fi

Aug 30 '22 15:08 bellegarde-c

This is working; it finally kills all toolboxes running in the background

That's about cleaning up any active podman exec sessions when logging out, right?

If so, then that's different from this issue. This issue is about stopping the container (ie., killing the entry point) when the last podman exec session goes away during normal use, so that --force is not necessary with podman rm and the output of toolbox list is more intuitive.

I think the problem you were trying to address might have been fixed in Podman through https://github.com/containers/podman/pull/17025

Jan 18 '23 19:01 debarshiray

So far, I can think of two different ways to make Toolbox containers reference-counted so that they automatically stop once the last toolbox enter or toolbox run session has terminated.

Another option, used by coreos/toolbox, is to call podman stop after every invocation of podman exec in toolbox enter. podman stop will keep failing as long as there's any active podman exec session, but once the last one finishes, the container will get stopped.

It turns out that current implementations of podman stop do stop the container (ie., the entry point gets killed) even when there are active podman exec sessions around. This negates the coreos/toolbox approach of always calling podman stop and leaving the reference counting to Podman.

Jan 18 '23 20:01 debarshiray

Here's a simple fix that I'm trying out in https://github.com/castedo/cnest. So far seems to be working well. https://github.com/castedo/cnest/blob/c76c5b0dfb08f9f5db6ddcc2b7ec66c8b84a5335/bin/cnest#L43 If there are zero of those IDs, then call podman stop otherwise don't.

Interesting. So you are doing:

podman exec -it \
  -e LANG \
  -e TERM \
  -e DISPLAY \
  --detach-keys="" \
  -e OSVIRTALIAS=$CONTAINER \
  -e debian_chroot=$CONTAINER \
  $CONTAINER \
  $COMMAND

NUM_EXEC=$(podman container inspect --format "{{len .ExecIDs}}" $CONTAINER)
if [[ $NUM_EXEC -eq 0 ]]; then
  podman stop $CONTAINER
fi

I am worried that there's a race. A podman exec against the same container from a different terminal can slip in between the podman inspect and the podman stop.

Jan 18 '23 20:01 debarshiray

I am worried that there's a race. A podman exec against the same container from a different terminal can slip in between the podman inspect and the podman stop.

Good eye! You are correct, there is that possibility. It's fair to say what I coded in cnest for this is a hack.

I've been using it for more than a year now. Still working well. But I'm only using it for "nest" containers for which I enter and exit them manually at the command line. I'm not fast enough to be able to create the race condition.

My hack might not be OK for more general uses of a container. Maybe there are single-user cases where someone has programs in the background entering/starting the container and not only entering from the command line manually.

Jan 18 '23 21:01 castedo

I've made a wrapper script to fix this issue when using distrobox (https://github.com/89luca89/distrobox/issues/786) and adapted it for toolbox. It's pretty simple: it gets the conmon PID when the container starts, then kills it afterwards using a background script.

It could easily be adapted to kill the container too when there are no more shells open.

#!/usr/bin/env bash
#
# toolbox-enter-wrapper - wrapper to call the shell properly in toolbox

if [ -z "$1" ]; then
    echo "Please provide container name"
    exit 1
fi

PIDFILE_DIR="$HOME/.local/state/toolbox"
PIDFILE="$PIDFILE_DIR/$$"

mkdir -p "$PIDFILE_DIR"
touch "$PIDFILE"

nohup sh <<EOF >/dev/null 2>&1 &
# wait for the main script to end
while ps -p $$ >/dev/null; do
    sleep 1s
done

# get pid from the file
PID="\$(cat "$PIDFILE")"
rm -f "$PIDFILE"

# conmon already dead, quit
if ! ps -p "\$PID" >/dev/null; then
    exit 0
fi

# kill conmon
kill -1 "\$PID"

# quit
exit 0
EOF

toolbox run -c "$1" sh -c "echo \$PPID > $PIDFILE; exec ${2:-$SHELL}"

Jun 12 '23 19:06 sandorex
