Containers remain running after exiting
In particular, this means it's impossible to remove a toolbox-created container without first stopping/killing it with podman:
~/toolbox ./toolbox create -c test
Created container: test
Enter with: toolbox enter --container test
~/toolbox ./toolbox enter -c test
🔹[exalm@toolbox toolbox]$ logout
~/toolbox ./toolbox rm test
toolbox: failed to remove container test
~/toolbox podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1a08f09ea797 localhost/fedora-toolbox-exalm:30 sleep +Inf 16 seconds ago Up 10 seconds ago test
~/toolbox podman stop test
1a08f09ea79710801859bea8dc6a5a85d2031ce1a73dd7d284c3e1fa51a67be0
~/toolbox ./toolbox rm test
~/toolbox
toolbox rm --force should also work.
But yes, I'd like to make this properly reference counted. Sadly, I don't know of a way to implement that using the existing Podman command line interface.
On my machine, toolbox rm --force is not able to delete the container while it is running.
Perhaps what could be done is to detect whether the container is running when deleting it, and then give the user a prompt such as: "This toolbox is currently running, are you sure you wish to delete it? [y/N]". Then call podman stop before calling the delete command if they choose to continue. A sketch of this flow follows.
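A minimal shell sketch of that suggested flow, assuming a $container variable holds the container's name (the prompt text and checks here are illustrative, not Toolbox's actual code):

if [ "$(podman inspect --format '{{.State.Status}}' --type container "$container")" = "running" ]; then
    read -r -p "This toolbox is currently running, are you sure you wish to delete it? [y/N]: " answer
    case "$answer" in
        y|Y) podman stop "$container" ;;   # stop first, so rm can succeed
        *) exit 0 ;;                       # user declined; leave it alone
    esac
fi
podman rm "$container"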
I am unable to delete a container created by toolbox even after using toolbox rm --force and stopping it with podman. I am having to reboot and then toolbox rm works.
I am unable to delete a container created by toolbox even after using toolbox rm --force and stopping it with podman. I am having to reboot and then toolbox rm works.
It will fail if you have currently active toolbox enter sessions. We need to improve the error handling there.
Otherwise, if that's not the case and you can reproduce at will, then I suggest trying podman rm --force <container> to delete the container. If that also fails, then we might have a Podman bug. In any case, let's use a different issue to discuss this.
Thanks for stopping by!
Considering it was reported almost 1.5 years ago, I'm wondering if there was any progress since then.
It seems I can't stop the containers left behind by Toolbox:
[bam@host ~]$ toolbox list
IMAGE ID IMAGE NAME CREATED
a198bc8c3cda registry.fedoraproject.org/f31/fedora-toolbox:31 9 months ago
fe7b8c2393f9 registry.fedoraproject.org/f32/fedora-toolbox:32 4 months ago
3864bc58ab7b registry.fedoraproject.org/f33/fedora-toolbox:33 4 months ago
b390f0663e2a registry.fedoraproject.org/f33/fedora-toolbox:latest 2 weeks ago
CONTAINER ID CONTAINER NAME CREATED STATUS IMAGE NAME
c27048bea726 fedora-toolbox-31 6 months ago configured registry.fedoraproject.org/f31/fedora-toolbox:31
f48d171dc79e fedora-toolbox-32 3 months ago running registry.fedoraproject.org/f32/fedora-toolbox:32
...
f429c215fa02 toolbox 3 hours ago running registry.fedoraproject.org/f32/fedora-toolbox:32
[bam@host ~]$ podman stop fedora-toolbox-32
2020-08-26T14:33:24.000938739Z: kill process 3459: Operation not permitted
Error: operation not permitted
[bam@host ~]$ podman stop toolbox
2020-08-26T14:33:32.000453021Z: kill process 3318: Operation not permitted
Error: operation not permitted
[bam@host ~]$ sudo podman stop toolbox
[sudo] password for bam:
Error: no container with name or ID toolbox found: no such container
[bam@host ~]$ podman stop fedora-toolbox-32
2020-08-26T14:33:24.000938739Z: kill process 3459: Operation not permitted
Error: operation not permitted
The reason for the error seems to be that the conmon subprocesses run with a weird PID of 100000:
bam 3315 1332 0 16:32 ? Ssl 0:00 \_ /usr/bin/conmon --api-version 1 -c f429c215fa02f617362a0b17c4045eb32d4d8c461c38248fb9e1e0a3d9f1220b -u f429c215fa02f617362a0b17c4045eb32d4d8c461c38248fb
100000 3318 3315 0 16:32 ? Ss 0:00 | \_ sleep +Inf
...
bam 3456 1332 0 16:33 ? Ssl 0:00 \_ /usr/bin/conmon --api-version 1 -c f48d171dc79e0db510fa334827fa5a4693b1f952221bc916666f408d845d5b92 -u f48d171dc79e0db510fa334827fa5a4693b1f952221bc9166
100000 3459 3456 0 16:33 ? Ss 0:00 | \_ sleep +Inf
[bam@host ~]$ ll /var/home/bam/.local/share/containers/storage/overlay-containers/f429c215fa02f617362a0b17c4045eb32d4d8c461c38248fb9e1e0a3d9f1220b/
total 4
drwx------. 3 100000 100000 4096 Aug 26 16:38 userdata
What is it? Is it an error, or is it by design? In the former case, how could I fix my containers?
podman stop <container> should definitely work, unless you have active toolbox enter or podman run sessions.
So far, I can think of two different ways to make Toolbox containers reference-counted so that they automatically stop once the last toolbox enter or toolbox run session has terminated.
(Note that stopping the container is the same thing as terminating the container's entry point process.)
- The enter and run commands use POSIX signals to tell the container's entry point that a new session is about to start, or has just ended. eg., it could send SIGUSR1 for one and SIGUSR2 for the other. The entry point handles these signals and keeps a reference count of the number of active sessions. Once the counter hits zero, it terminates.
This can be implemented with Go channels, os/signal and such. (A sketch of this idea follows the list.)
The downside of this is that it's not resilient against crashes in the enter and run commands. If they crash, then the second signal indicating the end of the session might not get sent.
- The enter and run sessions acquire shared file locks (ie., flock --shared ...) and the entry point blocks trying to acquire an exclusive lock (ie., flock --exclusive ...) on a common file. The entry point will be unblocked once all shared locks have been released by the active sessions, and then it can terminate. (A sketch of this follows the list too.)
The nice thing about this is that locks are automatically released by the kernel when a process terminates. So, even if the enter and run commands crash, the locks would get released.
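A minimal shell sketch of the signal-based idea, as the entry point might implement it (a real Toolbox implementation would use Go's os/signal and channels, as noted above; also note that plain POSIX signals don't queue, so rapid back-to-back sessions could be miscounted):

#!/bin/bash
# Entry-point sketch: reference-count sessions via SIGUSR1/SIGUSR2.
sessions=0
seen_one=0
trap 'sessions=$((sessions + 1)); seen_one=1' USR1   # a session started
trap 'sessions=$((sessions - 1))' USR2               # a session ended
while :; do
    sleep 1 &
    wait "$!"    # interrupted by the signals, so the traps run promptly
    if (( seen_one && sessions == 0 )); then
        exit 0   # terminating the entry point stops the container
    fi
done

And a sketch of the flock-based idea, with an illustrative lock file path:

# Each enter/run session wraps itself in a shared lock, roughly:
#     flock --shared /run/toolbox.lock <session command>
# The entry point, once the first session has appeared, does:
flock --exclusive /run/toolbox.lock true
# flock returns only after every shared lock is gone; the kernel
# releases the locks even if a session crashed.
exit 0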
podman stop <container> should definitely work, unless you have active toolbox enter or podman run sessions.
Still it doesn't, and I have no running sessions.
This sounds like a Podman bug.
If you can repeatedly reproduce this, then I'd suggest filing a Podman bug. It would be even better if you can reproduce this just with Podman commands. eg., create a container with podman create ... sleep +Inf, then podman start ... and so on.
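Something along those lines, with an illustrative container name:

podman create --name repro registry.fedoraproject.org/f32/fedora-toolbox:32 sleep +Inf
podman start repro
podman stop repro
podman rm repro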
However, I can kill those sleep processes with the 100000 PIDs as the usual user, and then the session stops. Do you have an idea where those 100000 PIDs come from?
I think those are UIDs, not PIDs.
Those UIDs look big because they are inside a user namespace.
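For what it's worth, with rootless Podman the user's subordinate UID range from /etc/subuid gets mapped into the container's user namespace, so a low UID inside the container shows up as a large UID like 100000 on the host. The exact values below are illustrative:

[bam@host ~]$ grep ^bam: /etc/subuid
bam:100000:65536
[bam@host ~]$ podman unshare cat /proc/self/uid_map
         0       1000          1
         1     100000      65536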
If you can repeatedly reproduce this
Seems I can't. Not sure if that's good or bad :) Anyway, I have already filed the Podman issue and closed it as non-reproducible. I'll reopen it if I face this again: https://github.com/containers/podman/issues/7463
I think those are UIDs, not PIDs.
Those UIDs look big because they are inside a user namespace.
Of course they are UIDs, sorry.
Thanks!
podman stop <container> should definitely work, unless you have active toolbox enter or podman run sessions.
I believe podman stop <container> will force those sessions to exit.
So far, I can think of two different ways to make Toolbox containers reference-counted so that they automatically stop once the last toolbox enter or toolbox run session has terminated.
Another option, used by coreos/toolbox, is to call podman stop after every invocation of podman exec in toolbox enter. podman stop will keep failing as long as there's any active podman exec session, but once the last one finishes, the container will get stopped.
It's less sophisticated than the other alternatives, but simpler to implement.
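Roughly, with illustrative variable names, every toolbox enter would then do:

podman exec --interactive --tty "$container" "$command"
# Unconditionally try to stop; this keeps failing while other exec
# sessions are alive, and succeeds once the last one has exited.
podman stop "$container" 2>/dev/null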
@debarshiray I would vote for the simple approach (just always stop after exec), as long as you suppress the spurious "Error: container ... has active exec sessions, refusing to clean up: container state improper" output when the container cannot be stopped.
Here's a simple fix that I'm trying out in https://github.com/castedo/cnest. So far it seems to be working well.
https://github.com/castedo/cnest/blob/c76c5b0dfb08f9f5db6ddcc2b7ec66c8b84a5335/bin/cnest#L43
If there are zero of those IDs, then call podman stop; otherwise, don't.
This is working; it finally kills all the toolbox containers running in the background:
[gnumdk@xps13 ~]$ cat .config/systemd/user/logout.service
[Unit]
Description=Logout script
DefaultDependencies=no
Conflicts=shutdown.target
Before=basic.target shutdown.target
[Service]
Type=oneshot
ExecStop=%h/.config/systemd/user/logout.sh
RemainAfterExit=yes
TimeoutStopSec=5m
[Install]
WantedBy=basic.target
[gnumdk@xps13 ~]$ cat .config/systemd/user/logout.sh
#!/bin/bash
# Force the container to the exit state
podman container stop fedora-toolbox-36
# Stop failed: the container was in a stopped state
if (( $? != 0 ))
then
    # Force it to run again
    toolbox run true
    # And stop it
    podman container stop fedora-toolbox-36
fi
This is working; it finally kills all the toolbox containers running in the background
That's about cleaning up any active podman exec sessions when logging out, right?
If so, then that's different from this issue. This issue is about stopping the container (ie., killing the entry point) when the last podman exec session goes away during normal use, so that --force is not necessary with podman rm and the output of toolbox list is more intuitive.
I think the problem you were trying to address might have been fixed in Podman through https://github.com/containers/podman/pull/17025
So far, I can think of two different ways to make Toolbox containers reference-counted so that they automatically stop once the last toolbox enter or toolbox run session has terminated.
Another option, used by coreos/toolbox, is to call podman stop after every invocation of podman exec in toolbox enter. podman stop will keep failing as long as there's any active podman exec session, but once the last one finishes, the container will get stopped.
It turns out that current implementations of podman stop do stop the container (ie., the entry point gets killed) even when there are active podman exec sessions around. This negates the coreos/toolbox approach of always calling podman stop and leaving the reference counting to Podman.
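That is, with current Podman something like the following (illustrative names) stops the container even though an exec session is still alive:

podman start test
podman exec test sleep 30 &
podman stop test   # kills the entry point, and the exec session with it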
Here's a simple fix that I'm trying out in https://github.com/castedo/cnest. So far it seems to be working well. https://github.com/castedo/cnest/blob/c76c5b0dfb08f9f5db6ddcc2b7ec66c8b84a5335/bin/cnest#L43
If there are zero of those IDs, then call podman stop; otherwise, don't.
Interesting. So you are doing:
podman exec -it \
-e LANG \
-e TERM \
-e DISPLAY \
--detach-keys="" \
-e OSVIRTALIAS=$CONTAINER \
-e debian_chroot=$CONTAINER \
$CONTAINER \
$COMMAND
NUM_EXEC=$(podman container inspect --format "{{len .ExecIDs}}" $CONTAINER)
if [[ $NUM_EXEC -eq 0 ]]; then
podman stop $CONTAINER
fi
I am worried that there's a race. A podman start against the same container from a different terminal can slip in between the podman inspect and the podman stop.
I am worried that there's a race. A podman start against the same container from a different terminal can slip in between the podman inspect and the podman stop.
Good eye! You are correct, there is that possibility. It's fair to say what I coded in cnest for this is a hack.
I've been using it for more than a year now, and it's still working well. But I'm only using it for "nest" containers, which I enter and exit manually at the command line. I'm not fast enough to trigger the race condition.
My hack might not be OK for more general uses of a container. Maybe there are single-user cases where someone has programs entering/starting the container in the background, not just entering it manually from the command line.
I've made a wrapper script to fix this issue (https://github.com/89luca89/distrobox/issues/786) when using distrobox, and adapted it for toolbox. It's pretty simple: it gets the conmon PID when the container starts, then kills it afterwards using a background script.
It could easily be adapted to kill the container too when there are no more shells open (a sketch of that follows the script below).
#!/usr/bin/env bash
#
# toolbox-enter-wrapper - wrapper to call the shell properly in toolbox
if [ -z "$1" ]; then
echo "Please provide container name"
exit 1
fi
PIDFILE_DIR="$HOME/.local/state/toolbox"
PIDFILE="$PIDFILE_DIR/$$"
mkdir -p "$PIDFILE_DIR"
touch "$PIDFILE"
nohup sh <<EOF >/dev/null 2>&1 &
# wait for the main script to end
while ps -p $$ >/dev/null; do
sleep 1s
done
# get pid from the file
PID="\$(cat "$PIDFILE")"
rm -f "$PIDFILE"
# conmon already dead, quit
if ! ps -p "\$PID" >/dev/null; then
exit 0
fi
# kill conmon
kill -1 "\$PID"
# quit
exit 0
EOF
toolbox run -c "$1" sh -c "echo \$PPID > $PIDFILE; exec ${2:-$SHELL}"
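As mentioned above, a hedged sketch of that adaptation, to be appended after the final toolbox run line: it removes this session's pidfile and stops the whole container once no pidfiles remain (subject to the same kind of inspect/stop race discussed earlier in this thread):

# Remove our own pidfile, then stop the container if no sessions remain.
rm -f "$PIDFILE"
if [ -z "$(ls -A "$PIDFILE_DIR")" ]; then
    podman container stop "$1"
fi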