
systemd-shutdown hangs on containerd-shim when k3s-agent running

Open sourcedelica opened this issue 5 years ago • 19 comments

Environmental Info: K3s Version: k3s version v1.18.6+k3s1 (6f56fa1d)

Node(s) CPU architecture, OS, and Version: x86_64 Ubuntu 20.04.1 Linux nuc-linux3 5.4.0-48-generic #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 1 master 2 workers

Describe the bug: When shutting down or rebooting the node, the shutdown hangs for approximately 90 seconds. The console message is

systemd-shutdown: waiting for process: containerd-shim

When researching the problem I landed on this issue: https://github.com/drud/ddev/issues/2538#issuecomment-705079079 where they said when they uninstalled k3s the problem went away. I disabled and stopped k3s-agent.service and rebooted and the problem also went away for me.

I also tried re-enabling and starting k3s-agent.service and removing the docker.io package and running apt autoremove to remove containerd, runc, etc. but it still hangs on reboot at the same place.

sourcedelica avatar Oct 16 '20 22:10 sourcedelica

Following https://github.com/containerd/containerd/issues/386#issuecomment-304837687 I changed the service configuration for k3s.service and k3s-agent.service to KillMode=mixed and that fixed the problem. This is in the standard Docker configuration.

However, I also found https://github.com/rancher/k3s/issues/1965 where it looks like this behavior is as intended. Is there a way to allow for upgrading k3s without disrupting workloads but at the same time not hang shutdowns/reboots for 90s?

sourcedelica avatar Oct 17 '20 19:10 sourcedelica

I was thinking one way to do it is to use KillMode=mixed or KillMode=control-group by default for k3s{-agent}.service and when doing an upgrade, add a drop-in in /run/systemd/system/k3s{-agent}.service.d that temporarily sets KillMode=process before stopping the service, then removes the drop-in after the upgrade.

sourcedelica avatar Oct 18 '20 23:10 sourcedelica
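
The drop-in idea from the comment above could be sketched roughly as follows. This is only an illustration under assumptions: the file name 10-killmode.conf and both function names are hypothetical and not part of k3s, and the upgrade steps themselves are shown as comments because they depend on how k3s was installed.

```shell
#!/bin/sh
# Sketch only: "10-killmode.conf" and the function names are hypothetical.

# Write a drop-in that overrides KillMode for a unit.
# $1 = unit name, $2 = KillMode value, $3 = drop-in root (runtime dir by default)
write_killmode_dropin() {
    unit=$1
    mode=$2
    root=${3:-/run/systemd/system}
    mkdir -p "$root/$unit.d" || return 1
    printf '[Service]\nKillMode=%s\n' "$mode" > "$root/$unit.d/10-killmode.conf"
}

# Remove the drop-in again; the directory is removed only if it is empty.
remove_killmode_dropin() {
    unit=$1
    root=${2:-/run/systemd/system}
    rm -f "$root/$unit.d/10-killmode.conf"
    rmdir "$root/$unit.d" 2>/dev/null || true
}

# Intended use around an upgrade (as root on the node):
#   write_killmode_dropin k3s-agent.service process
#   systemctl daemon-reload
#   systemctl stop k3s-agent.service     # containers are meant to keep running
#   ... replace the k3s binary ...
#   systemctl start k3s-agent.service
#   remove_killmode_dropin k3s-agent.service
#   systemctl daemon-reload
```

Because the drop-in lives under /run, it would not survive a reboot even if the cleanup step were skipped, which is presumably why that directory was suggested.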

systemd has an explicit pre-shutdown hook, so perhaps you could invoke special logic with that. See:

/usr/lib/systemd/system/shutdown.target.wants

dontlaugh avatar Mar 25 '21 03:03 dontlaugh

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

stale[bot] avatar Sep 21 '21 05:09 stale[bot]

Bump still relevant

unixfox avatar Sep 21 '21 05:09 unixfox

bump same issue

senpaiSubby avatar Oct 18 '21 09:10 senpaiSubby

Noticed the same issue on a Raspberry Pi with the display on.

andrewchen5678 avatar Dec 26 '21 18:12 andrewchen5678

I faced this issue yesterday and ended up with the following solution.

/etc/systemd/system/[email protected] :

[Unit]
Description=Kill cgroup procs on shutdown for %i
DefaultDependencies=false
Before=shutdown.target umount.target
[Service]
# Instanced units are not part of system.slice for some reason
# without this, the service isn't started at shutdown
Slice=system.slice
ExecStart=/bin/bash -c 'pids=$(cat /sys/fs/cgroup/unified/system.slice/%i/cgroup.procs); echo $pids | xargs -r kill;'
ExecStart=/bin/sleep 5
ExecStart=/bin/bash -c 'pids=$(cat /sys/fs/cgroup/unified/system.slice/%i/cgroup.procs); echo $pids | xargs -r kill -9;'
Type=oneshot
[Install]
WantedBy=shutdown.target

Enable the "service" for k3s-agent.service (will also work for k3s on the master ):

sudo systemctl enable [email protected]

# or, on the master:  sudo systemctl enable [email protected]

I've written a long-winded explanation here, but in brief: since KillMode=process is used, all the container processes stay alive when k3s is brought down. Which is a good thing (tm).

However, during shutdown, systemd will signal all remaining processes and wait for DefaultTimeoutStopSec for them to die. This is always 90s during the last shutdown phase with systemd v245. It is a bug in systemd v245, shipped with Ubuntu 20.04, and was fixed in September 2020.

What I used to do was to set DefaultTimeoutStopSec=5s in /etc/systemd/system.conf and it worked fine, but on Ubuntu 20.04 it doesn't.

Since there's little chance this fix will make it back into 20.04, the above "service" will perform a round of SIGTERM, wait 5s, then proceed with SIGKILL to finish k3s's process cleanup during shutdown. The sleep can be tweaked to suit your services' needs (something matching terminationGracePeriodSeconds perhaps).

Hope it helps.

jraby avatar Jan 16 '22 02:01 jraby

Awesome research!

sourcedelica avatar Jan 16 '22 03:01 sourcedelica

@jraby your solution helped me resolve the issue; however, I ended up using k3s-killall.sh as described in the k3s docs. With this there is no shutdown delay on my system.

Caution - this may not be what you want

The killall script cleans up containers, K3s directories, and networking components while also removing the iptables chain with all the associated rules. The cluster data will not be deleted.

I'm using this /etc/systemd/system/[email protected]

# source https://github.com/k3s-io/k3s/issues/2400#issuecomment-1013798094
# $ sudo systemctl enable [email protected]
[Unit]
Description=Kill cgroup procs on shutdown for %i
DefaultDependencies=false
Before=shutdown.target umount.target
[Service]
# Instanced units are not part of system.slice for some reason
# without this, the service isn't started at shutdown
Slice=system.slice
ExecStart=/bin/bash -c "/usr/local/bin/k3s-killall.sh"
Type=oneshot
[Install]
WantedBy=shutdown.target

This is on

Linux Mint 20.2 5.4.0-91-generic

miraculixx avatar Jan 21 '22 12:01 miraculixx

the same problem exists on rke2 (no surprise, given its roots are in k3s)

horihel avatar Feb 15 '22 07:02 horihel

Yes, this is by design. Stopping the K3s (or RKE2) service does not stop running containers. This is to allow for nondisruptive upgrades of the main K3s/RKE2 components by simply replacing the binary and restarting the service.

brandond avatar Feb 15 '22 21:02 brandond

would you accept a feature request to add a systemd unit like https://github.com/k3s-io/k3s/issues/2400#issuecomment-1018472343 which only triggers on shutdown? This would both allow the intended behaviour of k3s/rke2 (seamless updates/restarts) and allow for a shutdown/reboot that's even quicker than RKE1.

here's my non-instanced version of that (for rke2):

[Unit]
Description=Kill containerd-shims on shutdown
DefaultDependencies=false
Before=shutdown.target umount.target

[Service]
ExecStart=/bin/bash -c "/usr/local/bin/rke2-killall.sh"
Type=oneshot

[Install]
WantedBy=shutdown.target

horihel avatar Feb 16 '22 06:02 horihel

That might be a good thing to add to the documentation, for folks that want it?

brandond avatar Feb 16 '22 19:02 brandond

Confirming this behaviour to be present with:

root@k3s:~# k3s --version
k3s version v1.23.3+k3s1 (5fb370e5)
go version go1.17.5
root@k3s:~# uname -a
Linux k3s 5.4.0-100-generic #113-Ubuntu SMP Thu Feb 3 18:43:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

ciacon avatar Feb 22 '22 13:02 ciacon

@ciacon this is not version-specific behavior. As described at https://github.com/k3s-io/k3s/issues/2400#issuecomment-1040795816 by design, pods are not stopped when the k3s process exits.

brandond avatar Feb 22 '22 22:02 brandond

In releases prior to 1.23.7 it was enough to add KillMode=mixed to /etc/systemd/system/k3s.service: when a system shutdown was executed, k3s killed the containers and the computer turned off immediately.

For some reason unknown to me, from 1.23.8 up to the current 1.24.4, shutting down a system with k3s takes 90s again (the default systemd stop timeout, TimeoutStopUSec=1min 30s), so KillMode=mixed appears to be ignored and the containers are only killed once the timeout has passed.

What has changed?

[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=mixed

hlacikd avatar Aug 30 '22 12:08 hlacikd
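
When comparing versions like this, it may help to confirm which values systemd actually applies to the unit rather than relying on the unit file alone. A minimal sketch, assuming the unit is named k3s.service; the helper name unit_prop is hypothetical:

```shell
#!/bin/sh
# Read `systemctl show`-style KEY=VALUE lines on stdin and print the value
# of the property named in $1. The helper name is hypothetical.
unit_prop() {
    sed -n "s/^$1=//p"
}

# Usage on a node:
#   systemctl show k3s.service -p KillMode -p TimeoutStopUSec | unit_prop KillMode
#   systemctl show k3s.service -p KillMode -p TimeoutStopUSec | unit_prop TimeoutStopUSec
```

If KillMode does not read back as mixed, a later drop-in or a reinstalled unit file may be overriding the edit; that is a guess, not something the thread confirms.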

Probably something related to the containerd version change? I'm not sure, since changing the KillMode isn't something we test or support. I would recommend adding another unit that runs on shutdown, as described above.

brandond avatar Aug 30 '22 19:08 brandond

Thanks, I implemented the shutdown unit as described by @horihel several days ago and so far it works great.

May I vote for adding this to the official documentation, @brandond? I believe it is a pretty common scenario, since k3s is ideal for edge deployments, and edge devices usually get shut down much more often than servers do.

hlacikd avatar Sep 06 '22 17:09 hlacikd

Here is a k3s version of https://github.com/k3s-io/k3s/issues/2400#issuecomment-1041165341:

[Unit]
Description=Kill containerd-shims on shutdown
DefaultDependencies=false
Before=shutdown.target umount.target

[Service]
ExecStart=/usr/local/bin/k3s-killall.sh
Type=oneshot

[Install]
WantedBy=shutdown.target

Put the file at /etc/systemd/system/shutdown-k3s.service and then enable the service using

systemctl enable shutdown-k3s.service

Also note that the service name, shutdown-k3s, must not start with k3s-; otherwise the k3s-killall.sh script would try to stop it and cause problems.

MountComb avatar Nov 13 '22 02:11 MountComb
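
The naming caveat above can be checked before installing the unit. The sketch below assumes k3s-killall.sh stops every unit whose file name begins with "k3s", which is slightly stricter than the "k3s-" prefix named in the comment; the function name is hypothetical.

```shell
#!/bin/sh
# Return 0 if the unit name is safe for a shutdown helper, 1 if it starts
# with "k3s" (assumed here to be matched and stopped by k3s-killall.sh).
is_safe_shutdown_unit_name() {
    case $1 in
        k3s*) return 1 ;;
        *)    return 0 ;;
    esac
}

# Usage:
#   is_safe_shutdown_unit_name shutdown-k3s.service && echo "name is fine"
```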

@jraby's cgroup.procs-based unit above will only work where cgroup v2 is mounted at /sys/fs/cgroup/unified (the hybrid layout), though. On my system, which mounts cgroup2 directly on /sys/fs/cgroup, I don't have /sys/fs/cgroup/unified/system.slice/ to begin with. :(

$ mount | grep group
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
none on /run/cilium/cgroupv2 type cgroup2 (rw,relatime)

Ubuntu 22.04

samip5 avatar Dec 05 '22 06:12 samip5
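
One way to make the cgroup.procs approach from earlier in the thread tolerate both layouts is to probe for the file instead of hard-coding /sys/fs/cgroup/unified. A minimal sketch, assuming the service sits directly under system.slice; the function name is hypothetical:

```shell
#!/bin/sh
# Print the cgroup.procs path for a unit in system.slice, trying the hybrid
# layout (/sys/fs/cgroup/unified) first and the pure cgroup v2 layout second.
# $1 = unit name, $2 = cgroup mount root (defaults to /sys/fs/cgroup)
cgroup_procs_path() {
    unit=$1
    cgroot=${2:-/sys/fs/cgroup}
    for candidate in \
        "$cgroot/unified/system.slice/$unit/cgroup.procs" \
        "$cgroot/system.slice/$unit/cgroup.procs"
    do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage inside a shutdown helper (xargs -r is the GNU option used above):
#   procs=$(cgroup_procs_path k3s-agent.service) && xargs -r kill < "$procs"
```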

I had trouble getting a shutdown service to behave, but it turns out that was because I had changed the [Install] section of the service, and a systemctl daemon-reload is not enough to apply that change. You actually need to disable and enable the service to get systemd to update the symlinks to the new target.

damonmaria avatar Mar 09 '23 22:03 damonmaria
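
The disable-then-enable step described above can be sketched as below. The run wrapper and the DRY_RUN variable are hypothetical conveniences for previewing the commands; systemctl also offers a reenable verb that combines disable and enable.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Refresh the symlinks created from a unit's [Install] section after it was edited.
# $1 = unit name
refresh_install_links() {
    unit=$1
    run systemctl daemon-reload &&
    run systemctl disable "$unit" &&
    run systemctl enable "$unit"
}

# Usage (preview first, then run as root without DRY_RUN):
#   DRY_RUN=1 refresh_install_links shutdown-k3s.service
```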

Yes, good catch. You will need to adapt the example for agent nodes. The server and agent use different service names.

brandond avatar Mar 09 '23 22:03 brandond

Before needs to change to k3s-agent.service on agent nodes.

Unless of course one uses k3s ansible role which names them both as k3s.service. :)

samip5 avatar Mar 10 '23 13:03 samip5

Can confirm that @MountComb's shutdown-k3s.service (the k3s version posted above) also works if you get the message "A stop job is running for libcontainer...".

Make sure to drain the node before shutdown, otherwise there will be data loss.

If you use the k3s ansible role you need to extract k3s-killall.sh from https://github.com/k3s-io/k3s/blob/d9f40d4f5b4776164322035499fabedea77f5f52/install.sh#L666-L743

kub3let avatar Apr 13 '23 12:04 kub3let
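
Draining before a planned shutdown might look like the sketch below. The node name in the usage comment is taken from the original report, the run wrapper and DRY_RUN variable are hypothetical, and the two flags shown are one common choice rather than anything this thread prescribes. On a k3s node without a standalone kubectl, the same subcommands are available through k3s kubectl.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Evict pods from a node before shutting it down.
# $1 = node name
drain_node() {
    node=$1
    run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
}

# Allow scheduling on the node again once it is back up.
# $1 = node name
uncordon_node() {
    node=$1
    run kubectl uncordon "$node"
}

# Usage:
#   drain_node nuc-linux3 && sudo systemctl poweroff
#   ... after the node has rebooted ...
#   uncordon_node nuc-linux3
```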

Converting this issue into a discussion as this behavior is by design.

caroline-suse-rancher avatar Apr 26 '23 18:04 caroline-suse-rancher