
External shutdown behavior

Open Davincible opened this issue 2 years ago • 3 comments

Feature Request

Streamline external shutdown behavior with talosctl shutdown

Description

I noticed that when I call talosctl shutdown, some graceful shutdown behavior takes place: the node is notified of its shutdown, pods are given the chance to terminate, and the node is marked as SchedulingDisabled in k8s. When I gracefully shut down through my hosting provider, however, none of this happens, and pods are marked as NodeShutdown. It would be great to run the same graceful shutdown routine on an external shutdown signal as is done when calling talosctl shutdown. The same goes for rebooting.

Davincible, Jun 25 '22 17:06

This happens because the Force flag is set when the machine is shut down through ACPI.

https://github.com/siderolabs/talos/blob/2deff6b6e148d99e9c88159f4895594417cdf080/internal/app/machined/pkg/runtime/v1alpha1/v1alpha1_controller.go#L221

In my case this is not preferable, since my cloud provider has a separate "ForceShutdown" option that sends no signal to the machine and simply powers it off, for when you really want that.

I can see how this might not be the case for every provider. Perhaps a setting to change this default behavior could be a solution?
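
To make the proposal concrete, here is a rough sketch of what such a setting could look like. Everything in it is hypothetical: `ShutdownPolicy`, `GracefulOnACPI`, and `ForceFor` are illustrative names, not existing Talos options or code, and it ignores any explicit force request coming from an API caller.

```go
package main

import "fmt"

// ShutdownSource identifies what triggered the shutdown (hypothetical type).
type ShutdownSource int

const (
	SourceAPI  ShutdownSource = iota // e.g. talosctl shutdown
	SourceACPI                       // power-button event sent by the provider
)

// ShutdownPolicy is a hypothetical operator-facing setting.
// It is NOT an existing Talos machine config field.
type ShutdownPolicy struct {
	// GracefulOnACPI opts in to the full shutdown sequence on ACPI events.
	// The zero value keeps today's behavior (forced shutdown on ACPI).
	GracefulOnACPI bool
}

// ForceFor reports whether the graceful steps should be skipped
// for a shutdown coming from the given source.
func (p ShutdownPolicy) ForceFor(src ShutdownSource) bool {
	if src == SourceACPI {
		return !p.GracefulOnACPI
	}
	return false
}

func main() {
	current := ShutdownPolicy{}
	optedIn := ShutdownPolicy{GracefulOnACPI: true}

	fmt.Println("ACPI, default policy, force:", current.ForceFor(SourceACPI))
	fmt.Println("ACPI, opted in, force:", optedIn.ForceFor(SourceACPI))
	fmt.Println("API, default policy, force:", current.ForceFor(SourceAPI))
}
```

Keeping the default as "force on ACPI" would leave existing setups unchanged, so only operators who know their provider's timeout is long enough would opt in.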

Davincible, Jun 25 '22 17:06

There is also the question of the timeout on the calling side: the caller sends the ACPI event and sets a timeout, and if Talos doesn't shut down within that timeout, the caller forcefully shuts the machine down, which is not good either.

We need to see if we can do a timeout with a smaller budget for kubelet shutdown.

smira, Jun 27 '22 14:06

I see. That then differs per caller, though. I know that in my case the timeout is very generous, and Talos would shut down well within that time window. Since Talos has no way of knowing the timeout, it would be up to the operator to decide whether to perform the extended shutdown sequence on ACPI or not.

Although improving shutdown times could never hurt, of course.

Davincible, Jun 27 '22 17:06