cluster-api
Support for DaemonSet eviction when draining nodes
(I'm not sure if this feature request is large enough to require the CAEP process. If it is please let me know.)
User Story
As a user, I would like some mechanism to have my DaemonSet pods gracefully terminated when draining nodes for deletion, so that those pods can complete their shutdown process.
Detailed Description
Currently Cluster API uses the standard kubectl drain behavior, ignoring all DaemonSets (link). I would like some way for my DaemonSet pods to also be gracefully terminated as part of the node deletion process.
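For context, this is roughly what that drain behavior looks like when driven from Go. A minimal sketch using the k8s.io/kubectl/pkg/drain helper that kubectl uses (the field names here come from my reading of that package, not from Cluster API's own code); IgnoreAllDaemonSets is what skips DaemonSet pods:

```go
package draindemo

import (
	"context"
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/kubectl/pkg/drain"
)

// drainNode cordons and drains a node the way `kubectl drain` does.
// With IgnoreAllDaemonSets set, DaemonSet-managed pods are skipped entirely,
// so they never get a graceful termination as part of the drain.
func drainNode(ctx context.Context, client kubernetes.Interface, node *corev1.Node) error {
	helper := &drain.Helper{
		Ctx:                 ctx,
		Client:              client,
		IgnoreAllDaemonSets: true, // DaemonSet pods are left untouched
		DeleteEmptyDirData:  true,
		GracePeriodSeconds:  -1, // use each pod's own terminationGracePeriodSeconds
		Timeout:             10 * time.Minute,
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}
	if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
		return err
	}
	return drain.RunNodeDrain(helper, node.Name)
}
```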
Anything else you would like to add:
While investigating whether this is currently possible I saw that Cluster Autoscaler provides a mechanism to control DaemonSet draining. I'm planning to make use of this in the interim, but it would be nice to also have the draining happen when nodes are not drained by Cluster Autoscaler (e.g. for cluster upgrades, etc.).
I also looked into the graceful node shutdown feature, but in my case the pod drain time is quite long (it could be 30 minutes or more) and I'm not sure the feature would work for such long termination times, especially on EC2. I don't think EC2 will let you stall instance termination for that long. It's hard to find documentation on how long an EC2 instance can inhibit shutdown, but I did see this saying 10 minutes is typically the maximum.
The other thing I saw while investigating is that Cluster API machine deletion has a pre-terminate hook. It seems like it might be possible to implement DaemonSet pod eviction with a custom Hook Implementing Controller (HIC). Is that the preferred way to implement something like this? If so, I can close this feature request and look into writing the HIC.
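To make the HIC idea concrete, here is a rough sketch of what such a controller might do. The pre-terminate annotation prefix is the one from Cluster API's machine deletion lifecycle hooks; everything else (the "daemonset-drain" hook name and the reconcileMachine / evictDaemonSetPods helpers) is hypothetical and just for illustration:

```go
package hicsketch

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// hookAnnotation uses the pre-terminate hook prefix defined by Cluster API's
// machine deletion lifecycle hooks; "daemonset-drain" is a made-up hook name.
const hookAnnotation = "pre-terminate.delete.hook.machine.cluster.x-k8s.io/daemonset-drain"

// reconcileMachine holds up machine termination until the DaemonSet pods on
// the machine's node have been evicted, then releases the hook.
func reconcileMachine(ctx context.Context, c client.Client, machine *clusterv1.Machine) error {
	if _, ok := machine.Annotations[hookAnnotation]; !ok {
		return nil // hook not present (or already released); nothing to do
	}
	if machine.DeletionTimestamp.IsZero() || machine.Status.NodeRef == nil {
		return nil // only act while the machine is being deleted and has a node
	}

	// While this returns an error, the hook annotation stays in place and
	// Cluster API keeps waiting before terminating the infrastructure.
	if err := evictDaemonSetPods(ctx, machine.Status.NodeRef.Name); err != nil {
		return err
	}

	// All DaemonSet pods are gone: release the hook so deletion can proceed.
	patch := client.MergeFrom(machine.DeepCopy())
	delete(machine.Annotations, hookAnnotation)
	return c.Patch(ctx, machine, patch)
}

// evictDaemonSetPods is a placeholder for the actual eviction logic, e.g.
// listing DaemonSet-owned pods on the node, evicting them via the Eviction
// API, and returning an error until they have all terminated.
func evictDaemonSetPods(ctx context.Context, nodeName string) error {
	return nil
}
```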
/kind feature
Kind of sounds like https://github.com/kubernetes/kubernetes/issues/75482 :/
Yes, I think if https://github.com/kubernetes/kubernetes/issues/75482 were implemented it could potentially be used to implement this feature request.
/milestone Next
/kind proposal
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle frozen
Based on experience, we are slowly surfacing knobs for machine deletion/drain, and this falls into that category. As documented above, this could require a small proposal.
/help
@fabriziopandini: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
/lifecycle frozen
Based on experience, we are slowly surfacing knobs for machine deletion/drain, and this falls into that category. As documented above, this could require a small proposal.
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/triage accepted
This feature might be eventually supported with Declarative Node Maintenance: https://github.com/kubernetes/enhancements/pull/4213
/priority backlog
Do we know how cluster-autoscaler implemented this feature?
In general the DaemonSet controller will add a toleration for the Unschedulable taint to all DaemonSet Pods (https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/#taints-and-tolerations).
So while it's possible to evict DaemonSet Pods, they will just be immediately re-created ("cordon" is basically ineffective for them because of that toleration).
I would guess they maybe added a cluster-autoscaler-specific taint to the Node?
In general it would be better if evicting DaemonSet Pods would be cleanly supported in core Kubernetes first.
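To illustrate the point, this is roughly what evicting a single DaemonSet pod looks like with client-go's Eviction API (a sketch; evictPod is a made-up helper), and why the eviction alone doesn't stick:

```go
package evictsketch

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// evictPod issues an Eviction for a single pod. For a DaemonSet-owned pod the
// eviction itself succeeds, but the DaemonSet controller immediately schedules
// a replacement on the same node: DaemonSet pods tolerate the
// node.kubernetes.io/unschedulable:NoSchedule taint, so cordoning the node
// does not keep the replacement away.
func evictPod(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	return client.PolicyV1().Evictions(namespace).Evict(ctx, &policyv1.Eviction{
		ObjectMeta: metav1.ObjectMeta{Namespace: namespace, Name: name},
	})
}
```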
Took a quick look at the autoscaler's code.
It looks to me like they don't handle the fact that the DaemonSet controller schedules a new pod. They ignore that, evict the running pods once, and seem to have the following race (a rough sketch follows the link below):
- if the eviction was successful, it just continues with deletion (without listing pods again; only the pods that existed before the DaemonSet pods were evicted are gone)
- if it was not successful, it tries again, possibly with a different set of pods by then
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/scaledown/actuation/group_deletion_scheduler.go#L100-L116
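Roughly, the pattern as I read it looks like the sketch below. This is a simplified paraphrase, not the actual autoscaler code; the function and helper names are mine:

```go
package autoscalersketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ownedByDaemonSet reports whether a pod is controlled by a DaemonSet.
func ownedByDaemonSet(pod *corev1.Pod) bool {
	for _, ref := range pod.OwnerReferences {
		if ref.Kind == "DaemonSet" {
			return true
		}
	}
	return false
}

// evictDaemonSetPodsOnce lists the node's pods a single time and evicts the
// DaemonSet-owned ones. Because the pod list is never refreshed, replacement
// pods created by the DaemonSet controller after this point are simply not
// seen by the rest of the deletion flow.
func evictDaemonSetPodsOnce(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return err
	}
	for i := range pods.Items {
		pod := &pods.Items[i]
		if !ownedByDaemonSet(pod) {
			continue
		}
		eviction := &policyv1.Eviction{
			ObjectMeta: metav1.ObjectMeta{Namespace: pod.Namespace, Name: pod.Name},
		}
		if err := client.PolicyV1().Evictions(pod.Namespace).Evict(ctx, eviction); err != nil {
			// On failure the caller retries later; by then the set of pods
			// on the node may already be different.
			return err
		}
	}
	return nil // on success, node deletion just continues from here
}
```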