
Support Cascade Delete When Removing Karpenter from my Cluster

Open jonathan-innis opened this issue 1 year ago • 10 comments

Description

What problem are you trying to solve?

I'd like to be able to configure cascading delete behavior for Karpenter, so that on NodePool deletion or CRD deletion I can set values that tell Karpenter I want a more expedited termination of my nodes rather than waiting for all of them to fully drain.

Right now, nodes can hang during graceful drain due to stuck pods or fully blocking PDBs. Because a NodePool or CRD deletion causes all of the nodes to gracefully drain, these deletion operations can also hang, halting the whole process. Ideally, a user could pass something like --grace-period when deleting a resource, and Karpenter could reason about how to propagate it down to every resource the deletion cascades to.
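A rough sketch of what that could look like (hypothetical UX: kubectl already accepts --grace-period, but the API server only honors it for pods today, so propagating it through a NodePool deletion is exactly what this issue proposes; "default" is a placeholder NodePool name):

    # Hypothetical: delete a NodePool and give its nodes up to 5 minutes
    # to drain before Karpenter forcefully terminates them.
    kubectl delete nodepool default --grace-period=300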

Minimally, we should allow CRD deletions to get unblocked so that cluster operators can uninstall Karpenter from clusters without being blocked by graceful node drains that may hang.

An initial implementation of this was tried in https://github.com/kubernetes-sigs/karpenter/pull/466, and there was some discussion in the community about letting gracePeriod be passed through to CRs in the same way it can be passed to pods today, affecting the CR's deletionTimestamp and allowing controller authors to build custom logic around this gracePeriod concept.
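For reference, here is roughly how this works for pods today ("my-pod" is a placeholder name): the grace period is recorded on the object itself, so whoever is tearing the pod down knows the deadline. The idea above would record it the same way on CRs so that controllers like Karpenter could act on it.

    # Delete a pod with a 30s grace period, then inspect the deadline
    # the API server recorded on the object while it terminates.
    kubectl delete pod my-pod --grace-period=30 --wait=false
    kubectl get pod my-pod -o jsonpath='{.metadata.deletionTimestamp} {.metadata.deletionGracePeriodSeconds}'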

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

jonathan-innis avatar Feb 22 '24 07:02 jonathan-innis

enabling the ability to pass gracePeriod through to CRs in the same way that you can pass them through to pods today to affect the deletionTimestamp for a CR

Building a coalition of supporters for this idea takes effort, but it may pay off really well.

sftim avatar Feb 26 '24 17:02 sftim

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 26 '24 17:05 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jun 25 '24 17:06 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jul 25 '24 18:07 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jul 25 '24 18:07 k8s-ci-robot

/reopen

jonathan-innis avatar Aug 01 '24 21:08 jonathan-innis

@jonathan-innis: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Aug 01 '24 21:08 k8s-ci-robot

/remove-lifecycle rotten

jonathan-innis avatar Aug 01 '24 21:08 jonathan-innis

/triage accepted

jonathan-innis avatar Aug 01 '24 21:08 jonathan-innis

Discussed this in WG today: the consensus was that folks, in general, still want graceful termination of their nodes -- they don't want Karpenter to always forcefully terminate every node on their behalf. There is currently a workaround via the TerminationGracePeriod implementation: start the teardown of Karpenter's CRDs, let the NodeClaims begin terminating, and then have a user or automation annotate all of the nodes with karpenter.sh/nodeclaim-termination-timestamp to mark the time by which each NodeClaim has to be removed.

In the case that you want forceful termination, you could set the timestamp to the current time; everything should then start forcefully removing itself, with the instances that were launched by Karpenter torn down as well.
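As a sketch of that workaround (this assumes the karpenter.sh/nodepool node label as a way to select Karpenter-managed nodes; adjust the selector to your setup):

    # Mark every Karpenter-managed node's NodeClaim termination deadline
    # as "now", so drains stop blocking and instances are torn down.
    kubectl annotate nodes -l karpenter.sh/nodepool \
      karpenter.sh/nodeclaim-termination-timestamp="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      --overwrite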

jonathan-innis avatar Aug 01 '24 22:08 jonathan-innis

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot avatar Aug 01 '25 22:08 k8s-triage-robot

/triage accepted

rschalo avatar Aug 04 '25 22:08 rschalo