
How to handle `spec.startupTaints` on Node Restart

Open jonathan-innis opened this issue 1 year ago • 6 comments

Description

Observed Behavior:

Certain DaemonSets require logic to run before pods are scheduled, to make sure the pod sandbox is properly configured to receive pods and run their containers once they schedule. Cilium was the first example of this for Karpenter, where the CNI needed to be fully configured on the node, and some startup processes needed to complete, before pods could actually be bound.

If pods were bound before this startup logic ran, they would begin to fail because the CNI wasn't fully set up and pod IP assignment wasn't ready. Thus, Karpenter implemented a spec.startupTaints field in its NodeClaims to ensure that pods do not schedule to nodes until the nodes are ready to receive them. The DaemonSets are responsible for removing the startupTaints as their various startup processes complete.
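
For concreteness, here is a minimal sketch (not taken from this issue) of what such a startup taint looks like, using the standard core/v1 Taint shape that these fields carry. The Cilium agent-readiness key is only an example, and the effect to use depends on the CNI's own guidance; in a NodePool manifest the same data would sit under spec.template.spec.startupTaints.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Example only: the taint Karpenter would apply at node creation so that
	// pods cannot schedule until the owning DaemonSet removes it once its
	// startup process completes.
	startupTaints := []corev1.Taint{{
		Key:    "node.cilium.io/agent-not-ready", // example key removed by the Cilium DaemonSet
		Value:  "true",
		Effect: corev1.TaintEffectNoExecute,
	}}
	fmt.Printf("startupTaints: %+v\n", startupTaints)
}
```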

This works fine on initial node boot; however, an issue opened in the aws/karpenter-provider-aws repo (https://github.com/aws/karpenter-provider-aws/issues/5293) indicated that when a node restarts after the kubelet has joined and this startup process has already completed, pods begin to fail because no such ordering exists on node restart.

This is a difficult problem to solve within Karpenter alone, since Karpenter would have to be aware of node restarts, which is hard to determine just by looking at the apiserver given that the kubelet's signal is heartbeat-based. Realistically, this seems like a behavior change that we should explore in the upstream project. Most notably: what is the expected behavior when a node restarts, all of the processes on it restart, and there is no longer any ordering mechanism?

Expected Behavior:

Restarting a node should allow for the same ordering mechanism that was offered on initial join. Since pods are disrupted under the hood anyway, maybe it's possible to clean up the pod bindings as part of the restart, evict the pods, re-add the taints, etc.

I think we should explore the trade-offs here.
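
To make that trade-off discussion a bit more concrete, below is a rough, hypothetical client-go sketch of the "re-add the taints" idea. This is not something Karpenter does today: it assumes a change in the node's status.nodeInfo.bootID can stand in as the restart signal, the node name, cached boot ID, and taint key are placeholders, and evicting pods that are already bound is omitted.

```go
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// Example startup taint; in practice this would come from the NodeClaim spec.
var startupTaint = corev1.Taint{
	Key:    "node.cilium.io/agent-not-ready",
	Value:  "true",
	Effect: corev1.TaintEffectNoExecute,
}

// reapplyStartupTaintIfRebooted compares the node's current bootID against a
// previously recorded value; if they differ, the machine rebooted, so the
// startup taint is re-added to restore the original ordering guarantee and
// let the owning DaemonSet remove it again once the node is ready.
func reapplyStartupTaintIfRebooted(ctx context.Context, c kubernetes.Interface, nodeName, lastBootID string) error {
	node, err := c.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if node.Status.NodeInfo.BootID == lastBootID {
		return nil // no reboot observed, nothing to do
	}
	for _, t := range node.Spec.Taints {
		if t.Key == startupTaint.Key {
			return nil // taint is already present
		}
	}
	node.Spec.Taints = append(node.Spec.Taints, startupTaint)
	_, err = c.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	// Placeholder node name and cached boot ID, purely for illustration.
	if err := reapplyStartupTaintIfRebooted(context.Background(), client, "example-node", "previous-boot-id"); err != nil {
		log.Fatal(err)
	}
}
```

Whether a controller should own this, or whether the kubelet / upstream Kubernetes should re-establish such ordering itself on restart, is exactly the kind of trade-off this issue is asking to explore.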

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

jonathan-innis · Dec 19 '23

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Mar 18 '24

/remove-lifecycle stale

Bryce-Soghigian · Mar 19 '24

Note on impact: I imagine this becoming problematic for AKS as well, because we have a component that will attempt to restart, reimage, and then redeploy nodes if they are not Ready for too long. See AKS Node Repair.

Bryce-Soghigian · Mar 19 '24

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Jun 17 '24

/remove-lifecycle stale
/lifecycle frozen

jmdeal · Jun 17 '24

It seems that the karpenter.sh/unregistered taint also has this problem? What is the current status here?

daimaxiaxie · Sep 14 '24