Karpenter should calculate DaemonSet overhead correctly when using `karpenter.sh/initialized` and `karpenter.sh/registered`
Description
Observed Behavior:
This issue was first reported in https://github.com/aws/karpenter-provider-aws/issues/5406 (see that issue for more context). Karpenter was not calculating DaemonSet overhead correctly when a user specified node selection/node affinity logic for DaemonSets that referenced karpenter.sh/initialized and karpenter.sh/registered. The karpenter.sh/registered label is added to the Node once it has properly joined the cluster, and the karpenter.sh/initialized label is added once Karpenter deems the Node fully initialized.
Currently, there is no logic in the DaemonSet overhead calculation that recognizes that these labels will eventually be added to the node and should therefore be considered when there is node selection logic against them. This can cause Karpenter to undershoot the resources that DaemonSets and applications need to schedule.
Expected Behavior:
Karpenter should assume that DaemonSets that select on karpenter.sh/registered and karpenter.sh/initialized will schedule against Karpenter nodes.
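For illustration, this is the kind of selection logic in question — a sketch of a DaemonSet that targets only Karpenter-provisioned nodes (the manifest and names are illustrative, not from the original report):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-daemonset  # hypothetical name
spec:
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      # Only schedule onto nodes that Karpenter has registered. The label
      # appears on the Node after it joins, so the overhead calculation
      # should assume it will eventually be present.
      nodeSelector:
        karpenter.sh/registered: "true"
      containers:
      - name: example
        image: registry.k8s.io/pause:3.9
```

Karpenter should count this DaemonSet's pods when sizing nodes it provisions, even though the label only appears after registration.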
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
We should also check that instance-type labels work here as well, e.g. if someone sets an affinity for their DaemonSet nodes like

```yaml
matchExpressions:
- key: node.kubernetes.io/instance-type
  operator: Exists
```

Do we handle this correctly, or do we not know this label exists since we are just using the NodePool representation to track DaemonSet overhead?
/assign
@jonathan-innis Should we really handle this? Those labels are meant for internal dev purposes, and DaemonSets are meant to be scheduled on every node, so there might not be a case where DaemonSets should be scheduled on Karpenter-managed nodes but not on other nodes, right?
> since those labels are meant for internal dev purpose and daemonsets are to be scheduled on every node
You can create node selector requirements or node affinities so that DaemonSets run only on certain nodes. The use case we saw was someone running an EKS cluster that had mostly Karpenter-managed capacity but ran a small amount of capacity (including Karpenter itself) on Fargate. They didn't want to run the DaemonSet pods on Fargate nodes, so they created selection logic to schedule the DaemonSet pods only against the Karpenter-provisioned nodes.
I think we should try to be as correct as we can here. Since we know that these labels are eventually going to appear on the node, we should schedule DaemonSet pods assuming that will be the case.
Then in that case, should we use a different label on DaemonSets (or any other pods) to schedule them on Karpenter-managed nodes, instead of conditioning on labels that were meant for dev purposes? With this approach any user can use any label, and we might need to handle every case. Instead, we could take a preventive measure and document how to achieve exactly what the user was intending? Correct me if I'm wrong, thanks 😊
> instead we could take a preventive measure to mention how to achieve the goal that user was intending exactly
Agree, I think it makes sense not to recommend that users go this route to select against Karpenter-managed nodes.
> conditioning the labels that was for the dev purpose
I'd argue that they aren't for a dev purpose and that we should be correct if there isn't a large technical burden for us to be correct here. In this case, we just need to expect that these labels will be added to any nodes launched by NodePools when we are calculating the DaemonSet overhead.
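A minimal sketch of that expectation in Go (all names are hypothetical; this is not Karpenter's actual scheduling code): when computing DaemonSet overhead, augment the prospective node's labels with the lifecycle labels we know will eventually be applied, and only then evaluate the DaemonSet's node selector.

```go
package main

import "fmt"

// matchesSelector reports whether every key/value requirement in selector is
// satisfied by the given label set. Simplified: real scheduling also handles
// matchExpressions operators like Exists and In.
func matchesSelector(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	// Labels the NodePool representation exposes before the node exists.
	nodeLabels := map[string]string{
		"node.kubernetes.io/instance-type": "m5.large",
	}

	// Proposed fix: assume the lifecycle labels will eventually be added,
	// so include them when computing DaemonSet overhead.
	assumed := map[string]string{}
	for k, v := range nodeLabels {
		assumed[k] = v
	}
	assumed["karpenter.sh/registered"] = "true"
	assumed["karpenter.sh/initialized"] = "true"

	dsSelector := map[string]string{"karpenter.sh/initialized": "true"}

	fmt.Println(matchesSelector(dsSelector, nodeLabels)) // false: overhead undershoots
	fmt.Println(matchesSelector(dsSelector, assumed))    // true: DaemonSet is counted
}
```

With the bare NodePool labels the selector does not match and the DaemonSet's requests are omitted from overhead; with the assumed lifecycle labels it matches and is counted.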
/unassign @sadath-12
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Is anyone working on this issue?
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with Issue Triage
> Please send feedback to sig-contributor-experience at kubernetes/community.
> /close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.