Improve cluster-autoscaler integration
While investigating an autoscaler issue, we identified a few areas for improvement in the integration between the cluster-autoscaler and CAPI.
We should look into the following areas:
- [x] First temporary/stopgap solution for the CAPI controller and the autoscaler fighting over replicas during MD rollouts (see: https://github.com/kubernetes/autoscaler/issues/8494)
- [ ] Improve behavior during Machine deletion (including Node drain etc.)
  - Today the autoscaler cordons/taints/drains Nodes before triggering Machine deletion
  - This means that the CAPI Machine deletion logic is not respected (pre-drain hooks, MachineDrainRules, drain observability, ...)
  - One idea: maybe we want to disable cordon/taint/drain in the autoscaler. Options:
    - Via a global flag to allow disabling drain
    - Extend the `CloudProvider` interface with a new method to allow disabling drain per node group
    - Extend the `GetOptions` method of the `NodeGroup` interface to allow disabling drain per node group (see the first sketch after this list)
- [ ] Double-check that the autoscaler does not scale up too many Machines based on pending Pods
  - Not entirely sure, but it looks like we observed the autoscaler scaling up twice within 12 seconds because of just 1 pending Pod
- [ ] Find a final solution for autoscaling during MD rollouts
- [ ] Improve how the autoscaler triggers Machine deletion (the `delete-machine` annotation on MS-level + MD scale down is a weak/no API; see the second sketch after this list)
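To make the `GetOptions` idea above more concrete, here is a minimal Go sketch of how a per-node-group drain switch could be surfaced. The `DisableNodeDrain` field and the `capiNodeGroup` type are hypothetical; the real `NodeGroup` interface and `NodeGroupAutoscalingOptions` struct live in the cluster-autoscaler codebase and are only mirrored in shape here.

```go
// Sketch only: a hypothetical per-node-group "disable drain" option surfaced
// through the NodeGroup.GetOptions mechanism. DisableNodeDrain is NOT an
// existing autoscaler option, and capiNodeGroup stands in for the CAPI cloud
// provider's node group implementation.
package main

import "fmt"

// NodeGroupAutoscalingOptions mirrors the shape (not the full contents) of the
// autoscaler's per-node-group options struct, extended with the proposed field.
type NodeGroupAutoscalingOptions struct {
	ScaleDownUtilizationThreshold float64
	// DisableNodeDrain would tell the autoscaler to skip cordon/taint/drain and
	// leave Node drain entirely to the Cluster API Machine deletion flow.
	DisableNodeDrain bool
}

// capiNodeGroup is a stand-in for the CAPI provider's node group.
type capiNodeGroup struct {
	// drainHandledByCAPI could be derived e.g. from an annotation on the
	// MachineDeployment (assumption, not an existing mechanism).
	drainHandledByCAPI bool
}

// GetOptions returns per-node-group overrides of the global defaults; this is
// how the proposal would plumb the setting through without a new global flag.
func (ng *capiNodeGroup) GetOptions(defaults NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error) {
	opts := defaults
	opts.DisableNodeDrain = ng.drainHandledByCAPI
	return &opts, nil
}

func main() {
	ng := &capiNodeGroup{drainHandledByCAPI: true}
	opts, _ := ng.GetOptions(NodeGroupAutoscalingOptions{ScaleDownUtilizationThreshold: 0.5})
	fmt.Printf("disable drain for this node group: %v\n", opts.DisableNodeDrain)
}
```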
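For context on the "weak/no API" point, a rough sketch of the indirect two-step flow (mark a Machine as the preferred deletion candidate, then scale the MachineDeployment down), written against controller-runtime. This is an illustration rather than the autoscaler provider's actual code; the function and variable names are made up, and `clusterv1.DeleteMachineAnnotation` is the `cluster.x-k8s.io/delete-machine` annotation.

```go
// Sketch only: the indirect "annotate + scale down" contract, not the actual
// cluster-autoscaler provider code. markAndScaleDown and all variable names
// are made up for illustration.
package capiexample

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// markAndScaleDown marks one Machine as the preferred deletion candidate via the
// cluster.x-k8s.io/delete-machine annotation (honored by the owning MachineSet)
// and then reduces the MachineDeployment replica count by one. There is no
// direct "delete exactly this Machine" API, which is the weakness noted above.
func markAndScaleDown(ctx context.Context, c client.Client, machine *clusterv1.Machine, md *clusterv1.MachineDeployment) error {
	if machine.Annotations == nil {
		machine.Annotations = map[string]string{}
	}
	machine.Annotations[clusterv1.DeleteMachineAnnotation] = "true"
	if err := c.Update(ctx, machine); err != nil {
		return err
	}

	replicas := *md.Spec.Replicas - 1
	md.Spec.Replicas = &replicas
	return c.Update(ctx, md)
}
```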
Note: The following seems to be available today:
- Disable cordon with `--cordon-node-before-terminating=false` (but disabling cordon without disabling drain seems bad)
- Disable the `DeletionCandidateOfClusterAutoscaler:PreferNoSchedule` taint with `--max-bulk-soft-taint-count=0`
- The `ToBeDeletedByClusterAutoscaler:NoSchedule` taint cannot be disabled today and needs a new flag (like one of the two above) (Hack: overwrite the `maxConcurrentNodesTainting` const to 0?)
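Since the taints above are the main signal the autoscaler leaves on Nodes, here is a small illustrative helper (not part of CAPI or the autoscaler) showing how those taint keys could be detected on a Node, e.g. by CAPI-side drain logic:

```go
// Sketch only: an illustrative helper that reports whether the
// cluster-autoscaler has already tainted a Node for (possible) removal.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

const (
	// Hard NoSchedule taint applied right before the autoscaler deletes a node;
	// cannot be disabled today.
	toBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"
	// Soft PreferNoSchedule taint for scale-down candidates; can be disabled
	// with --max-bulk-soft-taint-count=0.
	deletionCandidateTaint = "DeletionCandidateOfClusterAutoscaler"
)

// markedByAutoscaler returns whether the Node carries the hard and/or soft
// autoscaler taints, so other components could take them into account.
func markedByAutoscaler(node *corev1.Node) (hard, soft bool) {
	for _, t := range node.Spec.Taints {
		switch t.Key {
		case toBeDeletedTaint:
			hard = true
		case deletionCandidateTaint:
			soft = true
		}
	}
	return hard, soft
}

func main() {
	node := &corev1.Node{
		Spec: corev1.NodeSpec{
			Taints: []corev1.Taint{{Key: toBeDeletedTaint, Effect: corev1.TaintEffectNoSchedule}},
		},
	}
	hard, soft := markedByAutoscaler(node)
	fmt.Printf("hard taint: %v, soft taint: %v\n", hard, soft)
}
```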
Past Slack thread: https://kubernetes.slack.com/archives/C8TSNPY4T/p1756465597770899
cc @elmiko @fabriziopandini
cc @wjunott
thanks @sbueringer!
The first point is in progress (@elmiko's PR). The second point is actionable, but it requires work in the autoscaler, including checking whether our proposed solution makes sense to the autoscaler maintainers.
/help
Note: The long-term solution needs further discussion.
@fabriziopandini: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
> The first point is in progress (@elmiko's PR). The second point is actionable, but it requires work in the autoscaler, including checking whether our proposed solution makes sense to the autoscaler maintainers.
> /help
> Note: The long-term solution needs further discussion.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.