
Improve cluster-autoscaler integration

Open sbueringer opened this issue 4 months ago • 5 comments

While investigating an autoscaler issue, we identified a few areas for improvement in the integration between the cluster-autoscaler and CAPI.

We should look into the following areas:

  • [x] First temporary/stopgap solution for CAPI controller and autoscaler fighting over replicas during MD rollouts (see: https://github.com/kubernetes/autoscaler/issues/8494)
  • [ ] Improve behavior during Machine deletion (including Node drain etc.)
    • Today autoscaler cordons/taints/drains Nodes before triggering Machine deletion
    • This means that the CAPI Machine deletion logic is not respected (pre-drain hooks, MachineDrainRules, drain observability, ...)
    • An idea: Maybe we want to disable cordon/taint/drain in autoscaler. Options:
      • A global flag to disable drain
      • Extend the CloudProvider interface with a new method to disable drain per node group
      • Extend the GetOptions method of the NodeGroup interface to disable drain per node group
  • [ ] Double-check that autoscaler does not scale up too many Machines based on pending Pods
    • Not entirely sure, but it looks like we observed autoscaler scaling up twice within 12 seconds because of just 1 pending Pod
  • [ ] Find a final solution for autoscaling during MD rollouts
    • [ ] Improve how autoscaler triggers Machine deletion (delete-machine annotation on MS-level + MD scale down is a weak/no API)
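As a rough illustration of the third option above (extending GetOptions), the sketch below shows what a per-node-group drain opt-out could look like. The struct is heavily simplified and the `SkipNodeDrain` field is invented for illustration; it is not an existing autoscaler API:

```go
package main

import "fmt"

// NodeGroupAutoscalingOptions is a simplified stand-in for the per-node-group
// options struct that autoscaler's NodeGroup.GetOptions can already return.
// SkipNodeDrain is a HYPOTHETICAL new field, not part of today's API.
type NodeGroupAutoscalingOptions struct {
	ScaleDownUtilizationThreshold float64
	// SkipNodeDrain (hypothetical) would tell autoscaler to delete the backing
	// Machine without cordoning/tainting/draining the Node first, so that the
	// CAPI Machine deletion logic (pre-drain hooks, MachineDrainRules, drain
	// observability) is respected instead.
	SkipNodeDrain bool
}

// capiNodeGroup is a toy stand-in for the clusterapi provider's node group.
type capiNodeGroup struct {
	name string
}

// GetOptions returns per-node-group options derived from the defaults; a CAPI
// node group could opt out of autoscaler-side drain here.
func (ng *capiNodeGroup) GetOptions(defaults NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error) {
	opts := defaults
	opts.SkipNodeDrain = true // defer drain to the CAPI Machine controller
	return &opts, nil
}

func main() {
	ng := &capiNodeGroup{name: "md-0"}
	opts, _ := ng.GetOptions(NodeGroupAutoscalingOptions{ScaleDownUtilizationThreshold: 0.5})
	fmt.Println(opts.SkipNodeDrain)
}
```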

Note: The following seems to be available today:

  • Disable cordon with: --cordon-node-before-terminating=false (but disabling cordon without disabling drain seems bad)
  • Disable DeletionCandidateOfClusterAutoscaler:PreferNoSchedule taint with: --max-bulk-soft-taint-count=0
    • ToBeDeletedByClusterAutoscaler:NoSchedule taint cannot be disabled today, needs a new flag (like one of the two above) (Hack: overwrite maxConcurrentNodesTainting const to 0?)
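For reference, the flags noted above would be passed on the cluster-autoscaler command line roughly like this (a sketch; other required flags are omitted, and flag names should be verified against the autoscaler version in use):

```shell
# Disable cordoning before termination and disable bulk soft-tainting
# (DeletionCandidateOfClusterAutoscaler:PreferNoSchedule). There is currently
# no flag to disable the ToBeDeletedByClusterAutoscaler:NoSchedule taint.
cluster-autoscaler \
  --cloud-provider=clusterapi \
  --cordon-node-before-terminating=false \
  --max-bulk-soft-taint-count=0
```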

Past Slack thread: https://kubernetes.slack.com/archives/C8TSNPY4T/p1756465597770899

sbueringer avatar Sep 12 '25 12:09 sbueringer

cc @elmiko @fabriziopandini

sbueringer avatar Sep 12 '25 12:09 sbueringer

cc @wjunott

sbueringer avatar Sep 12 '25 12:09 sbueringer

thanks @sbueringer !

elmiko avatar Sep 12 '25 13:09 elmiko

The first point is in progress (@elmiko PR). The second point is actionable, but it requires work in the autoscaler, including checking whether our proposed solution makes sense to the autoscaler maintainers. /help

Note: The long-term solution needs further discussion

fabriziopandini avatar Sep 17 '25 13:09 fabriziopandini

@fabriziopandini: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

The first point is in progress (@elmiko PR). The second point is actionable, but it requires work in the autoscaler, including checking whether our proposed solution makes sense to the autoscaler maintainers. /help

Note: The long-term solution needs further discussion

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Sep 17 '25 13:09 k8s-ci-robot