
cloud provider clusterapi with cloud-provider-azure AzureMachinePools using orchestrationMode=Flexible does not scale down

Open · desek opened this issue 1 year ago · 7 comments

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

Component version: 1.28.2

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: v1.28.4
Server Version: v1.28.4

Also using CAPZ version: 1.10.8

What environment is this in?:

  • The CAPI/CAPZ Management cluster is running on Azure AKS.
  • The workload cluster is running in Azure.
  • The cluster-autoscaler is running on the Management cluster

What did you expect to happen?:

When cluster-autoscaler taints nodes for deletion, it should then delete them and scale down the MachinePool accordingly.
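
For reference, cluster-autoscaler marks scale-down candidates with the ToBeDeletedByClusterAutoscaler taint. A rough sketch of what such a node looks like (node name and timestamp below are illustrative):

```yaml
# Sketch of a node that cluster-autoscaler has selected for scale-down.
# The node name and taint value are placeholders; the value is the unix
# timestamp at which the autoscaler added the taint.
apiVersion: v1
kind: Node
metadata:
  name: workload-pool0-000002   # hypothetical node name
spec:
  taints:
    - key: ToBeDeletedByClusterAutoscaler
      value: "1705498800"
      effect: NoSchedule
```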

What happened instead?:

cluster-autoscaler cannot find the corresponding Machine resources, so the tainted nodes are never removed and the MachinePool is not scaled down.

How to reproduce it (as minimally and precisely as possible):

Assuming you have a running CAPZ cluster:

  1. Create an AzureMachinePool with spec.orchestrationMode set to Flexible (see the manifest sketch after this list)
  2. Scale out a deployment that triggers cluster-autoscaler to increase the replica count of the MachinePool
  3. CAPZ will create one AzureMachinePoolMachine resource per required node
  4. Scale in a deployment so that cluster-autoscaler initiates the scale-down process
  5. cluster-autoscaler fails the scale-down because it cannot find the Machine resources

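A minimal manifest sketch for step 1, assuming the usual MachinePool/AzureMachinePool pairing and the standard cluster-autoscaler clusterapi node-group annotations (all names, sizes, and the bootstrap config are placeholders):

```yaml
# Sketch only: names, namespace, sizes, and bootstrap config are placeholders.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: workload-pool0
  namespace: default
  annotations:
    # Annotations read by the cluster-autoscaler clusterapi provider
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
spec:
  clusterName: workload
  replicas: 1
  template:
    spec:
      clusterName: workload
      version: v1.28.4
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: workload-pool0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachinePool
        name: workload-pool0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachinePool
metadata:
  name: workload-pool0
  namespace: default
spec:
  location: westeurope
  # With Flexible orchestration, CAPZ creates one AzureMachinePoolMachine per instance
  orchestrationMode: Flexible
  template:
    vmSize: Standard_D2s_v3
```
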
Step 5 fails because VMSS Flex replicas are created as AzureMachinePoolMachine resources, not Machine resources.

Anything else we need to know?:

This commit adds the required resources, indexers and conditions in handlers to correctly remove unneeded AzureMachinePoolMachines: https://github.com/LiveArena/kubernetes-autoscaler/commit/b819ed9bf27722146805425ab82ea5f860c990b3
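
Independent of the code change, the autoscaler's ServiceAccount on the management cluster would presumably also need RBAC on AzureMachinePoolMachine resources before it could watch and remove them. A rough sketch of such a rule (an assumption, not part of the linked commit; name and verbs are guesses):

```yaml
# Assumption: an additional ClusterRole (or extra rule in the existing one)
# granting the autoscaler access to AzureMachinePoolMachines on the
# management cluster.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler-azuremachinepoolmachines   # hypothetical name
rules:
  - apiGroups: ["infrastructure.cluster.x-k8s.io"]
    resources: ["azuremachinepoolmachines"]
    verbs: ["get", "list", "watch", "update", "patch"]
```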

desek · Jan 17 '24 13:01

/area provider/azure
/area provider/cluster-api

Shubham82 · Jan 18 '24 05:01

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Jun 19 '24 12:06

/remove-lifecycle stale

Shubham82 · Jun 20 '24 06:06