
cloud provider clusterapi with cloud-provider-azure AzureMachinePools using orchestrationMode=Flexible does not scale down

Open · desek opened this issue 1 year ago · 7 comments

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

Component version: 1.28.2

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: v1.28.4
Server Version: v1.28.4

Also using CAPZ version: 1.10.8

What environment is this in?:

  • The CAPI/CAPZ Management cluster is running on Azure AKS.
  • The workload cluster is running in Azure.
  • The cluster-autoscaler is running on the Management cluster

What did you expect to happen?:

When cluster-autoscaler taints nodes for deletion, it should then delete them and scale down the MachinePool accordingly.
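
For reference, cluster-autoscaler marks scale-down candidates with the ToBeDeletedByClusterAutoscaler taint. A rough sketch of what such a node looks like (node name and timestamp below are illustrative):

```yaml
# Sketch of a node that cluster-autoscaler has selected for scale-down.
# The node name and taint value are placeholders; the value is the unix
# timestamp at which the autoscaler added the taint.
apiVersion: v1
kind: Node
metadata:
  name: workload-pool0-000002   # hypothetical node name
spec:
  taints:
    - key: ToBeDeletedByClusterAutoscaler
      value: "1705498800"
      effect: NoSchedule
```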

What happened instead?:

cluster-autoscaler cannot find the corresponding Machine resources, so the tainted nodes are never removed and the MachinePool is not scaled down.

How to reproduce it (as minimally and precisely as possible):

Assuming you have a running CAPZ cluster:

  1. Create an AzureMachinePool with spec.orchestrationMode set to Flexible (see the manifest sketch after this list)
  2. Scale out a deployment that triggers cluster-autoscaler to increase the replica count of the MachinePool
  3. CAPZ will create one AzureMachinePoolMachine resource per required node
  4. Scale in a deployment so that cluster-autoscaler initiates the scale-down process
  5. cluster-autoscaler fails the scale-down because it cannot find the Machine resources

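A minimal manifest sketch for step 1, assuming the usual MachinePool/AzureMachinePool pairing and the standard cluster-autoscaler clusterapi node-group annotations (all names, sizes, and the bootstrap config are placeholders):

```yaml
# Sketch only: names, namespace, sizes, and bootstrap config are placeholders.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: workload-pool0
  namespace: default
  annotations:
    # Annotations read by the cluster-autoscaler clusterapi provider
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
spec:
  clusterName: workload
  replicas: 1
  template:
    spec:
      clusterName: workload
      version: v1.28.4
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: workload-pool0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachinePool
        name: workload-pool0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachinePool
metadata:
  name: workload-pool0
  namespace: default
spec:
  location: westeurope
  # With Flexible orchestration, CAPZ creates one AzureMachinePoolMachine per instance
  orchestrationMode: Flexible
  template:
    vmSize: Standard_D2s_v3
```
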
Step 5 fails because VMSS Flex replicas are created as AzureMachinePoolMachine resources, not Machine resources.

Anything else we need to know?:

This commit adds the required resources, indexers and conditions in handlers to correctly remove unneeded AzureMachinePoolMachines: https://github.com/LiveArena/kubernetes-autoscaler/commit/b819ed9bf27722146805425ab82ea5f860c990b3
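
Independent of the code change, the autoscaler's ServiceAccount on the management cluster would presumably also need RBAC on AzureMachinePoolMachine resources before it could watch and remove them. A rough sketch of such a rule (an assumption, not part of the linked commit; name and verbs are guesses):

```yaml
# Assumption: an additional ClusterRole (or extra rule in the existing one)
# granting the autoscaler access to AzureMachinePoolMachines on the
# management cluster.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler-azuremachinepoolmachines   # hypothetical name
rules:
  - apiGroups: ["infrastructure.cluster.x-k8s.io"]
    resources: ["azuremachinepoolmachines"]
    verbs: ["get", "list", "watch", "update", "patch"]
```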

desek · Jan 17 '24 13:01

/area provider/azure
/area provider/cluster-api

Shubham82 · Jan 18 '24 05:01

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Jun 19 '24 12:06

/remove-lifecycle stale

Shubham82 · Jun 20 '24 06:06