cloud provider clusterapi with cloud-provider-azure AzureMachinePools using orchestrationMode=Flexible does not scale down
Which component are you using?:
cluster-autoscaler
What version of the component are you using?:
Component version: 1.28.2
What k8s version are you using (kubectl version)?:
kubectl version Output:
```
$ kubectl version
Client Version: v1.28.4
Server Version: v1.28.4
```
Also using CAPZ version: 1.10.8
What environment is this in?:
- The CAPI/CAPZ Management cluster is running on Azure AKS.
- The workload cluster is running in Azure.
- The cluster-autoscaler is running on the Management cluster
What did you expect to happen?:
After cluster-autoscaler taints nodes for deletion, it deletes them and scales the MachinePool down.
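For reference, the clusterapi provider discovers scalable node groups through size annotations on the MachinePool. The manifest below is only a minimal sketch; the names, namespace, Kubernetes version, and replica bounds are placeholders, not values from the affected cluster:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: workload-mp-0              # placeholder name
  namespace: default
  annotations:
    # clusterapi provider autoscaling bounds (example values)
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "10"
spec:
  clusterName: workload            # placeholder cluster name
  template:
    spec:
      clusterName: workload
      version: v1.28.4
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: workload-mp-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachinePool
        name: workload-mp-0
```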
What happened instead?:
cluster-autoscaler cannot find the Machine resources, so the tainted nodes are never deleted and the MachinePool does not scale down.
How to reproduce it (as minimally and precisely as possible):
Assuming you have a running CAPZ cluster:
1. Create an AzureMachinePool with spec.orchestrationMode set to Flexible (see the sketch after these steps)
2. Scale out a deployment so that cluster-autoscaler increases the replica count of the MachinePool
3. CAPZ will create one AzureMachinePoolMachine resource per required node
4. Scale in the deployment so that cluster-autoscaler initiates the scale-down process
5. cluster-autoscaler fails the scale-down because it cannot find Machine resources

Step 5 fails because VMSS Flex replicas are created as AzureMachinePoolMachine resources, not Machine resources.
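A trimmed sketch of the AzureMachinePool used in step 1; the name, location, and VM size are placeholders, and only the fields relevant to this report are shown:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachinePool
metadata:
  name: workload-mp-0              # must match the MachinePool's infrastructureRef
  namespace: default
spec:
  location: westeurope             # placeholder region
  # Flexible orchestration is the mode that leads to the behaviour above:
  # the VMSS Flex replicas are tracked as AzureMachinePoolMachine resources.
  orchestrationMode: Flexible
  template:
    vmSize: Standard_D2s_v3        # placeholder size
    osDisk:
      osType: Linux
      diskSizeGB: 30
      managedDisk:
        storageAccountType: Premium_LRS
```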
Anything else we need to know?:
This commit adds the resources, indexers, and handler conditions required to correctly remove unneeded AzureMachinePoolMachines: https://github.com/LiveArena/kubernetes-autoscaler/commit/b819ed9bf27722146805425ab82ea5f860c990b3
/area provider/azure
/area provider/cluster-api
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale