Support for topology spread constraints with cluster autoscaler
When a deployment is applied with topology spread constraints using a maxSkew of 1 and the topology key "topology.kubernetes.io/zone", the cluster autoscaler scales one zone up with too many nodes; after scale-down-unneeded-time has passed, the extra nodes are removed again. There is one nodepool per zone (3 in total) and balance-similar-node-groups is set to true.
I would expect nodes to be added to each zone in roughly equal numbers, not extra unneeded nodes being added that are removed again after the scale-down-unneeded-time timeout.
The issue can be reproduced by applying a deployment with resource requests sized to about half a node, around 30 replicas, and topology spread constraints configured (a fuller example sketch follows below):
```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
```
Cluster setup: autoscaled nodepools per zone, with balance-similar-node-groups set to true.
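A minimal repro sketch, assuming hypothetical names (spread-repro, a placeholder nginx image) and requests of roughly half a Standard_D16as_v4 node (16 vCPU / 64 GiB); the exact manifest from the original report is not shown:
```yaml
# Repro sketch (assumed names and sizes, not the original manifest):
# ~30 replicas, requests roughly half a Standard_D16as_v4 node, and the
# topology spread constraint described above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spread-repro              # hypothetical name
spec:
  replicas: 30
  selector:
    matchLabels:
      app: spread-repro
  template:
    metadata:
      labels:
        app: spread-repro
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: spread-repro
      containers:
        - name: app
          image: nginx:1.21       # placeholder image
          resources:
            requests:
              cpu: "7"            # ~half of a D16as_v4 (16 vCPU)
              memory: 28Gi        # ~half of 64 GiB, leaving headroom for system pods
```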
MS support ticket 2112070050001650 was opened for this issue. I was told there is no special integration between pod topology spreading and the cluster autoscaler, so this behavior is expected, and was advised to open an issue here requesting integration of topology spread constraints.
Kubernetes 1.21.9
- 1 system nodepool (Standard_D16as_v4) with 3 nodes (no autoscaling)
- 3 user nodepools (1 per zone, Standard_D16as_v4) with cluster autoscaling (3 - 30)
Hi martin-adema, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.
I might be just a bot, but I'm told my suggestions are normally quite good, as such:
- If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
- Please abide by the AKS repo Guidelines and Code of Conduct.
- If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
- Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
- Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
- If you have a question, do take a look at our AKS FAQ. We place the most common ones there!
Triage required from @Azure/aks-pm
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads
@justindavies could you help take a look?
Hi, my customer has the same issue using podAntiAffinity: the cluster autoscaler never triggers when pods cannot be scheduled.
To be sure, is that linked to this issue?
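For context, a minimal sketch of the kind of required podAntiAffinity that can interact badly with autoscaler scale-up (assumed labels and topology key; the commenter's actual spec is not shown):
```yaml
# Assumed example, not the commenter's configuration: one pod of app=my-app
# per node, enforced at scheduling time.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: my-app          # hypothetical label
        topologyKey: kubernetes.io/hostname
```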