
[Feature] Allow a configurable delay between node upgrades so that pods have enough time to recover.

Open • tepley opened this issue 3 years ago • 23 comments

Is your feature request related to a problem? Please describe.
Whenever we upgrade AKS to a new Kubernetes version on a cluster that has a Windows node pool, the upgrade replaces nodes faster than the pods can recover. In production this usually results in a complete outage of the entire cluster for 20 minutes, which is very impactful and hard to work around.

Describe the solution you'd like
I would like a way to build in a wait before the upgrade moves on to the next node, so that the pods that were just rescheduled have time to recover. This mainly affects Windows images on Windows nodes, because most of the delay comes from the size of the image itself.
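
The requested behavior could look roughly like the loop below: after each node is upgraded, wait until the workloads (not just the node) report ready, then sleep for a configurable soak period before touching the next node. This is a minimal sketch under stated assumptions: `upgrade_nodes_with_soak`, `upgrade_node`, `pods_ready`, and `soak_seconds` are hypothetical names, not an AKS API or CLI flag. In practice `pods_ready` might check that every Deployment and StatefulSet has its desired number of ready replicas.

```python
import time
from typing import Callable, Iterable, List


def upgrade_nodes_with_soak(
    nodes: Iterable[str],
    upgrade_node: Callable[[str], None],
    pods_ready: Callable[[], bool],
    soak_seconds: float,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Upgrade nodes one at a time, pausing until workloads recover before the next.

    All names here are hypothetical; AKS does not expose this function.
    """
    upgraded: List[str] = []
    for node in nodes:
        # Hypothetical callback: cordon, drain, and replace/reimage this node.
        upgrade_node(node)

        # Wait for the workloads, not just the node, to report ready.
        deadline = clock() + timeout_seconds
        while not pods_ready():
            if clock() >= deadline:
                raise TimeoutError(f"workloads not ready after upgrading {node}")
            sleep(poll_seconds)

        # Configurable extra delay so clustered apps can rebalance or re-elect.
        sleep(soak_seconds)
        upgraded.append(node)
    return upgraded
```

Injecting `sleep` and `clock` keeps the loop testable without a cluster. The soak delay after readiness is the piece this issue asks for; the readiness wait alone would still let a slow-to-stabilize workload be disrupted again immediately.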

Describe alternatives you've considered
We have reviewed the node surge upgrade settings, but the defaults are already the slowest we can find. Raising the surge value only makes the upgrade more aggressive: it takes more nodes down as soon as the previous nodes are healthy from a Kubernetes perspective.
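
To illustrate why a higher surge value does not help here, the sketch below models surge as a batch size. It assumes max surge is either an integer or a percentage of the node pool size rounded up, clamped to between 1 and the pool size; this is a simplified model of the AKS setting, not its actual implementation, and the node names are hypothetical.

```python
import math
from typing import List, Sequence


def surge_node_count(max_surge: str, node_count: int) -> int:
    """Nodes upgraded at once for a max-surge value such as "1" or "33%".

    Simplified model (assumption): percentages are taken of the node pool
    size and rounded up; the result is at least 1 and at most the pool size.
    """
    if max_surge.endswith("%"):
        count = math.ceil(node_count * float(max_surge[:-1]) / 100)
    else:
        count = int(max_surge)
    return max(1, min(count, node_count))


def upgrade_batches(nodes: Sequence[str], max_surge: str) -> List[List[str]]:
    """Split the pool into the batches a surge upgrade would process in turn."""
    size = surge_node_count(max_surge, len(nodes))
    return [list(nodes[i:i + size]) for i in range(0, len(nodes), size)]


pool = [f"akswin{i}" for i in range(6)]  # hypothetical Windows node names
print(upgrade_batches(pool, "1"))    # six batches of one node each
print(upgrade_batches(pool, "50%"))  # two batches of three nodes each
```

Under this model, raising the value only puts more nodes into each batch. Per the behavior described in this issue, each batch still starts as soon as the previous one is healthy, so no surge value gives slow-starting pods extra time.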


tepley • Jul 01 '22 18:07

Good feature to have; it is critical for workloads that need time to stabilize, e.g. ZooKeeper and Kafka.

valencetech • Oct 05 '22 10:10

I think this is a useful feature. Nodes might come up, but anything that's clustered still needs time to rebalance, process, vote, etc.

bbgobie • Oct 05 '22 18:10

As a workaround, we are using pod disruption budgets (PDBs) in situations that need this.
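
This workaround works because node drains evict pods through the Kubernetes Eviction API, which honors PDBs: an eviction is refused while it would drop the number of healthy (Ready) pods below the budget. The sketch below models only an integer `minAvailable` (percentages and `maxUnavailable` are omitted), and the pod and node names are hypothetical; it illustrates the disruption arithmetic, not the disruption controller's actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    name: str
    node: str
    ready: bool  # Ready condition; a pod still pulling its image is not Ready


def disruptions_allowed(pods: List[Pod], min_available: int) -> int:
    """Voluntary evictions the budget permits right now (integer minAvailable only)."""
    current_healthy = sum(1 for p in pods if p.ready)
    return max(0, current_healthy - min_available)


def try_evict(pods: List[Pod], name: str, min_available: int) -> bool:
    """Evict `name` if the budget allows it; mutates `pods` in place."""
    if disruptions_allowed(pods, min_available) == 0:
        # The real Eviction API rejects the request (HTTP 429) and the drain retries.
        return False
    pods[:] = [p for p in pods if p.name != name]
    return True


# Three replicas guarded by minAvailable=2 (hypothetical names).
pods = [Pod("kafka-0", "node-0", True), Pod("kafka-1", "node-1", True), Pod("kafka-2", "node-2", True)]
print(try_evict(pods, "kafka-0", 2))  # True: 3 healthy, 1 disruption allowed
pods.append(Pod("kafka-0", "node-3", False))  # replacement still pulling its image
print(try_evict(pods, "kafka-1", 2))  # False: drain of node-1 is blocked
pods[-1].ready = True  # replacement becomes Ready
print(try_evict(pods, "kafka-1", 2))  # True
```

A slow Windows image pull keeps the replacement pod un-Ready, which holds up the drain of the next node until it recovers. This only protects the workload if its readiness probe reflects real readiness, e.g. having rejoined its cluster.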

tepley • Dec 21 '22 16:12

Action required from @Azure/aks-pm

ghost • Jun 24 '23 19:06

Issue needing attention of @Azure/aks-leads

ghost • Jul 10 '23 00:07
