[Question] AKS Update Rollout Policy

Open siegenthalerroger opened this issue 2 years ago • 17 comments

Describe scenario

We run three clusters (DEV, QA and PROD) and have a few questions about how AKS updates are rolled out. By this I mean updates to the managed services, not to Kubernetes or the nodes, as those are covered elsewhere (and are user-manageable).

Concretely, the load balancer health probe changes for k8s 1.24+ are a point of contention: we were left wondering when this change would roll out, and we experienced downtime because of it (luckily only in the DEV cluster).

Question

  1. Does Azure apply AKS updates (load balancer changes, addon updates, etc.) whenever it chooses to, or are changes only made when some other update is triggered by the user (e.g. a node upgrade)?
  2. Does Azure respect the maintenance windows for a cluster when updating dependent resources (e.g. the load balancer)?
  3. Is there any way to see which AKS release is being applied to a cluster? I know there's the rollout status viewer, but it is woefully underwhelming and nigh useless because it doesn't accurately reflect the real state.
  4. Is there a way to specify that a new AKS release should be applied to one cluster before others (e.g. first to QA, then to PROD)?

siegenthalerroger avatar Sep 22 '22 08:09 siegenthalerroger

Microsoft Support got back to me and confirmed that AKS really will deploy breaking changes to running clusters, with little to no warning. This is quite crazy behaviour IMO. The "planned maintenance" preview kinda helps here, but not entirely, so I am left wondering why this decision was made...

siegenthalerroger avatar Oct 03 '22 09:10 siegenthalerroger

Action required from @Azure/aks-pm

ghost avatar Nov 02 '22 16:11 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Nov 17 '22 18:11 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Dec 02 '22 18:12 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Dec 18 '22 00:12 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jan 02 '23 06:01 ghost

FWIW, a security patch was applied to our nodes, which were then rebooted, all without any warning whatsoever. Support has been unable to point us at any resource (web page, service health alert, etc.) that would have told us it was coming, or even explained what happened after the fact. So I'm not sure that updates to the nodes are entirely user-manageable, either.

dhduvall avatar Jan 12 '23 19:01 dhduvall

Issue needing attention of @Azure/aks-leads

ghost avatar Jan 28 '23 00:01 ghost

Hi @siegenthalerroger and @dhduvall,

Tagging @kaarthis for viz

Thanks for reaching out. We appreciate your concerns. We will take this back to the team.

qpetraroia avatar Feb 03 '23 19:02 qpetraroia

Action required from @Azure/aks-pm

ghost avatar Aug 08 '23 01:08 ghost

Issue needing attention of @Azure/aks-leads

AKS Fleet Manager kinda addresses these questions, so I guess we're good. Would have been nice to get a comment from the team though.

siegenthalerroger avatar Jul 04 '24 22:07 siegenthalerroger