Support for Upgrading Node Image Version of AKS Agent Pools in ASO

Open nishant221 opened this issue 7 months ago • 10 comments

Describe the current behaviour

Currently, neither the AKS ManagedCluster nor the ManagedClustersAgentPool resource provides a way to perform the Agent Pools - Upgrade Node Image Version operation without also upgrading the Kubernetes version.

Describe the improvement

AKS does provide API/CLI support for upgrading AKS node images without performing a k8s version upgrade (references below). ASO should provide a way to do the same.

Ref: https://learn.microsoft.com/en-us/azure/aks/node-image-upgrade

Ref: https://learn.microsoft.com/en-us/rest/api/aks/agent-pools/upgrade-node-image-version?view=rest-aks-2024-10-01&tabs=HTTP

Additional context

We have a use case where we cannot allow AKS to upgrade node images automatically, so we are looking for a configurable way to trigger these upgrades ourselves.

nishant221 avatar Apr 26 '25 03:04 nishant221

It would be interesting to see what (if anything) Terraform does for this case. We don't currently support this because it's not performed via a PUT on the AgentPool or Cluster; it's a POST on the AgentPool, which is harder to manage in a declarative fashion.

I think we could write an extension that, if the nodeImage version was changed, issues a nodeImage upgrade, but we don't currently have nodeImageVersion in the spec. Additionally, users can't pick a specific version to upgrade to, only the latest, so if we did have it in the spec it's not immediately clear what it would be set to.
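For reference, a minimal sketch of the POST in question. It only builds the request and does not send it (sending would also need an ARM bearer token). The function name and the sample values are hypothetical; the api-version is the one from the REST reference linked in the issue, and the path follows that reference as I understand it.

```python
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2024-10-01"  # taken from the REST reference linked in the issue


def node_image_upgrade_request(subscription_id, resource_group, cluster, agent_pool):
    """Return (method, url) for the agent pool node image upgrade action.

    Hypothetical helper for illustration only; it is not part of ASO.
    """
    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        "/providers/Microsoft.ContainerService"
        f"/managedClusters/{quote(cluster, safe='')}"
        f"/agentPools/{quote(agent_pool, safe='')}"
        "/upgradeNodeImageVersion"
    )
    # The action is a POST with no desired-state body, which is why it does
    # not map onto the usual PUT-based reconcile of a spec.
    return "POST", f"{ARM_ENDPOINT}{path}?api-version={API_VERSION}"


# Placeholder identifiers, not real resources.
method, url = node_image_upgrade_request(
    "00000000-0000-0000-0000-000000000000", "my-rg", "my-cluster", "nodepool1"
)
print(method, url)
```

Note that the request carries no target version, which matches the point above that only the latest node image can be selected.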

matthchr avatar Apr 28 '25 22:04 matthchr

Thanks @matthchr for taking a look at this.

> I think we could write an extension that, if the nodeImage version was changed, issues a nodeImage upgrade

I think this behaviour should also be configurable so that user can control when NodeImage version upgrade will be triggered.

nishant221 avatar Apr 29 '25 05:04 nishant221

> I think this behaviour should also be configurable so that user can control when NodeImage version upgrade will be triggered.

I don't think AKS provides this functionality: you can choose the Kubernetes version, but you always get the latest nodeimage version for that version of Kubernetes.

theunrepentantgeek avatar May 01 '25 00:05 theunrepentantgeek

@theunrepentantgeek Yes, AKS does not let you choose the nodeimage version; it will always be updated to the latest version. I was referring to the configurable control over when this upgrade to latest nodeimage should be triggered.

nishant221 avatar May 01 '25 03:05 nishant221

> I was referring to the configurable control over when this upgrade to latest nodeimage should be triggered.

Thanks for the clarification.

The subresource approach that @matthchr described would do that: it would essentially be a job that runs once. When you want the nodeimage to be upgraded, you'd create the subresource, and its status would track the upgrade until it was complete. After that, it would be inert. To trigger another upgrade, you'd delete it and create a fresh one.
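A rough sketch of that run-once lifecycle, to make the behaviour concrete. Everything here is hypothetical: `NodeImageUpgrade`, `Phase`, and `reconcile` are illustrative names, not an ASO API, and the two callbacks stand in for the real AKS calls.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    PENDING = "Pending"
    UPGRADING = "Upgrading"
    SUCCEEDED = "Succeeded"


@dataclass
class NodeImageUpgrade:
    """Hypothetical one-shot subresource attached to an agent pool."""
    agent_pool: str
    phase: Phase = Phase.PENDING


def reconcile(resource, start_upgrade, upgrade_done):
    """Advance the resource by one step; does nothing once it has succeeded."""
    if resource.phase is Phase.PENDING:
        start_upgrade(resource.agent_pool)  # issue the upgrade exactly once
        resource.phase = Phase.UPGRADING
    elif resource.phase is Phase.UPGRADING and upgrade_done(resource.agent_pool):
        resource.phase = Phase.SUCCEEDED
    # Phase.SUCCEEDED: inert, so repeated reconciles never trigger a new upgrade.
    return resource.phase


# Usage: the upgrade is started once, however often reconcile runs.
started = []
upgrade = NodeImageUpgrade(agent_pool="nodepool1")
for _ in range(5):
    reconcile(upgrade, started.append, lambda pool: True)
print(upgrade.phase.value, started)
```

Triggering another upgrade would mean deleting this object and creating a fresh one that starts again in the pending phase.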

theunrepentantgeek avatar May 04 '25 22:05 theunrepentantgeek

Question via Slack:

Apologies for the follow-up, but do we have a way to estimate when this feature will be implemented? We are planning a migration, and support for upgrading NodeImageVersion is a requirement.

theunrepentantgeek avatar May 23 '25 03:05 theunrepentantgeek

There's currently some discussion about supporting this out of the box via the AKS API, and we don't want to preempt that work by doing it unilaterally in ASO. Unfortunately this means that it probably won't be in the next ASO release, 2.14. In 2.15 we may have more understanding about what's happening with the AKS API, but we probably won't have that API released yet, nor do we yet know if it's solving exactly this problem. My guess is that something like 2.16 (Oct 2025) is where we'd know more and/or possibly be willing to just do it ourselves if it turns out that the AKS team isn't solving this problem for us.

matthchr avatar Jun 02 '25 22:06 matthchr

Response copied from Slack:

Thanks for taking a look into this. Actually AKS does already provide API/CLI support to upgrade AKS node images without performing a k8s version upgrade (references below).

Ref: https://learn.microsoft.com/en-us/rest/api/aks/agent-pools/upgrade-node-image-version?view=rest-aks-2024-10-01&tabs=HTTP

Ref: https://learn.microsoft.com/en-us/azure/aks/node-image-upgrade

We want to migrate our prod clusters to ASO and this is one remaining feature gap that is preventing us from doing so. Any suggestions will be very helpful.

theunrepentantgeek avatar Jun 06 '25 05:06 theunrepentantgeek

> It would be interesting to see what (if anything) Terraform does for this case.

I couldn't find any example of this when I looked.

matthchr avatar Jun 09 '25 22:06 matthchr

Marking this blocked for now as we'd like to use the upstream capability to do this rather than roll it ourselves.

matthchr avatar Jun 24 '25 00:06 matthchr

@matthchr Is there a possibility of having this support in October?

Also, is there a way I can follow updates on the Azure API side for this?

Thanks

nishant221 avatar Sep 12 '25 11:09 nishant221