azure-service-operator
Support for Upgrading Node Image Version of AKS Agent Pools in ASO
Describe the current behaviour
Currently, neither the AKS ManagedCluster nor the ManagedClustersAgentPool resource provides a way to perform the Agent Pools - Upgrade Node Image Version operation without also upgrading the Kubernetes version.
Describe the improvement
AKS provides API/CLI support for upgrading AKS node images without performing a k8s version upgrade (references below). ASO should provide a way to do the same.
Ref: https://learn.microsoft.com/en-us/azure/aks/node-image-upgrade
Ref: https://learn.microsoft.com/en-us/rest/api/aks/agent-pools/upgrade-node-image-version?view=rest-aks-2024-10-01&tabs=HTTP
Additional context
We have a use case where we cannot allow AKS to upgrade AKS node images automatically. Hence we are looking for a configurable way to do the same.
It would be interesting to see what (if anything) Terraform does for this case. We don't currently support this because it isn't performed via a PUT on the AgentPool or Cluster; it's a POST on the AgentPool, which is harder to manage in a declarative fashion.
I think we could write an extension that, if the nodeImage version was changed, issues a nodeImage upgrade, but we don't currently have nodeImageVersion in the spec. Additionally, users can't pick a specific version to upgrade to, only the latest, so if we did have it in the spec it's not immediately clear what it would be set to.
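To make the PUT-versus-POST point concrete, here is a minimal sketch of the request that the AKS REST reference linked above describes. It only builds the request and does not send it. The function name, parameter names, and the bearer token argument are illustrative assumptions; the api-version is taken from the reference URL in this issue.

```python
from urllib.parse import quote
from urllib.request import Request

ARM_ENDPOINT = "https://management.azure.com"


def build_node_image_upgrade_request(
    subscription_id: str,
    resource_group: str,
    cluster_name: str,
    agent_pool_name: str,
    bearer_token: str,  # assumption: caller obtains an ARM token elsewhere
    api_version: str = "2024-10-01",  # version from the reference URL in this issue
) -> Request:
    """Build (but do not send) the upgradeNodeImageVersion POST for one agent pool."""
    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        "/providers/Microsoft.ContainerService"
        f"/managedClusters/{quote(cluster_name, safe='')}"
        f"/agentPools/{quote(agent_pool_name, safe='')}"
        "/upgradeNodeImageVersion"
    )
    url = f"{ARM_ENDPOINT}{path}?api-version={api_version}"
    # The action takes no request body and no target version:
    # it always moves the pool to the latest node image.
    return Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
```

Because the call is an action with no body, there is no desired state to diff against, which is what makes it awkward to model as a spec field.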
Thanks @matthchr for taking a look at this.
I think we could write an extension that, if the nodeImage version was changed, issues a nodeImage upgrade
I think this behaviour should also be configurable so that the user can control when the NodeImage version upgrade is triggered.
I think this behaviour should also be configurable so that the user can control when the NodeImage version upgrade is triggered.
I don't think AKS provides this functionality - you can choose Kubernetes version, but you always get the latest nodeimage version for that version of Kubernetes.
@theunrepentantgeek Yes, AKS does not provide a way to choose the nodeimage version, and it will always be updated to the latest version. I was referring to the configurable control over when this upgrade to latest nodeimage should be triggered.
I was referring to the configurable control over when this upgrade to latest nodeimage should be triggered.
Thanks for the clarification.
The subresource approach that @matthchr suggested would do that. It would essentially be a job that runs once. When you want the nodeimage to be upgraded, you'd create the subresource, and its status would track the upgrade until it was complete. After that, it would be inert. To trigger another upgrade, you'd delete it and create a fresh one.
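The run-once lifecycle described above can be sketched as a small state machine. This is only an illustration of the idea, not an ASO design: NodeImageUpgrade, Phase, and reconcile are hypothetical names, the two callables stand in for the real Azure calls, and the "Succeeded"/"Failed" provisioning state strings are an assumption about what the agent pool reports.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Phase(Enum):
    PENDING = "Pending"
    UPGRADING = "Upgrading"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


@dataclass
class NodeImageUpgrade:
    """Hypothetical run-once subresource; not an existing ASO type."""
    agent_pool: str
    phase: Phase = Phase.PENDING


def reconcile(
    upgrade: NodeImageUpgrade,
    start_upgrade: Callable[[str], None],        # stand-in for the POST action
    provisioning_state: Callable[[str], str],    # stand-in for reading pool status
) -> Phase:
    # Once finished, the resource is inert: no further calls are made.
    if upgrade.phase in (Phase.SUCCEEDED, Phase.FAILED):
        return upgrade.phase

    # First reconcile after creation: trigger the upgrade exactly once.
    if upgrade.phase is Phase.PENDING:
        start_upgrade(upgrade.agent_pool)
        upgrade.phase = Phase.UPGRADING
        return upgrade.phase

    # While upgrading, mirror the pool's state into our status.
    state = provisioning_state(upgrade.agent_pool)
    if state == "Succeeded":
        upgrade.phase = Phase.SUCCEEDED
    elif state == "Failed":
        upgrade.phase = Phase.FAILED
    return upgrade.phase
```

Triggering another upgrade would then mean deleting this object and creating a fresh one in the Pending phase, which gives the user explicit control over when the upgrade happens.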
Question via Slack:
Apologies for the follow-up, but do we have an estimate of when this feature will be implemented? We are planning a migration, and support for upgrading NodeImageVersion is a requirement.
There's some discussion for supporting this out of the box via the AKS API currently, and we don't want to preempt that work by doing it unilaterally in ASO. Unfortunately this means that it probably won't be in the next ASO release of 2.14. In 2.15 we may have more understanding about what's happening with the AKS API, but we probably won't have that API released yet, nor do we yet know if it's solving exactly this problem. My guess is that something like 2.16 (Oct 2025) is where we'd know more and/or possibly be willing to just do it ourselves if it turns out that the AKS team isn't solving this problem for us.
Response copied from Slack:
Thanks for taking a look into this. Actually, AKS already provides API/CLI support for upgrading AKS node images without performing a k8s version upgrade (references below). Ref: https://learn.microsoft.com/en-us/rest/api/aks/agent-pools/upgrade-node-image-version?view=rest-aks-2024-10-01&tabs=HTTP Ref: https://learn.microsoft.com/en-us/azure/aks/node-image-upgrade We want to migrate our prod clusters to ASO, and this is the one remaining feature gap preventing us from doing so. Any suggestions would be very helpful.
It would be interesting to see what (if anything) Terraform does for this case.
I couldn't find any example of this when I looked.
Marking this blocked for now as we'd like to use the upstream capability to do this rather than roll it ourselves.
@matthchr Is there a possibility of having this support in October?
Also, is there a way I can follow updates on the Azure API side for this?
Thanks