
Allow updating node pools using ARM template

Open aelij opened this issue 4 years ago • 49 comments

What happened: Currently the only option to add a node pool in an ARM template is by creating a separate child resource (.../providers/Microsoft.ContainerService/managedClusters/aks1/agentPools/p2). This presents a problem when trying to apply an update to the primary (system) node pool which requires recreating it, for example, to allow it to join an existing subnet or enable "encryption at host".

If we were to add the new agent pool using a child resource in the template, the template would no longer be idempotent (i.e. it could not deploy a new clean environment), and it also would not clean up the old pool. This forces us to use scripts to complement ARM.

ARM deployments were made to be idempotent and this essentially breaks it.
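
For readers less familiar with the two declaration styles, here is a minimal Python sketch that builds the relevant template fragments as plain dictionaries. Only the `agentPoolProfiles` property, the `agentPools` child resource type, and the `aks1`/`p2` names come from the description above; the API version, the `p1` pool name, the VM size, and the counts are placeholder assumptions, and required properties such as location are left out.

```python
import json

# Assumed for illustration only; use the API version your template targets.
API_VERSION = "2021-03-01"


def cluster_with_inline_pool(cluster: str, pool: str) -> dict:
    """Cluster resource that declares its pool in the agentPoolProfiles array."""
    return {
        "type": "Microsoft.ContainerService/managedClusters",
        "apiVersion": API_VERSION,
        "name": cluster,
        "properties": {
            "agentPoolProfiles": [
                # Placeholder values, not taken from the issue.
                {"name": pool, "mode": "System", "count": 3, "vmSize": "Standard_DS2_v2"}
            ]
        },
    }


def child_pool(cluster: str, pool: str) -> dict:
    """The same kind of pool declared as a separate child resource."""
    return {
        "type": "Microsoft.ContainerService/managedClusters/agentPools",
        "apiVersion": API_VERSION,
        "name": f"{cluster}/{pool}",
        "properties": {"mode": "System", "count": 3, "vmSize": "Standard_DS2_v2"},
    }


# Fragment of a template's "resources" array using both styles at once.
template_resources = [cluster_with_inline_pool("aks1", "p1"), child_pool("aks1", "p2")]
print(json.dumps(template_resources, indent=2))
```

The point of the sketch is only that the same kind of object can be declared in two places, which is what makes a replacement of the primary pool awkward to express in a single template.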

Update: Another idempotency-related issue:

Code: OperationNotAllowed Message: Updating Kubernetes version and agent node scaling are mutually exclusive operations.

AKS should be able to handle these kinds of updates on its own.
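
As an illustration of the kind of script this currently pushes onto users, the following is a hypothetical client-side planner that splits one desired change into sequential updates so that no single step changes both the Kubernetes version and the node count. It is a sketch under stated assumptions, not how AKS or ARM behaves: `PoolState` and `plan_updates` are made-up names, and the target version `1.19.7` and target count `7` are example values.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class PoolState:
    """Hypothetical, simplified view of a node pool."""
    kubernetes_version: str
    node_count: int


def plan_updates(current: PoolState, desired: PoolState) -> List[PoolState]:
    """Return intermediate states; each step changes the version OR the count."""
    steps: List[PoolState] = []
    state = current
    # Step 1 (if needed): upgrade the version, keeping the node count fixed.
    if state.kubernetes_version != desired.kubernetes_version:
        state = replace(state, kubernetes_version=desired.kubernetes_version)
        steps.append(state)
    # Step 2 (if needed): scale, with the version already at its target.
    if state.node_count != desired.node_count:
        state = replace(state, node_count=desired.node_count)
        steps.append(state)
    return steps


# Current values match the environment described in this issue;
# the desired values are assumptions for the example.
steps = plan_updates(PoolState("1.18.14", 5), PoolState("1.19.7", 7))
for step in steps:
    print(step)
```

Each returned state would correspond to one separate deployment. The order (upgrade first, then scale) is an arbitrary choice in this sketch.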

What you expected to happen: Allow updating node pools using the agentPoolProfiles array of the managedClusters type.

Environment:

  • Kubernetes version (use kubectl version): v1.18.14
  • Size of cluster (how many worker nodes are in the cluster?) 5
  • General description of workloads in the cluster: HTTP microservices

aelij avatar Mar 14 '21 13:03 aelij

Hi aelij, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

  1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
  2. Please abide by the AKS repo Guidelines and Code of Conduct.
  3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
  4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
  5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
  6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

ghost avatar Mar 14 '21 13:03 ghost

Triage required from @Azure/aks-pm

ghost avatar Mar 16 '21 18:03 ghost

Action required from @Azure/aks-pm

ghost avatar Mar 21 '21 19:03 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Apr 06 '21 00:04 ghost

I agree this really needs to be done. Put the idea out on the feedback site. Any way we could get an ETA or find out if this is even planned?

https://feedback.azure.com/forums/914020-azure-kubernetes-service-aks/suggestions/43214700-enable-nodepool-image-updates-via-arm

kelly-brown avatar Apr 19 '21 18:04 kelly-brown

Issue needing attention of @Azure/aks-leads

ghost avatar May 05 '21 00:05 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar May 20 '21 06:05 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jun 04 '21 12:06 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jun 19 '21 18:06 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jul 05 '21 00:07 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jul 20 '21 06:07 ghost

@palma21 can you look into this one?

danimal521 avatar Aug 03 '21 17:08 danimal521

Triage required from @Azure/aks-pm @palma21

ghost avatar Aug 05 '21 18:08 ghost

Action required from @palma21, @justindavies, @yizhang4321.

ghost avatar Sep 02 '21 17:09 ghost

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

ghost avatar Nov 03 '21 02:11 ghost

Not stale :)

aelij avatar Nov 03 '21 08:11 aelij

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

ghost avatar Jan 02 '22 14:01 ghost

Still not stale

aelij avatar Jan 02 '22 14:01 aelij

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

ghost avatar Mar 03 '22 20:03 ghost

Not stale

aelij avatar Mar 04 '22 13:03 aelij

This also affects the azurerm terraform provider. Seems to be an antipattern to have the nodepools as a separate resource rather than a block inside the cluster's resource definition block.

tspearconquest avatar Apr 21 '22 21:04 tspearconquest

@tspearconquest I don't think the problem is that it's a child resource, it's the duality - it's both a child resource and an array property. And you have to keep them in sync. If I could choose, I'd go for something that allows creating a cluster with NO node pools and add the pools exclusively using child resources. This allows for more granular updates.
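
To illustrate the "keep them in sync" burden, here is a small Python sketch of a check one might run over a template before deploying. It assumes the template's resources are available as plain dictionaries and that child resources are named `<cluster>/<pool>`; the function name `pools_out_of_sync` and the sample pool names are hypothetical.

```python
from typing import Dict, List, Set


def pools_out_of_sync(inline_profiles: List[Dict], child_resources: List[Dict]) -> Set[str]:
    """Return pool names declared in only one of the two places."""
    inline_names = {profile["name"] for profile in inline_profiles}
    # Child resource names look like "aks1/p2"; the pool name is the last segment.
    child_names = {resource["name"].split("/")[-1] for resource in child_resources}
    # Symmetric difference: names present on one side but not the other.
    return inline_names ^ child_names


inline = [{"name": "p1"}, {"name": "p2"}]
children = [{"name": "aks1/p2"}, {"name": "aks1/p3"}]
print(sorted(pools_out_of_sync(inline, children)))
```

An empty result means both declarations list the same pools; anything else is a pool that one side of the template does not know about.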

aelij avatar Apr 24 '22 06:04 aelij

Apologies, I didn't realize there was a nuance there that I was missing in my phrasing. Thank you.

On the Terraform side, there is a "default_node_pool" block which is part of the parent resource. Trying to do anything to that nodepool in terraform code, or making any changes to it manually, and then running terraform, can end up destroying and recreating your cluster. This concept of a "default" node pool doesn't seem to be defined anywhere in Azure, while conversely "system" and "user" node pools are well known to me. I agree with you regarding creating a cluster with no nodepools, which is why I brought this up.

It definitely seems strange to have a default nodepool definition in the cluster resource definition (for terraform) in the first place, given that there is no "default" in Azure. In fact, I was just able to delete a default nodepool using azure-cli the other day and re-link a secondary nodepool I had set up as the "default" nodepool in my terraform state file without incurring any downtime.

Of course, that's why I'm here. It was an all-day affair to take care of it, all in order to change 2 settings that can only be modified when a nodepool (or cluster) is created.

tspearconquest avatar Apr 24 '22 06:04 tspearconquest

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

ghost avatar Aug 20 '22 14:08 ghost

Not stale

aelij avatar Aug 21 '22 05:08 aelij

Not stale

acortelyou avatar Sep 09 '22 18:09 acortelyou

@palma21 can you look into this one?

@palma21 any chance we could have some response for this issue?

stack111 avatar Sep 28 '22 21:09 stack111

@kaarthis would this be something for the upgrade node pool scenario? It would certainly make upgrading easier.

denniszielke avatar Nov 14 '22 18:11 denniszielke

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

ghost avatar Jan 13 '23 20:01 ghost

Not stale.

fschmied avatar Jan 15 '23 14:01 fschmied