
`orchestrator_version` for `azurerm_kubernetes_cluster_node_pool` is empty.

Open yildizbilal opened this issue 2 years ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Community Note

  • Please vote on this issue by adding a :thumbsup: reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

1.2.4

AzureRM Provider Version

3.16.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster_node_pool

Terraform Configuration Files

resource "azurerm_kubernetes_cluster" "aks-cluster" {
  ...
  default_node_pool {
    name                 = "ap${replace(lower(var.CLUSTER_NAME), "/[^a-z0-9]/", "")}"
    node_count           = var.NODE_COUNT
    vm_size              = var.NODE_SIZE
    enable_auto_scaling  = false
    orchestrator_version = var.K8S_VERSION # <--- this is affected!
    vnet_subnet_id       = azurerm_subnet.subnet.id
    tags                 = var.TAGS
  }
...
}

resource "azurerm_kubernetes_cluster_node_pool" "additional_pool" {
  for_each = var.ADDITIONAL_POOLS
  
  ...
  
  name                  = each.value.name
  vm_size               = each.value.nodeSize
  node_count            = each.value.nodeCount
  node_labels           = each.value.nodeLabels
  os_type               = "Linux"
  orchestrator_version  = var.K8S_VERSION # <--- this is affected!
  tags                  = var.TAGS

  ...
}

Debug Output/Panic Output

`terraform plan` shows 2 changes for the given configuration above.

Expected Behaviour

When I import an azurerm_kubernetes_cluster_node_pool or azurerm_kubernetes_cluster (the default node pool is also affected), the orchestrator_version should be set to the currently used Kubernetes version.

Actual Behaviour

When I import an azurerm_kubernetes_cluster_node_pool or azurerm_kubernetes_cluster (the default node pool is also affected), the orchestrator_version (Kubernetes version) in the state file is empty.

Steps to Reproduce

  1. terraform import "..azurerm_kubernetes_cluster_node_pool.mypool" /ID..../
  2. Check orchestrator_version in the state file -> "orchestrator_version" = ""

Important Factoids

Germany

References

We already set the orchestrator_version of the node pools in our previous Terraform configuration. Since I upgraded the azurerm provider to version 3, Terraform plans a change related to this attribute because it is empty in the state file. I tried to add the version number to the state file manually, but Terraform's refresh removes it again, so the plan stays the same.

yildizbilal avatar Aug 01 '22 16:08 yildizbilal

I updated the azurerm provider from 3.11.0 to 3.16.0 and Terraform wants to add the orchestrator_version to the default_node_pool. A terraform show beforehand displays the orchestrator_version, and it's the same value that Terraform wants to add. Strange...

hoizfux avatar Aug 04 '22 09:08 hoizfux

Hitting the same issue: when upgrading to azurerm 3.15.1, the plan shows it wants to remove the value and then add it again (which is not allowed). I initially thought the Azure API might be returning null for orchestrator_version when the pool is Spot, but the JSON view in the portal still shows the correct value. The older azurerm 3.6.0 version still works, and going by the comment above, 3.11.0 should also be OK.

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the
last "terraform apply" which may have affected this plan:

  # module.node-pools.azurerm_kubernetes_cluster_node_pool.spot["spot"] has changed
  ~ resource "azurerm_kubernetes_cluster_node_pool" "spot" {
        id                     = "<redacted>"
        name                   = "spot"
      - orchestrator_version   = "1.21.9" -> null
        tags                   = {}
        # (25 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.node-pools.azurerm_kubernetes_cluster_node_pool.spot["spotalt"] has changed
  ~ resource "azurerm_kubernetes_cluster_node_pool" "spot" {
        id                     = "<redacted>"
        name                   = "spotalt"
      - orchestrator_version   = "1.21.9" -> null
        tags                   = {}
        # (25 unchanged attributes hidden)
    }

  # module.node-pools.azurerm_kubernetes_cluster_node_pool.spot["spotcompute"] has changed
  ~ resource "azurerm_kubernetes_cluster_node_pool" "spot" {
        id                     = "<redacted>"
        name                   = "spotcompute"
      - orchestrator_version   = "1.21.9" -> null
        tags                   = {}
        # (25 unchanged attributes hidden)
    }

  # module.node-pools.azurerm_kubernetes_cluster_node_pool.spot["spotmemory"] has changed
  ~ resource "azurerm_kubernetes_cluster_node_pool" "spot" {
        id                     = "<redacted>"
        name                   = "spotmemory"
      - orchestrator_version   = "1.21.9" -> null
        tags                   = {}
        # (25 unchanged attributes hidden)
    }


Unless you have made equivalent changes to your configuration, or ignored the
relevant attributes using ignore_changes, the following plan may include
actions to undo or respond to these changes.

─────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # module.node-pools.azurerm_kubernetes_cluster_node_pool.spot["spot"] will be updated in-place
  ~ resource "azurerm_kubernetes_cluster_node_pool" "spot" {
        id                     = "<redacted>"
        name                   = "spot"
      + orchestrator_version   = "1.21.9"
        tags                   = {}
        # (25 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.node-pools.azurerm_kubernetes_cluster_node_pool.spot["spotalt"] will be updated in-place
  ~ resource "azurerm_kubernetes_cluster_node_pool" "spot" {
        id                     = "<redacted>"
        name                   = "spotalt"
      + orchestrator_version   = "1.21.9"
        tags                   = {}
        # (25 unchanged attributes hidden)
    }

  # module.node-pools.azurerm_kubernetes_cluster_node_pool.spot["spotcompute"] will be updated in-place
  ~ resource "azurerm_kubernetes_cluster_node_pool" "spot" {
        id                     = "<redacted>"
        name                   = "spotcompute"
      + orchestrator_version   = "1.21.9"
        tags                   = {}
        # (25 unchanged attributes hidden)
    }

  # module.node-pools.azurerm_kubernetes_cluster_node_pool.spot["spotmemory"] will be updated in-place
  ~ resource "azurerm_kubernetes_cluster_node_pool" "spot" {
        id                     = "<redacted>"
        name                   = "spotmemory"
      + orchestrator_version   = "1.21.9"
        tags                   = {}
        # (25 unchanged attributes hidden)
    }

Plan: 0 to add, 4 to change, 0 to destroy.
Terraform v1.2.6
on linux_amd64
+ provider registry.terraform.io/hashicorp/azurerm v3.15.1
+ provider registry.terraform.io/hashicorp/random v3.3.2
+ provider registry.terraform.io/hashicorp/tls v4.0.1

daniel-anova avatar Aug 09 '22 11:08 daniel-anova

Did some additional testing: the breaking change was introduced in azurerm 3.12.0; 3.11.0 is the last version that works correctly.

daniel-anova avatar Aug 09 '22 11:08 daniel-anova

I was validating whether azurerm 3.18 still has the issue, and I can no longer reproduce it in any version. The only change since then is that AKS was upgraded to 1.23.8.

daniel-anova avatar Aug 12 '22 08:08 daniel-anova

I believe this is a peculiarity of the Azure API, not of the provider.

Troubleshooting

In https://github.com/hashicorp/terraform-provider-azurerm/pull/17084, I wanted to introduce support for version aliases (which let us omit the patch version). Since they were supported only in a newer Azure API version, I migrated the provider from 2022-01-02-preview to 2022-03-02-preview:

  • In 2022-01-02-preview, only orchestratorVersion is returned.
  • In 2022-03-02-preview, both orchestratorVersion and currentOrchestratorVersion are present.

The fields are described in detail here: https://docs.microsoft.com/en-us/rest/api/aks/managed-clusters/create-or-update?tabs=HTTP. Basically, orchestratorVersion matches the version you supply through an API call (which can be x.y or x.y.z), whereas currentOrchestratorVersion is always the actual version running in the cluster (x.y.z).
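For illustration only, here is a minimal configuration sketch of how the two fields relate; the resource names and the 1.23 / 1.23.8 values are made up, and it assumes a provider release that accepts version aliases:

resource "azurerm_kubernetes_cluster_node_pool" "example" {
  name                  = "example"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 1

  # Supplying an alias here means Azure stores "1.23" as orchestratorVersion,
  # while currentOrchestratorVersion reports the resolved patch release that
  # is actually running on the nodes, e.g. "1.23.8".
  orchestrator_version  = "1.23"
}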

After I saw your issue and #17518, I got curious about what goes wrong here and ran some tests with different provider versions with debug logging enabled.

When you run terraform commands against the same API version that was used during node pool creation, everything is fine.

When you create a node pool against 2022-01-02-preview (provider < 3.12.0), further GET calls will return this (irrelevant fields omitted for brevity):

2022-08-25T18:35:35.220+0200 [DEBUG] provider.terraform-provider-azurerm_v3.11.0_x5: AzureRM Request: 
GET /subscriptions/64842ced-4781-416f-81ff-482b7f562581/resourceGroups/aks-rg/providers/Microsoft.ContainerService/managedClusters/test-cluster?api-version=2022-01-02-preview HTTP/1.1
Host: management.azure.com
2022-08-25T18:35:35.643+0200 [DEBUG] provider.terraform-provider-azurerm_v3.11.0_x5: AzureRM Response for https://management.azure.com/subscriptions/64842ced-4781-416f-81ff-482b7f562581/resourceGroups/aks-rg/providers/Microsoft.ContainerService/managedClusters/test-cluster?api-version=2022-01-02-preview: 
{
     "orchestratorVersion": "1.23.5",
  }: timestamp=2022-08-25T18:35:35.642+0200

Then, when you do a GET call against 2022-03-02-preview (provider >= 3.12.0), it returns this:

2022-08-25T18:48:20.258+0200 [DEBUG] provider.terraform-provider-azurerm_v3.12.0_x5: AzureRM Request: 
GET /subscriptions/64842ced-4781-416f-81ff-482b7f562581/resourceGroups/aks-rg/providers/Microsoft.ContainerService/managedClusters/test-cluster?api-version=2022-03-02-preview HTTP/1.1
2022-08-25T18:48:20.732+0200 [DEBUG] provider.terraform-provider-azurerm_v3.12.0_x5: AzureRM Response for https://management.azure.com/subscriptions/64842ced-4781-416f-81ff-482b7f562581/resourceGroups/aks-rg/providers/Microsoft.ContainerService/managedClusters/test-cluster?api-version=2022-03-02-preview: 
{
     "currentOrchestratorVersion": "1.23.5",
 }: timestamp=2022-08-25T18:48:20.732+0200

As you can see, orchestratorVersion is absent. That's why terraform plan/apply will show you that it wants to configure orchestrator_version:

# module.aks.module.aks_cluster.azurerm_kubernetes_cluster.main will be updated in-place
  ~ resource "azurerm_kubernetes_cluster" "main" {
        id                                  = "/subscriptions/64842ced-4781-416f-81ff-482b7f562581/resourceGroups/aks-rg/providers/Microsoft.ContainerService/managedClusters/test-cluster"
        name                                = "test-cluster"
        tags                                = {
            "myTag" = "myValue"
        }
        # (25 unchanged attributes hidden)

      ~ default_node_pool {
            name                         = "system"
          + orchestrator_version         = "1.23.5"
            tags                         = {
                "myTag" = "myValue"
            }
            # (19 unchanged attributes hidden)
        }

        # (5 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Now, if you run terraform apply, it'll make a PUT call against the new API:

2022-08-25T18:48:38.983+0200 [DEBUG] provider.terraform-provider-azurerm_v3.12.0_x5: AzureRM Request: 
PUT /subscriptions/64842ced-4781-416f-81ff-482b7f562581/resourceGroups/aks-rg/providers/Microsoft.ContainerService/managedClusters/test-cluster/agentPools/system?api-version=2022-03-02-preview HTTP/1.1
{"properties":{"availabilityZones":["1"],"count":1,"enableAutoScaling":false,"enableFIPS":false,"enableNodePublicIP":false,"kubeletDiskType":"OS","maxPods":110,"mode":"System","nodeLabels":{},"nodeTaints":[],"orchestratorVersion":"1.23.5","osDiskSizeGB":86,"osDiskType":"Ephemeral","osType":"Linux","tags":{"myTag":"myValue"},"type":"VirtualMachineScaleSets","upgradeSettings":{},"vmSize":"Standard_DS2_v2"}}: timestamp=2022-08-25T18:48:38.983+0200

And the response will be:

2022-08-25T18:48:40.494+0200 [DEBUG] provider.terraform-provider-azurerm_v3.12.0_x5: AzureRM Response for https://management.azure.com/subscriptions/64842ced-4781-416f-81ff-482b7f562581/resourceGroups/aks-rg/providers/Microsoft.ContainerService/managedClusters/test-cluster/agentPools/system?api-version=2022-03-02-preview:
{
   "orchestratorVersion": "1.23.5",
   "currentOrchestratorVersion": "1.23.5",
 }: timestamp=2022-08-25T18:48:40.494+0200

After that, the Azure API starts returning both fields:

2022-08-25T18:49:10.650+0200 [DEBUG] provider.terraform-provider-azurerm_v3.12.0_x5: AzureRM Request: 
GET /subscriptions/64842ced-4781-416f-81ff-482b7f562581/resourceGroups/aks-rg/providers/Microsoft.ContainerService/managedClusters/test-cluster?api-version=2022-03-02-preview HTTP/1.1
2022-08-25T18:49:11.016+0200 [DEBUG] provider.terraform-provider-azurerm_v3.12.0_x5: AzureRM Response for https://management.azure.com/subscriptions/64842ced-4781-416f-81ff-482b7f562581/resourceGroups/aks-rg/providers/Microsoft.ContainerService/managedClusters/test-cluster?api-version=2022-03-02-preview: 
HTTP/2.0 200 OK
{
     "orchestratorVersion": "1.23.5",
     "currentOrchestratorVersion": "1.23.5",
 }: timestamp=2022-08-25T18:49:11.015+0200

Workaround 1

  • terraform apply:
    • With non-Spot node pools, AKS accepts the change, does nothing, the Terraform state gets updated, and everything's fine.
    • Spot node pools are trickier. Historically, they were not allowed to be upgraded, and this was enforced not only in the Azure API, but also in the Terraform code. Due to the latter, I believe Terraform would not let you apply the change. Upgrades have been supported on the Azure side since June, so I prepared a tiny patch that lifts the restriction: https://github.com/hashicorp/terraform-provider-azurerm/pull/18124. Without the patch, you'll have to destroy the Spot node pool first, upgrade the provider version, then create the pool again (an interim alternative is sketched below).
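As an interim measure until a patched provider is available, the plan output above already points at ignore_changes. A minimal sketch, assuming a hypothetical Spot pool resource and that temporarily suppressing drift on this one attribute is acceptable:

resource "azurerm_kubernetes_cluster_node_pool" "spot" {
  # ... existing arguments ...
  orchestrator_version = var.K8S_VERSION

  lifecycle {
    # Temporarily ignore the attribute so the empty value read back from the
    # newer API does not produce an in-place update that Spot pools reject.
    ignore_changes = [orchestrator_version]
  }
}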

Workaround 2

UPD: this PR https://github.com/hashicorp/terraform-provider-azurerm/pull/18130 will make the provider fall back to currentOrchestratorVersion if orchestratorVersion is missing.

weisdd avatar Aug 25 '22 17:08 weisdd

I am also seeing this on AzureRM 3.23.0.

When I upgrade my AKS cluster in my Terraform configuration, I see the same behaviour.

kingofthehill444 avatar Sep 21 '22 14:09 kingofthehill444

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Nov 10 '22 02:11 github-actions[bot]