terraform-google-kubernetes-engine

Upgrading to v21.0.0 forces recreation of node pools

Open mkjmdski opened this issue 3 years ago • 5 comments

TL;DR

Adding enable_gcfs to the state forces Terraform to rebuild node pools.

Expected behavior

Updating the module version should not rebuild node pools.

Observed behavior

Terraform needs to rebuild the node pools

Terraform Configuration

source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
version = "21.0.0"

Terraform Version

Terraform v1.1.9
on linux_amd64
+ provider registry.terraform.io/hashicorp/google v4.22.0
+ provider registry.terraform.io/hashicorp/google-beta v4.22.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.11.0
+ provider registry.terraform.io/hashicorp/null v3.1.1
+ provider registry.terraform.io/hashicorp/random v3.2.0

Additional information

No response

mkjmdski avatar May 25 '22 12:05 mkjmdski

Thanks for the report, @mkjmdski. This will be fixed in 21.1.0 (https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/pull/1251). Could you temporarily try the main branch?

bharathkkb avatar May 25 '22 14:05 bharathkkb

Hi, I have the same issue, but caused by the keepers that were added in 21.0.0 (https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/issues/1218).

A node pool that does not specify enable_gcfs gets the default "", which adds this field to keepers and triggers node pool recreation.

This happens when upgrading from v20.0.0 to v21.1.0 with the beta-private-cluster-update-variant module.

# module.cluster.module.gke.module.gke.random_id.name["pool2"] must be replaced
+/- resource "random_id" "name" {
      ~ b64_std     = "pool2-skU=" -> (known after apply)
      ~ b64_url     = "pool2-skU" -> (known after apply)
      ~ dec         = "pool2-45637" -> (known after apply)
      ~ hex         = "pool2-b245" -> (known after apply)
      ~ id          = "skU" -> (known after apply)
      ~ keepers     = { # forces replacement
          + "enable_gcfs"       = ""
            # (15 unchanged elements hidden)
        }
        # (2 unchanged attributes hidden)
    }

I am not sure if I can somehow override this or what the proper fix is here.

Flektoma avatar Jun 16 '22 15:06 Flektoma

Hi, it seems there is no way to avoid node pool recreation other than updating the state file manually.

That's why @Flektoma and I developed the following jq command:

jq -a '(.resources[] | select((.module // "" | endswith("module.gke.module.gke")) and (.type == "random_id")) | .instances[].attributes.keepers) |= (. + {enable_gcfs: ""})' default.tfstate > default.tfstate.new

Please check the diff before uploading the state file back to your backend, to make sure all the changes are valid.
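For anyone who would rather script this than run jq by hand, here is a minimal Python sketch of the same edit, together with a check that lists every JSON path that differs between the old and new state. It is only an illustration, not part of the module: the function names, the sample state and the `machine_type` keeper are made up, and the resource selection simply mirrors the jq filter (random_id resources whose module path ends with module.gke.module.gke).

```python
import copy
import json

_MISSING = object()  # marks a key that is absent on one side of the comparison


def is_gke_random_id(resource, module_suffix="module.gke.module.gke"):
    """Same selection as the jq filter: random_id resources in the nested gke module."""
    return (resource.get("type") == "random_id"
            and (resource.get("module") or "").endswith(module_suffix))


def add_keeper(state, key, value=""):
    """Return a copy of `state` with `key` set in the keepers of every matching instance."""
    patched = copy.deepcopy(state)
    for resource in patched.get("resources", []):
        if not is_gke_random_id(resource):
            continue
        for instance in resource.get("instances", []):
            keepers = instance["attributes"].get("keepers") or {}
            keepers[key] = value
            instance["attributes"]["keepers"] = keepers
    return patched


def changed_paths(old, new, path=""):
    """List the JSON paths whose values differ between two parsed documents."""
    if isinstance(old, dict) and isinstance(new, dict):
        paths = []
        for k in sorted(set(old) | set(new)):
            paths += changed_paths(old.get(k, _MISSING), new.get(k, _MISSING), f"{path}.{k}")
        return paths
    if isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        paths = []
        for i, (a, b) in enumerate(zip(old, new)):
            paths += changed_paths(a, b, f"{path}[{i}]")
        return paths
    return [] if old == new else [path]


def patch_file(src, dst, key, value=""):
    """Read a state file, add the keeper, write the result, and return the changed paths."""
    with open(src) as f:
        state = json.load(f)
    patched = add_keeper(state, key, value)
    with open(dst, "w") as f:
        json.dump(patched, f, indent=2)
    return changed_paths(state, patched)


# Hypothetical, heavily trimmed state used only to demonstrate the functions.
sample = {
    "version": 4,
    "resources": [
        {"module": "module.cluster.module.gke.module.gke", "type": "random_id", "name": "name",
         "instances": [{"index_key": "pool2",
                        "attributes": {"keepers": {"machine_type": "e2-medium"}}}]},
        {"module": "module.other", "type": "random_id", "name": "suffix",
         "instances": [{"attributes": {"keepers": None}}]},
    ],
}

patched = add_keeper(sample, "enable_gcfs", "")
print(changed_paths(sample, patched))
# prints ['.resources[0].instances[0].attributes.keepers.enable_gcfs']
```

Calling `patch_file("default.tfstate", "default.tfstate.new", "enable_gcfs")` would do the same on a real file; if the returned list contains anything other than keepers paths, something unexpected was touched and the new file should not be uploaded.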

gorge511 avatar Jun 17 '22 17:06 gorge511

@Flektoma Unfortunately, for the update variant I don't think there is a way to do this natively other than editing the state to add the new keeper attribute. @gorge511 Would you like to add this to our upgrade guide for future users who stumble on this?

bharathkkb avatar Jun 21 '22 22:06 bharathkkb

@gorge511 Would you like to add this to our upgrade guide for future users who stumble on this?

@bharathkkb can you please suggest which file I should put it in? Basically, where would you, as a user, look for such information? A new file in the /docs folder? I see this more as part of a troubleshooting guide (but I didn't find one), because it is not specific to any single module version upgrade. It will be a recurring issue: it's there again with version 21.2.0 and a new keeper for the enable_secure_boot variable added in #1277.
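Since this will come back with every new keeper, a small pre-upgrade check might help. The sketch below is a hypothetical helper, not something the module ships: it reports which node pool random_id instances are missing which keepers. It assumes the state format shown earlier in this thread, with the pool name available as `index_key` on each instance. It only reports; the value to write for each missing keeper still has to be taken from the `terraform plan` output, as was done for enable_gcfs.

```python
def missing_keepers(state, expected, module_suffix="module.gke.module.gke"):
    """Map each node pool (by index_key) to the expected keepers absent from its random_id."""
    report = {}
    for resource in state.get("resources", []):
        if resource.get("type") != "random_id":
            continue
        if not (resource.get("module") or "").endswith(module_suffix):
            continue
        for instance in resource.get("instances", []):
            keepers = instance.get("attributes", {}).get("keepers") or {}
            absent = [key for key in expected if key not in keepers]
            if absent:
                report[str(instance.get("index_key"))] = absent
    return report


# Hypothetical, heavily trimmed state: pool1 was already patched for enable_gcfs, pool2 was not.
state = {
    "resources": [
        {"module": "module.cluster.module.gke.module.gke", "type": "random_id", "name": "name",
         "instances": [
             {"index_key": "pool1", "attributes": {"keepers": {"enable_gcfs": ""}}},
             {"index_key": "pool2", "attributes": {"keepers": {}}},
         ]},
    ],
}

report = missing_keepers(state, ["enable_gcfs", "enable_secure_boot"])
for pool, keys in report.items():
    print(pool, "is missing", ", ".join(keys))
```

Any pool listed here would be expected to show a keepers change, and therefore a replacement, in the plan after the upgrade.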

gorge511 avatar Jun 22 '22 09:06 gorge511