terraform-provider-kubernetes

Node Auto provisioning makes Kubernetes Provisioner crash

Open luigi-bitonti opened this issue 1 year ago • 2 comments

If you enable node auto-provisioning with boot_disk_kms_key set on an existing cluster, Terraform fails with the following error: [image]

I've noticed this error appears whenever the plan includes a change that forces the cluster to be recreated.

Terraform Version, Provider Version and Kubernetes Version

Terraform version: Terraform v1.3.5
Kubernetes provider version: v2.25.2
Kubernetes version: GKE 1.27

Affected Resource(s)

google_container_cluster

Terraform Configuration Files

  dynamic "cluster_autoscaling" {
    for_each = local.cas == null ? [] : [""]
    content {
      enabled             = true
      autoscaling_profile = var.cluster_autoscaling.autoscaling_profile
      dynamic "auto_provisioning_defaults" {
        for_each = local.cas_apd != null ? [""] : []
        content {
          boot_disk_kms_key = local.cas_apd.boot_disk_kms_key
          disk_size         = local.cas_apd.disk_size
          disk_type         = local.cas_apd.disk_type
          image_type        = local.cas_apd.image_type
          oauth_scopes      = local.cas_apd.oauth_scopes
          service_account   = local.cas_apd.service_account
          dynamic "management" {
            for_each = local.cas_apd.management != null ? [""] : []
            content {
              auto_repair  = local.cas_apd.management.auto_repair
              auto_upgrade = local.cas_apd.management.auto_upgrade
            }
          }
          dynamic "shielded_instance_config" {
            for_each = local.cas_apd.shielded_instance_config != null ? [""] : []
            content {
              enable_integrity_monitoring = (
                local.cas_apd.shielded_instance_config.integrity_monitoring
              )
              enable_secure_boot = (
                local.cas_apd.shielded_instance_config.secure_boot
              )
            }
          }
          dynamic "upgrade_settings" {
            for_each = local.cas_apd_us != null ? [""] : []
            content {
              strategy = (
                local.cas_apd_us.blue_green != null ? "BLUE_GREEN" : "SURGE"
              )
              max_surge       = try(local.cas_apd_us.surge.max, null)
              max_unavailable = try(local.cas_apd_us.surge.unavailable, null)
              dynamic "blue_green_settings" {
                for_each = local.cas_apd_us.blue_green != null ? [""] : []
                content {
                  node_pool_soak_duration = (
                    local.cas_apd_us.blue_green.node_pool_soak_duration
                  )
                  dynamic "standard_rollout_policy" {
                    for_each = (
                      local.cas_apd_us.blue_green.standard_rollout_policy != null
                      ? [""]
                      : []
                    )
                    content {
                      batch_node_count = (
                        local.cas_apd_us.blue_green.standard_rollout_policy.batch_node_count
                      )
                      batch_percentage = (
                        local.cas_apd_us.blue_green.standard_rollout_policy.batch_percentage
                      )
                      batch_soak_duration = (
                        local.cas_apd_us.blue_green.standard_rollout_policy.batch_soak_duration
                      )
                    }
                  }
                }
              }
            }
          }
        }
      }
      dynamic "resource_limits" {
        for_each = local.cas.cpu_limits != null ? [""] : []
        content {
          resource_type = "cpu"
          minimum       = local.cas.cpu_limits.min
          maximum       = local.cas.cpu_limits.max
        }
      }
      dynamic "resource_limits" {
        for_each = local.cas.mem_limits != null ? [""] : []
        content {
          resource_type = "memory"
          minimum       = local.cas.mem_limits.min
          maximum       = local.cas.mem_limits.max
        }
      }
      dynamic "resource_limits" {
        for_each = (
          try(local.cas.gpu_resources, null) == null
          ? []
          : local.cas.gpu_resources
        )
        iterator = gpu_resources
        content {
          resource_type = gpu_resources.value.resource_type
          minimum       = gpu_resources.value.min
          maximum       = gpu_resources.value.max
        }
      }
    }
  }
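
The fragment above reads from `local.cas`, `local.cas_apd`, and `local.cas_apd_us`, which the issue does not show. A minimal sketch of how they might be defined, assuming they are derived from a single `var.cluster_autoscaling` object with optional attributes (the variable shape and the three local definitions are assumptions inferred from the attribute names used above, not the reporter's actual code):

```hcl
# Hypothetical input shape; inferred from the attributes the fragment reads.
# optional() needs Terraform >= 1.3, which matches the reported v1.3.5.
variable "cluster_autoscaling" {
  type = object({
    autoscaling_profile = optional(string)
    auto_provisioning_defaults = optional(object({
      boot_disk_kms_key = optional(string)
      disk_size         = optional(number)
      disk_type         = optional(string)
      image_type        = optional(string)
      oauth_scopes      = optional(list(string))
      service_account   = optional(string)
      management = optional(object({
        auto_repair  = optional(bool)
        auto_upgrade = optional(bool)
      }))
      shielded_instance_config = optional(object({
        integrity_monitoring = optional(bool)
        secure_boot          = optional(bool)
      }))
      upgrade_settings = optional(object({
        blue_green = optional(object({
          node_pool_soak_duration = optional(string)
          standard_rollout_policy = optional(object({
            batch_node_count    = optional(number)
            batch_percentage    = optional(number)
            batch_soak_duration = optional(string)
          }))
        }))
        surge = optional(object({
          max         = optional(number)
          unavailable = optional(number)
        }))
      }))
    }))
    cpu_limits = optional(object({ min = number, max = number }))
    mem_limits = optional(object({ min = number, max = number }))
    gpu_resources = optional(list(object({
      resource_type = string
      min           = number
      max           = number
    })))
  })
  default = null
}

locals {
  # Assumed definitions: each local is null when its parent is unset,
  # which is what the `!= null` guards in the dynamic blocks rely on.
  cas        = var.cluster_autoscaling
  cas_apd    = try(local.cas.auto_provisioning_defaults, null)
  cas_apd_us = try(local.cas_apd.upgrade_settings, null)
}
```

With definitions like these, setting `auto_provisioning_defaults.boot_disk_kms_key` on an existing cluster is the change that triggers the behavior described in this issue.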

Debug Output

Panic Output

Steps to Reproduce

Expected Behavior

Node provisioner should be re-created.

Actual Behavior

Terraform raises the error shown in the image above.

Important Factoids

References

luigi-bitonti, Jan 25 '24 08:01

Hi @luigi-bitonti, thanks for opening an issue. The google_container_cluster resource is actually part of the Google provider, not this provider project. I would suggest you reopen your issue on their issue tracker.

I would also suggest that, if you are using the Kubernetes provider together with the google_container_cluster resource, you split your config so that the Kubernetes resources are created by their own terraform apply operation.
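
A rough sketch of what that split could look like: a second, separately applied configuration that looks up the already-created cluster and configures the Kubernetes provider from it. The variable names and the `this` / `default` labels are hypothetical, and project/credentials for the Google provider are assumed to come from the environment:

```hcl
# Second root module, applied only after the cluster configuration has been applied.
# var.cluster_name and var.location are placeholder inputs for this sketch.
variable "cluster_name" {
  type = string
}

variable "location" {
  type = string
}

# Short-lived access token for the identity running Terraform.
data "google_client_config" "default" {}

# Read the existing cluster instead of managing it in this configuration.
data "google_container_cluster" "this" {
  name     = var.cluster_name
  location = var.location
}

provider "kubernetes" {
  host  = "https://${data.google_container_cluster.this.endpoint}"
  token = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(
    data.google_container_cluster.this.master_auth[0].cluster_ca_certificate
  )
}
```

Because the provider here depends only on data sources for a cluster that already exists, a plan that recreates the cluster in the first configuration no longer leaves the Kubernetes provider with unknown connection details.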

jrhouston, Jan 31 '24 05:01

Hi @jrhouston, I opened the same issue on the Google provider and they redirected me here: https://github.com/hashicorp/terraform-provider-google/issues/17077

Thanks for the suggestion, but for my architecture I have to do it in a single apply.

luigi-bitonti, Jan 31 '24 09:01