terraform-google-kubernetes-engine Error / issue applying kubelet config

TL;DR

Expected behavior

The config should apply cleanly.

Observed behavior

  ~ resource "google_container_node_pool" "pools" {
        id                          = "projects/xxx/locations/us-central1/clusters/yyy/nodePools/primary"
        name                        = "primary"
        # (10 unchanged attributes hidden)

      ~ node_config {
            tags                        = [
                "gke-prod-cluster-01",
                "gke-prod-cluster-01-primary",
            ]
            # (17 unchanged attributes hidden)

          - kubelet_config {
              - cpu_cfs_quota  = false -> null
              - pod_pids_limit = 0 -> null
            }

            # (2 unchanged blocks hidden)
        }

The plan shows the diff above, and then the apply fails with this error:

module.gke.google_container_node_pool.pools["primary"]: Modifying... [id=projects/xxx/locations/us-central1/clusters/yyy/nodePools/primary]
╷
│ Error: googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning'] must be specified.
│ Details:
│ [
│   {
│     "@type": "type.googleapis.com/google.rpc.RequestInfo",
│     "requestId": "0xaf5070f5462ddf7d"
│   }
│ ]
│ , badRequest
│ 
│   with module.gke.google_container_node_pool.pools["primary"],
│   on .terraform/modules/gke/modules/private-cluster/cluster.tf line 491, in resource "google_container_node_pool" "pools":
│  491: resource "google_container_node_pool" "pools" {
│ 
╵

See further debug output below

Terraform Configuration

module "gke" {
  source                = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version               = "31.1.0"
  project_id            = var.project
  name                  = "foo-cluster-01"
  service_account_name  = "foo-cluster-01"
  grant_registry_access = true
  kubernetes_version    = "1.29.6-gke.1326000"
  release_channel       = "UNSPECIFIED"
  region                = "us-central1"
  zones = [
    data.google_compute_zones.available.names[1],
    data.google_compute_zones.available.names[2],
  ]
  network = data.terraform_remote_state.network.outputs.network_name

  subnetwork = data.terraform_remote_state.network.outputs.subnets_names[0]
  ip_range_pods     = data.terraform_remote_state.network.outputs.subnets_secondary_ranges[0][0].range_name
  ip_range_services = data.terraform_remote_state.network.outputs.subnets_secondary_ranges[0][1].range_name

  horizontal_pod_autoscaling = true
  enable_private_nodes       = true

  master_authorized_networks = local.all_allowlist_ranges
  dns_cache                  = true

  remove_default_node_pool = true
  node_pools = [
    # Note: this is intentionally different from the actual default,
    # "default-pool"
    {
      name                      = "primary"
      machine_type              = var.instance_type
      total_min_count           = var.node_pool_total_min_count
      total_max_count           = var.node_pool_total_max_count
      local_ssd_count           = 0
      spot                      = false
      local_ssd_ephemeral_count = 0
      disk_size_gb              = 100
      disk_type                 = "pd-balanced"
      image_type                = "COS_CONTAINERD"
      enable_gcfs               = false
      enable_gvnic              = false
      logging_variant           = "DEFAULT"
      auto_upgrade              = false
      preemptible               = false
      # Note: this was an attempt to resolve the permadiff; fails without it too
      pod_pids_limit            = 0
    },
  ]

  node_pools_oauth_scopes = {
    # Note: use cloud platform only, and manage monitoring etc. permissions via
    # IAM
    all = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }
}

Terraform Version

OpenTofu v1.7.2
on darwin_arm64
+ provider registry.opentofu.org/hashicorp/external v2.3.3
+ provider registry.opentofu.org/hashicorp/google v5.37.0
+ provider registry.opentofu.org/hashicorp/kubernetes v2.31.0
+ provider registry.opentofu.org/hashicorp/null v3.2.2
+ provider registry.opentofu.org/hashicorp/random v3.6.2



### Additional information

2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: PUT /v1/projects/xxx/locations/us-central1/clusters/yyyy/nodePools/primary?alt=json&prettyPrint=false HTTP/1.1
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: Host: container.googleapis.com
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: User-Agent: google-api-go-client/0.5 Terraform/1.7.2 (+https://www.terraform.io) Terraform-Plugin-SDK/2.33.0 terraform-provider-google/dev blueprints/terraform/terraform-google-kubernetes-engine:private-cluster/v31.1.0
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: Content-Length: 25
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: Content-Type: application/json
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: X-Goog-Api-Client: gl-go/1.21.11 gdcl/0.185.0
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: Accept-Encoding: gzip
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: {
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: "nodePoolId": "primary"
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: }
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: -----------------------------------------------------
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: 2024/07/26 11:49:18 [DEBUG] Google API Response Details:
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: ---[ RESPONSE ]--------------------------------------
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: HTTP/2.0 400 Bad Request
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Cache-Control: private
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Content-Type: application/json; charset=UTF-8
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Date: Fri, 26 Jul 2024 18:49:18 GMT
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Server: ESF
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Vary: Origin
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Vary: X-Origin
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Vary: Referer
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: X-Content-Type-Options: nosniff
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: X-Frame-Options: SAMEORIGIN
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: X-Xss-Protection: 0
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: {
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "error": {
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "code": 400,
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "message": "At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning'] must be specified.",
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "errors": [
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: {
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "message": "At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning'] must be specified.",
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "domain": "global",
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "reason": "badRequest"
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: }
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: ],
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "status": "INVALID_ARGUMENT",
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "details": [
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: {
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "@type": "type.googleapis.com/google.rpc.RequestInfo",
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "requestId": "0xa4a8369efaf57da0"
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: }
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: ]
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: }
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: }
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: -----------------------------------------------------
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: 2024/07/26 11:49:18 [DEBUG] Retry Transport: Stopping retries, last request failed with non-retryable error: googleapi: got HTTP response code 400 with body: HTTP/2.0 400 Bad Request

Jul 26 '24 18:07 wyardley

I also am encountering this issue. I have added (in node_pools) the following values, as per the documentation:

    cpu_cfs_quota      = false
    pod_pids_limit     = 0

However, on each plan, it is ignored and Terraform wants to revert back to the default values of null:

# module.gke.google_container_node_pool.pools["default"] will be updated in-place
~ resource "google_container_node_pool" "pools" {
      id                          = "projects/redacted/nodePools/default-5a32"
      name                        = "default-5a32"
      # (11 unchanged attributes hidden)

    ~ node_config {
          tags                        = [
              "redacted",
              "redacted-default",
              "default",
          ]
          # (20 unchanged attributes hidden)

        - kubelet_config {
            - cpu_cfs_quota        = false -> null
            - pod_pids_limit       = 0 -> null
              # (2 unchanged attributes hidden)
          }

          # (3 unchanged blocks hidden)
      }

      # (5 unchanged blocks hidden)
  }

Plan: 0 to add, 1 to change, 0 to destroy.

I'm using version 31.1.0 of the private-cluster-update-variant module.

Jul 27 '24 04:07 Nickmman

Same issue here when using the private cluster module.

module.gke.google_container_node_pool.pools["default-node-pool"] will be updated in-place
  ~ resource "google_container_node_pool" "pools" {
        id                          = "projects/xxx/locations/us-east4/clusters/yyy/nodePools/default-node-pool"
        name                        = "default-node-pool"
        # (11 unchanged attributes hidden)

      ~ node_config {
            tags                        = [
                "gke-staging",
                "gke-staging-default-node-pool",
                "default-node-pool",
            ]
            # (20 unchanged attributes hidden)

          - kubelet_config {
              - cpu_cfs_quota        = false -> null
              - pod_pids_limit       = 0 -> null
                # (2 unchanged attributes hidden)
            }

            # (2 unchanged blocks hidden)
        }

        # (5 unchanged blocks hidden)
    }

I've checked the code and node_config doesn't support kubelet_config as a dynamic block. I'm using version 31.1.0 of the private-cluster module.

Edit: master works. I just replaced source with:

source = "git::https://github.com/terraform-google-modules/terraform-google-kubernetes-engine//modules/private-cluster?ref=master"

Jul 30 '24 08:07 hernan82arg

This is also happening for me:

Terraform v1.9.4
on darwin_amd64
+ provider registry.terraform.io/hashicorp/google v5.27.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.31.0
+ provider registry.terraform.io/hashicorp/random v3.6.2
+ provider registry.terraform.io/hashicorp/tfe v0.54.0
+ provider registry.terraform.io/hashicorp/time v0.12.0

Aug 05 '24 06:08 trenslow

Same here. After updating the Google Cloud provider from 5.30 to 5.42, I started seeing this error with module version 31.0. I updated the module to version 32, but it still failed. Adding the values mentioned in this earlier comment solved the issue:

I also am encountering this issue. I have added (in node_pools) the following values, as per the documentation:

    cpu_cfs_quota      = false
    pod_pids_limit     = 0


Aug 23 '24 22:08 rekiemfaxaf

FWIW, for me, with v32.x, the permadiff eventually shifted to a diff on cpu_manager_policy, which was easier to solve by setting it to the valid but undocumented value of "". See this comment:

https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/issues/2013#issuecomment-2305452939
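
For reference, a minimal sketch of where that setting would go in the module call. It assumes the private-cluster submodule from the original report and a 32.x module version; the pool name is taken from the original config, all other required inputs are omitted, and the empty string is the undocumented value described above, not something the module docs list.

```hcl
module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version = "~> 32.0" # assumption: any 32.x release

  # ... other inputs (project_id, name, region, network, and so on) omitted ...

  node_pools = [
    {
      name = "primary"

      # Empty string: reported in this thread as accepted but undocumented.
      # It is meant to match what the API returns so the plan stops showing
      # a cpu_manager_policy diff.
      cpu_manager_policy = ""
    },
  ]
}
```

Whether this clears the diff likely depends on the module and provider versions in use, so check the plan output after adding it.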

Aug 23 '24 23:08 wyardley

Also ran into this with v33.02 of the private-cluster-update-variant module and TPG v5.44.0.

It fails to update the cluster:

Error: googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning', 'max_run_duration'] must be specified. 

Details:
[
  {
    "@type": "type.googleapis.com/google.rpc.RequestInfo",
    "requestId": "0x32be3a3a868d29d7"
  }
]
, badRequest

Sep 13 '24 21:09 derhally

I am also running into this issue. If anyone has a workaround I would appreciate it, as it's currently causing issues with our deployment.

  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  version = "~> 33.1"

  ~ resource "google_container_node_pool" "pools" {

        # (10 unchanged attributes hidden)

      ~ node_config {
            tags                        = [
            ]
            # (17 unchanged attributes hidden)

          + gcfs_config {
              + enabled = false
            }

          - kubelet_config {
              - cpu_cfs_quota                          = false -> null
              - insecure_kubelet_readonly_port_enabled = "TRUE" -> null
              - pod_pids_limit                         = 0 -> null
            }

            # (2 unchanged blocks hidden)
        }

        # (5 unchanged blocks hidden)
    }

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 6.6.0, < 7"
    }
  }
}

Oct 10 '24 13:10 ghost

Same issue here, permanent drift that fails on apply

      ~ node_config {
            tags                        = [
                "gke-gke-dr",
                "gke-gke-dr-t2d-16",
            ]
            # (20 unchanged attributes hidden)

          - kubelet_config {
              - cpu_cfs_quota                          = false -> null
              - insecure_kubelet_readonly_port_enabled = "FALSE" -> null
              - pod_pids_limit                         = 0 -> null
                # (2 unchanged attributes hidden)
            }

And the error it fails with:

│ Error: googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning', 'max_run_duration'] must be specified.
│ Details:
│ [
│   {
│     "@type": "type.googleapis.com/google.rpc.RequestInfo",
│     "requestId": "0x15856f1dc84fc347"
│   }
│ ]
│ , badRequest

Oct 10 '24 18:10 LP0101

@LP0101 that may be related to the issue described here and here (though in your case it's false vs. true, so maybe it's not related to the new default, unless the API is now, at least in some places, always returning an explicit true / false value?).

Sounds like there may be a fix coming that will hopefully help with the apply failure. Also, if / when #2082 ships, that should at least let you match what's coming back from the API more closely.

Oct 10 '24 18:10 wyardley

Thanks for the links @wyardley , fingers crossed for a fix soon.

It all scans too - we have an older cluster that was imported into TF, and we don't observe the issue there, likely because the older nodepools don't have the kubelet_config created on the API side

Oct 10 '24 18:10 LP0101

Same for me on v33.0.3 of the private-cluster submodule and provider hashicorp/google v5.42.0.

Oct 10 '24 19:10 RuiSMagalhaes

The following resolved the issue for me as a workaround at least:

❯ cat /tmp/kubelet-config
kubeletConfig:
  cpuManagerPolicy: ""

gcloud container node-pools update node-pool --cluster=gke-cluster --location=europe-west2 --project=my-project --system-config-from-file=/tmp/kubelet-config

Oct 10 '24 19:10 ghost

The latest 5.X and 6.X releases, 5.44.2 and 6.7.0, both include a fix for the recent kubeletConfig issues, see https://github.com/hashicorp/terraform-provider-google/issues/19792

6.8.0 should mitigate some of the cases where this error is returned in the future, after I merge https://github.com/GoogleCloudPlatform/magic-modules/pull/11978. It won't necessarily resolve the underlying issues and may just shift the error (probably to a permadiff), but it will at least stop masking them.
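
For anyone pinning provider versions, a constraint along these lines would pick up the 6.x release mentioned above. This is only a sketch: the 6.7.0 lower bound comes from the release named here, and the `< 7` upper bound is an assumption borrowed from a config posted earlier in this thread.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"

      # 6.7.0 is the first 6.x release reported to include the kubeletConfig fix.
      # On the 5.x line the equivalent would be ">= 5.44.2, < 6".
      version = ">= 6.7.0, < 7"
    }
  }
}
```

After changing the constraint, run `terraform init -upgrade` (or `tofu init -upgrade`) so the lock file picks up the newer provider.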

Oct 14 '24 19:10 rileykarson

6.7.0 worked. Thanks

Oct 15 '24 08:10 michel-numan
