terraform-google-kubernetes-engine
Error / issue applying kubelet config
TL;DR
See also #2013
I'm seeing a permadrift which may or may not be related to having manually (outside of tf) enabled a kubelet config setting. I am somewhat confident that before this change, I did not have a permadiff or error applying this state.
Expected behavior
The config should apply cleanly, with no diff on subsequent plans.
Observed behavior
~ resource "google_container_node_pool" "pools" {
id = "projects/xxx/locations/us-central1/clusters/yyy/nodePools/primary"
name = "primary"
# (10 unchanged attributes hidden)
~ node_config {
tags = [
"gke-prod-cluster-01",
"gke-prod-cluster-01-primary",
]
# (17 unchanged attributes hidden)
- kubelet_config {
- cpu_cfs_quota = false -> null
- pod_pids_limit = 0 -> null
}
# (2 unchanged blocks hidden)
}
The plan shows the diff above, and then apply fails with this error:
module.gke.google_container_node_pool.pools["primary"]: Modifying... [id=projects/xxx/locations/us-central1/clusters/yyy/nodePools/primary]
╷
│ Error: googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning'] must be specified.
│ Details:
│ [
│ {
│ "@type": "type.googleapis.com/google.rpc.RequestInfo",
│ "requestId": "0xaf5070f5462ddf7d"
│ }
│ ]
│ , badRequest
│
│ with module.gke.google_container_node_pool.pools["primary"],
│ on .terraform/modules/gke/modules/private-cluster/cluster.tf line 491, in resource "google_container_node_pool" "pools":
│ 491: resource "google_container_node_pool" "pools" {
│
╵
See further debug output below
Terraform Configuration
module "gke" {
  source                = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version               = "31.1.0"
  project_id            = var.project
  name                  = "foo-cluster-01"
  service_account_name  = "foo-cluster-01"
  grant_registry_access = true
  kubernetes_version    = "1.29.6-gke.1326000"
  release_channel       = "UNSPECIFIED"
  region                = "us-central1"
  zones = [
    data.google_compute_zones.available.names[1],
    data.google_compute_zones.available.names[2],
  ]
  network                    = data.terraform_remote_state.network.outputs.network_name
  subnetwork                 = data.terraform_remote_state.network.outputs.subnets_names[0]
  ip_range_pods              = data.terraform_remote_state.network.outputs.subnets_secondary_ranges[0][0].range_name
  ip_range_services          = data.terraform_remote_state.network.outputs.subnets_secondary_ranges[0][1].range_name
  horizontal_pod_autoscaling = true
  enable_private_nodes       = true
  master_authorized_networks = local.all_allowlist_ranges
  dns_cache                  = true
  remove_default_node_pool   = true
  node_pools = [
    # Note: this is intentionally different from the actual default,
    # "default-pool"
    {
      name                      = "primary"
      machine_type              = var.instance_type
      total_min_count           = var.node_pool_total_min_count
      total_max_count           = var.node_pool_total_max_count
      local_ssd_count           = 0
      spot                      = false
      local_ssd_ephemeral_count = 0
      disk_size_gb              = 100
      disk_type                 = "pd-balanced"
      image_type                = "COS_CONTAINERD"
      enable_gcfs               = false
      enable_gvnic              = false
      logging_variant           = "DEFAULT"
      auto_upgrade              = false
      preemptible               = false
      # Note: this was an attempt to resolve the permadiff; fails without it too
      pod_pids_limit = 0
    },
  ]
  node_pools_oauth_scopes = {
    # Note: use cloud platform only, and manage monitoring etc. permissions via
    # IAM
    all = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }
}
Terraform Version
OpenTofu v1.7.2
on darwin_arm64
+ provider registry.opentofu.org/hashicorp/external v2.3.3
+ provider registry.opentofu.org/hashicorp/google v5.37.0
+ provider registry.opentofu.org/hashicorp/kubernetes v2.31.0
+ provider registry.opentofu.org/hashicorp/null v3.2.2
+ provider registry.opentofu.org/hashicorp/random v3.6.2
Additional information
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: PUT /v1/projects/xxx/locations/us-central1/clusters/yyyy/nodePools/primary?alt=json&prettyPrint=false HTTP/1.1
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: Host: container.googleapis.com
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: User-Agent: google-api-go-client/0.5 Terraform/1.7.2 (+https://www.terraform.io) Terraform-Plugin-SDK/2.33.0 terraform-provider-google/dev blueprints/terraform/terraform-google-kubernetes-engine:private-cluster/v31.1.0
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: Content-Length: 25
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: Content-Type: application/json
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: X-Goog-Api-Client: gl-go/1.21.11 gdcl/0.185.0
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: Accept-Encoding: gzip
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: {
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: "nodePoolId": "primary"
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: }
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google
2024-07-26T11:49:18.375-0700 [DEBUG] provider.terraform-provider-google: -----------------------------------------------------
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: 2024/07/26 11:49:18 [DEBUG] Google API Response Details:
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: ---[ RESPONSE ]--------------------------------------
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: HTTP/2.0 400 Bad Request
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Cache-Control: private
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Content-Type: application/json; charset=UTF-8
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Date: Fri, 26 Jul 2024 18:49:18 GMT
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Server: ESF
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Vary: Origin
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Vary: X-Origin
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: Vary: Referer
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: X-Content-Type-Options: nosniff
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: X-Frame-Options: SAMEORIGIN
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: X-Xss-Protection: 0
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: {
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "error": {
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "code": 400,
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "message": "At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning'] must be specified.",
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "errors": [
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: {
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "message": "At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning'] must be specified.",
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "domain": "global",
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "reason": "badRequest"
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: }
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: ],
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "status": "INVALID_ARGUMENT",
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "details": [
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: {
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "@type": "type.googleapis.com/google.rpc.RequestInfo",
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: "requestId": "0xa4a8369efaf57da0"
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: }
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: ]
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: }
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: }
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: -----------------------------------------------------
2024-07-26T11:49:18.780-0700 [DEBUG] provider.terraform-provider-google: 2024/07/26 11:49:18 [DEBUG] Retry Transport: Stopping retries, last request failed with non-retryable error: googleapi: got HTTP response code 400 with body: HTTP/2.0 400 Bad Request
I also am encountering this issue. I have added (in node_pools) the following values, as per the documentation:
cpu_cfs_quota = false
pod_pids_limit = 0
However, on each plan, it is ignored and Terraform wants to revert back to the default values of null:
# module.gke.google_container_node_pool.pools["default"] will be updated in-place
~ resource "google_container_node_pool" "pools" {
id = "projects/redacted/nodePools/default-5a32"
name = "default-5a32"
# (11 unchanged attributes hidden)
~ node_config {
tags = [
"redacted",
"redacted-default",
"default",
]
# (20 unchanged attributes hidden)
- kubelet_config {
- cpu_cfs_quota = false -> null
- pod_pids_limit = 0 -> null
# (2 unchanged attributes hidden)
}
# (3 unchanged blocks hidden)
}
# (5 unchanged blocks hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
I'm using version 31.1.0 of the private-cluster-update-variant module.
Same issue here when using the private cluster module.
module.gke.google_container_node_pool.pools["default-node-pool"] will be updated in-place
~ resource "google_container_node_pool" "pools" {
id = "projects/xxx/locations/us-east4/clusters/yyy/nodePools/default-node-pool"
name = "default-node-pool"
# (11 unchanged attributes hidden)
~ node_config {
tags = [
"gke-staging",
"gke-staging-default-node-pool",
"default-node-pool",
]
# (20 unchanged attributes hidden)
- kubelet_config {
- cpu_cfs_quota = false -> null
- pod_pids_limit = 0 -> null
# (2 unchanged attributes hidden)
}
# (2 unchanged blocks hidden)
}
# (5 unchanged blocks hidden)
}
I've checked the code and node_config doesn't support kubelet_config as a dynamic block.
I'm using version 31.1.0 of the private-cluster module.
Edit: master works. I just replaced source with:
source = "git::https://github.com/terraform-google-modules/terraform-google-kubernetes-engine//modules/private-cluster?ref=master"
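For readers unfamiliar with the pattern, the dynamic block referred to above might look roughly like the following. This is only a sketch, not the module's actual code: the variable `kubelet_settings`, the resource name `example`, and the placeholder cluster values are all hypothetical. It also assumes a provider version in which `cpu_manager_policy` is optional inside `kubelet_config`.

```hcl
# Hypothetical input for illustration; the real module reads its settings
# from the node_pools list instead.
variable "kubelet_settings" {
  type = object({
    cpu_manager_policy = optional(string)
    cpu_cfs_quota      = optional(bool)
    pod_pids_limit     = optional(number)
  })
  default = null
}

resource "google_container_node_pool" "example" {
  name     = "primary"
  cluster  = "yyy"         # placeholder cluster name
  location = "us-central1" # placeholder location

  node_config {
    machine_type = "e2-medium" # placeholder machine type

    # Emit kubelet_config only when settings are supplied, so an unset
    # value produces no block rather than a block full of nulls.
    dynamic "kubelet_config" {
      for_each = var.kubelet_settings == null ? [] : [var.kubelet_settings]
      content {
        cpu_manager_policy = kubelet_config.value.cpu_manager_policy
        cpu_cfs_quota      = kubelet_config.value.cpu_cfs_quota
        pod_pids_limit     = kubelet_config.value.pod_pids_limit
      }
    }
  }
}
```

Without a block like this, values passed in node_pools never reach the resource, which would be consistent with the plan wanting to remove kubelet_config on every run.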
Also happening for me:
Terraform v1.9.4
on darwin_amd64
+ provider registry.terraform.io/hashicorp/google v5.27.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.31.0
+ provider registry.terraform.io/hashicorp/random v3.6.2
+ provider registry.terraform.io/hashicorp/tfe v0.54.0
+ provider registry.terraform.io/hashicorp/time v0.12.0
Same here. After updating the Google Cloud provider from 5.30 to 5.42, I started to see this error with module version 31.0. I updated the module to version 32, but it was still failing. Adding what is mentioned here solved the issue:
> I also am encountering this issue. I have added (in node_pools) the following values, as per the documentation:
> cpu_cfs_quota = false
> pod_pids_limit = 0
> However, on each plan, it is ignored and Terraform wants to revert back to the default values of null:
> # module.gke.google_container_node_pool.pools["default"] will be updated in-place
> ~ resource "google_container_node_pool" "pools" {
> id = "projects/redacted/nodePools/default-5a32"
> name = "default-5a32"
> # (11 unchanged attributes hidden)
> ~ node_config {
> tags = [
> "redacted",
> "redacted-default",
> "default",
> ]
> # (20 unchanged attributes hidden)
> - kubelet_config {
> - cpu_cfs_quota = false -> null
> - pod_pids_limit = 0 -> null
> # (2 unchanged attributes hidden)
> }
> # (3 unchanged blocks hidden)
> }
> # (5 unchanged blocks hidden)
> }
> Plan: 0 to add, 1 to change, 0 to destroy.
> I'm using version 31.1.0 of the private-cluster-update-variant module.
FWIW, for me, with v 32.x, the permadiff eventually shifted to a diff of cpu_manager_policy, which was easier to solve by setting it to the valid, but not documented, value of "" -- comment:
https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/issues/2013#issuecomment-2305452939
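As a rough illustration of that workaround, the setting goes into the node pool map passed to the module. This is a sketch under assumptions: the version constraint and pool name are placeholders, the other required module inputs are omitted, and whether an empty string is accepted may depend on the module and provider versions in use.

```hcl
module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version = "~> 32.0" # placeholder; the comment above only says "32.x"

  # ... other required inputs (project_id, name, network, etc.) omitted ...

  node_pools = [
    {
      name = "primary" # placeholder pool name
      # Empty string rather than "none" or "static", matching what the API
      # reportedly returns for pools that never set a policy.
      cpu_manager_policy = ""
    },
  ]
}
```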
Also ran into this with v33.02 of private-cluster-update-variant module and TPG v5.44.0
It fails to update the cluster
Error: googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning', 'max_run_duration'] must be specified.
Details: [ { "@type": "type.googleapis.com/google.rpc.RequestInfo", "requestId": "0x32be3a3a868d29d7" } ] , badRequest
I am also running into this issue. If anyone has a workaround I would appreciate it, as it's currently causing issues with our deployment.
source = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
version = "~> 33.1"
~ resource "google_container_node_pool" "pools" {
# (10 unchanged attributes hidden)
~ node_config {
tags = [
]
# (17 unchanged attributes hidden)
+ gcfs_config {
+ enabled = false
}
- kubelet_config {
- cpu_cfs_quota = false -> null
- insecure_kubelet_readonly_port_enabled = "TRUE" -> null
- pod_pids_limit = 0 -> null
}
# (2 unchanged blocks hidden)
}
# (5 unchanged blocks hidden)
}
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 6.6.0, < 7"
    }
  }
}
Same issue here, permanent drift that fails on apply
~ node_config {
tags = [
"gke-gke-dr",
"gke-gke-dr-t2d-16",
]
# (20 unchanged attributes hidden)
- kubelet_config {
- cpu_cfs_quota = false -> null
- insecure_kubelet_readonly_port_enabled = "FALSE" -> null
- pod_pids_limit = 0 -> null
# (2 unchanged attributes hidden)
}
And the error it fails with:
│ Error: googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning', 'max_run_duration'] must be specified.
│ Details:
│ [
│ {
│ "@type": "type.googleapis.com/google.rpc.RequestInfo",
│ "requestId": "0x15856f1dc84fc347"
│ }
│ ]
│ , badRequest
@LP0101 that may be related to the issue described here and here (though in your case, it's false vs. true, so maybe not related to the new default, unless the API is now sometimes, in some places, returning both true / false all the time?).
Sounds like there may be a fix coming that will hopefully help with the apply failure, though if / when #2082 ships, that should at least allow you to match what's coming back from the API better.
Thanks for the links @wyardley , fingers crossed for a fix soon.
It all scans too - we have an older cluster that was imported into TF, and we don't observe the issue there, likely because the older nodepools don't have the kubelet_config created on the API side
Same for me on v33.0.3 of the private-cluster submodule with provider hashicorp/google v5.42.0.
The following resolved the issue for me as a workaround at least:
❯ cat /tmp/kubelet-config
kubeletConfig:
  cpuManagerPolicy: ""
gcloud container node-pools update node-pool --cluster=gke-cluster --location=europe-west2 --project=my-project --system-config-from-file=/tmp/kubelet-config
The latest 5.X and 6.X releases, 5.44.2 and 6.7.0, both include a fix for the recent kubeletConfig issues, see https://github.com/hashicorp/terraform-provider-google/issues/19792
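To make sure a fixed release is picked up, the provider constraint can set the patched version as a floor. A minimal sketch for the 6.x line follows; the upper bound is a common convention, not something stated in this thread.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      # 6.7.0 is the first 6.x release reported above to include the fix.
      # For the 5.x line, the equivalent would be ">= 5.44.2, < 6".
      version = ">= 6.7.0, < 7"
    }
  }
}
```

After changing the constraint, `terraform init -upgrade` (or `tofu init -upgrade`) is needed so the lock file moves to the newer provider.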
6.8.0 should mitigate some of the cases where this error is returned in the future, after I merge https://github.com/GoogleCloudPlatform/magic-modules/pull/11978. It won't resolve underlying issues necessarily and may just shift the error- probably to a permadiff- but will at least stop masking them.
6.7.0 worked. Thanks