terraform-provider-kubernetes
Unable to create k8s service account for Workload Identity Federation on a GKE private cluster
Terraform version, Kubernetes provider version and Kubernetes version
Terraform version: 1.8.5
Kubernetes Provider version: 2.31.0
Google Cloud Provider version: 5.34.0
Kubernetes version: 1.28.9-gke.1209000
Terraform configuration
resource "google_container_cluster" "my-cluster" {
  project  = var.GCP_PROJECT_ID
  name     = "my-cluster"
  location = "europe-west8-a"

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1

  network    = google_compute_network.network.name
  subnetwork = google_compute_subnetwork.network_subnet.name

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = true
    master_ipv4_cidr_block  = "172.16.0.32/28"
  }

  ip_allocation_policy {
  }

  master_authorized_networks_config {
  }

  workload_identity_config {
    workload_pool = "${var.GCP_PROJECT_ID}.svc.id.goog"
  }

  logging_config {
    enable_components = [
      "SYSTEM_COMPONENTS",
      "APISERVER",
      "WORKLOADS"
    ]
  }
}

resource "google_container_node_pool" "my-nodes" {
  name       = "my-node-pool"
  location   = "europe-west8-a"
  cluster    = google_container_cluster.my-cluster.name
  node_count = 1

  node_config {
    preemptible     = true
    machine_type    = "e2-standard-4"
    service_account = google_service_account.gke-service-account.email
    oauth_scopes = [
      "cloud-platform"
    ]

    shielded_instance_config {
      enable_secure_boot = true
    }
  }
}

module "my-workload-identity" {
  source     = "terraform-google-modules/kubernetes-engine/google//modules/workload-identity"
  name       = "my-identity"
  namespace  = "default"
  project_id = var.GCP_PROJECT_ID
  roles = [
    "roles/logging.logWriter",
    "roles/cloudsql.client",
    "roles/artifactregistry.reader"
  ]
}

data "google_client_config" "current" {}

provider "kubernetes" {
  host                   = "https://${google_container_cluster.my-cluster.endpoint}"
  token                  = data.google_client_config.current.access_token
  cluster_ca_certificate = base64decode(google_container_cluster.my-cluster.master_auth.0.cluster_ca_certificate)
}
Question
Apologies if this is a double posting. I am trying to configure Workload Identity Federation on a private GKE cluster using the code snippet above, which follows the documentation and the guidelines in https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/using_gke_with_terraform
The resources are deployed by a pipeline running in a GitLab k8s runner hosted in GCP, but in a different project:
image:
  name: hashicorp/terraform:1.8.5
  entrypoint:
    - "/usr/bin/env"
    - "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

before_script:
  - pwd
  - mkdir .gcp
  - echo $GCP_SERVICE_ACCOUNT > .gcp/credentials.json
  - export GOOGLE_APPLICATION_CREDENTIALS=".gcp/credentials.json"
  - rm -rf .terraform
  - terraform --version
  - terraform init

# ...

apply:
  stage: apply
  script:
    - export TF_LOG=DEBUG
    - terraform apply -input=false -auto-approve "planfile"
  dependencies:
    - plan
  only:
    - main
  needs:
    - plan
  when: manual
  after_script:
    - rm .gcp/credentials.json
The GKE cluster was created smoothly. Unfortunately, if I add the workload identity definition, the apply fails with this error:
module.my-workload-identity.kubernetes_service_account.main[0]: Still creating... [10s elapsed]
module.my-workload-identity.kubernetes_service_account.main[0]: Still creating... [20s elapsed]
module.my-workload-identity.kubernetes_service_account.main[0]: Still creating... [30s elapsed]
2024-06-22T12:27:37.333Z [ERROR] provider.terraform-provider-kubernetes_v2.31.0_x5: Response contains error diagnostic: @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:58 tf_proto_version=5.6 tf_provider_addr=registry.terraform.io/hashicorp/kubernetes tf_rpc=ApplyResourceChange @module=sdk.proto diagnostic_detail="" diagnostic_severity=ERROR diagnostic_summary="Post \"https://172.16.0.34/api/v1/namespaces/default/serviceaccounts\": context deadline exceeded" tf_req_id=9312e024-3cff-3a97-8799-9a54659b9c57 tf_resource_type=kubernetes_service_account timestamp=2024-06-22T12:27:37.333Z
2024-06-22T12:27:37.335Z [DEBUG] states/remote: state read serial is: 94; serial is: 94
2024-06-22T12:27:37.335Z [DEBUG] states/remote: state read lineage is: 1ee3af85-9da7-164a-413f-1b485a9fbda7; lineage is: 1ee3af85-9da7-164a-413f-1b485a9fbda7
2024-06-22T12:27:37.583Z [ERROR] vertex "module.my-workload-identity.kubernetes_service_account.main[0]" error: Post "https://172.16.0.34/api/v1/namespaces/default/serviceaccounts": context deadline exceeded
2024-06-22T12:27:37.584Z [DEBUG] states/remote: state read serial is: 95; serial is: 95
2024-06-22T12:27:37.584Z [DEBUG] states/remote: state read lineage is: 1ee3af85-9da7-164a-413f-1b485a9fbda7; lineage is: 1ee3af85-9da7-164a-413f-1b485a9fbda7
╷
│ Error: Post "https://172.16.0.34/api/v1/namespaces/default/serviceaccounts": context deadline exceeded
│
│ with module.my-workload-identity.kubernetes_service_account.main[0],
│ on .terraform/modules/my-workload-identity/modules/workload-identity/main.tf line 51, in resource "kubernetes_service_account" "main":
│ 51: resource "kubernetes_service_account" "main" {
│
╵
2024-06-22T12:27:37.787Z [DEBUG] provider.terraform-provider-google_v5.34.0_x5: 2024/06/22 12:27:37 [DEBUG] [transport] [server-transport 0xc0003fdc80] Closing: Server.Stop called
2024-06-22T12:27:37.788Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2024-06-22T12:27:37.794Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
The cluster endpoint looks correct.
In the k8s API server logs, I cannot see any request coming from the terraform process.
Can you please help me understand the issue, or redirect me to some other info channel? I have been stuck on this for a few days.
Thanks in advance.
Hi @diguida, thanks for opening this issue. Could you try to apply this separately please?
Hi @sheneska, thanks for looking into this. It is not clear to me what you mean by "try to apply this separately".
Should I run the apply command in a Compute Engine instance or on my laptop instead of the runner?
Thanks.
@diguida I just ran into the exact same issue. I was able to get it to work by adding 0.0.0.0/0 to the master authorized networks as a test, though I wouldn't recommend doing this. You can check the k8s API server log to see which IP is used in the request. I'm trying to get the CIDR block from Hashi, since we are using Terraform Cloud.
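For reference, a narrower version of the workaround above might look like the following. It is a minimal sketch, not a confirmed fix for this issue. It assumes the machine running `terraform apply` reaches the control plane from a known CIDR range; the variable `RUNNER_CIDR` and the display name `terraform-runner` are hypothetical names introduced for illustration. With `enable_private_endpoint = true`, the runner most likely also needs a network path to the private endpoint (for example via peering or VPN), which this snippet does not provide.

```hcl
# Hypothetical variable: the CIDR range the Terraform runner connects from.
variable "RUNNER_CIDR" {
  type        = string
  description = "CIDR range allowed to reach the GKE control plane (assumption)"
}

# The block below would replace the empty master_authorized_networks_config
# block inside resource "google_container_cluster" "my-cluster".
# It is shown standalone here and is not valid at the top level of a file.
master_authorized_networks_config {
  cidr_blocks {
    cidr_block   = var.RUNNER_CIDR
    display_name = "terraform-runner" # hypothetical label
  }
}
```

Restricting the range to the runner's own addresses avoids opening the control plane to 0.0.0.0/0, which the comment above already advises against.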
Marking this issue as stale due to inactivity. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. This helps our maintainers find and focus on the active issues. Maintainers may also remove the stale label at their discretion. Thank you!