terraform-provider-google
GKE autopilot is always created with default service account II
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
- Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
- If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.
This is a duplicate of https://github.com/hashicorp/terraform-provider-google/issues/8918, see https://github.com/hashicorp/terraform-provider-google/issues/8918#issuecomment-869990917 - sorry for creating this, but I don't seem to have the rights to re-open the original issue (?) and there doesn't seem to be any activity there.
@slevenick is there any update on this?
I'm not sure how to proceed with this. This bug is due to a weird interaction between autopilot & the default service account field.
Basically, the API is not respecting the request that is sent with the service account. I'm not sure how gcloud is setting up the autopilot cluster with a non-default service account successfully. Can you capture the HTTP requests to see if that is happening in a single request, or if there is a later update to apply the service account?
Hi, I ran into the same problem.
@slevenick Is there any update on this subject?
Best regards.
Sorry for the late answer @slevenick - I was on vacation...
I executed this:
gcloud container --project "hmplayground" clusters create-auto "my-cluster" --region "europe-west3" --release-channel "regular" --network "projects/hmplayground/global/networks/my-vpc" --subnetwork "projects/hmplayground/regions/europe-west3/subnetworks/my-subnet" --cluster-secondary-range-name="my-pods" --services-secondary-range-name="my-services" --enable-master-authorized-networks --enable-private-nodes --master-ipv4-cidr="172.16.0.16/28" --service-account="[email protected]" --scopes="logging-write,monitoring,storage-ro" --log-http
This is the request:
==== request start ====
uri: https://container.googleapis.com/v1/projects/hmplayground/locations/europe-west3/clusters?alt=json
method: POST
== headers start ==
b'X-Goog-User-Project': b'hmplayground'
b'accept': b'application/json'
b'accept-encoding': b'gzip, deflate'
b'authorization': --- Token Redacted ---
b'content-length': b'926'
b'content-type': b'application/json'
b'user-agent': b'google-cloud-sdk gcloud/344.0.0 command/gcloud.container.clusters.create-auto invocation-id/9db76483e82c490f9d34ad2fdffeda72 environment/None environment-version/None interactive/True from-script/False python/3.9.7 term/xterm-256color (Linux 5.13.13)'
== headers end ==
== body start ==
{"cluster": {"autopilot": {"enabled": true}, "ipAllocationPolicy": {"clusterSecondaryRangeName": "my-pods", "createSubnetwork": false, "servicesSecondaryRangeName": "my-services", "useIpAliases": true}, "masterAuthorizedNetworksConfig": {"enabled": true}, "name": "my-cluster", "network": "projects/hmplayground/global/networks/my-vpc", "nodePools": [{"config": {"oauthScopes": ["https://www.googleapis.com/auth/devstorage.read_only", "https://www.googleapis.com/auth/logging.write", "https://www.googleapis.com/auth/monitoring"], "serviceAccount": "[email protected]"}, "initialNodeCount": 1, "name": "default-pool"}], "privateClusterConfig": {"enablePrivateNodes": true, "masterIpv4CidrBlock": "172.16.0.16/28"}, "releaseChannel": {"channel": "REGULAR"}, "subnetwork": "projects/hmplayground/regions/europe-west3/subnetworks/my-subnet"}, "parent": "projects/hmplayground/locations/europe-west3"}
== body end ==
==== request end ====
---- response start ----
status: 200
-- headers start --
content-encoding: gzip
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
cache-control: private
content-length: 446
content-type: application/json; charset=UTF-8
date: Tue, 14 Sep 2021 14:03:39 GMT
server: ESF
transfer-encoding: chunked
vary: Origin, X-Origin, Referer
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-xss-protection: 0
-- headers end --
-- body start --
{
"name": "operation-1631628219731-15754d1b",
"zone": "europe-west3",
"operationType": "CREATE_CLUSTER",
"status": "RUNNING",
"selfLink": "https://container.googleapis.com/v1/projects/306799302406/locations/europe-west3/operations/operation-1631628219731-15754d1b",
"targetLink": "https://container.googleapis.com/v1/projects/306799302406/locations/europe-west3/clusters/my-cluster",
"startTime": "2021-09-14T14:03:39.731893675Z"
}
-- body end --
total round trip time (request+response): 4.417 secs
---- response end ----
hi, I ran into the same issue, not being able to assign a custom service account to an autopilot gke cluster with terraform v1.0.1.
@slevenick Is there any update on this subject?
Regards, C.
Hi, any update about this bug? I need to create an Autopilot cluster with a custom service account. With a gcloud command it works. I understand that the API call made by Terraform is different from the one made by gcloud, is that right? With the latest Terraform version I still have this issue. Regards
@nilsoulinou I created the GKE cluster via the gcloud CLI and terraform imported it into the configuration. This works.
@venkykuberan @slevenick is this still considered active?
@tSte are you saying that you can't create a GKE cluster in Autopilot mode with a non-default service account directly with the google provider, and you have to create it with a gcloud command and then import it with Terraform?
If yes, I think this issue is still active, because I expect this to work with Terraform, without manual steps.
@lrk you're right - all of our clusters are currently created via the gcloud CLI and then imported and managed via TF.
Are there any updates to this thread on the ability to use a non-default SA to provision an Autopilot GKE cluster?
The issue occurs because Terraform uses a deprecated field to set up the service account, and the API no longer respects this field when the cluster type is Autopilot.
The following payload to the API will create the cluster successfully:
{
"cluster": {
"autopilot": {
"enabled": true
},
"binaryAuthorization": {
"enabled": false
},
"ipAllocationPolicy": {
"clusterSecondaryRangeName": "cluster-1",
"servicesSecondaryRangeName": "service-1",
"useIpAliases": true
},
"legacyAbac": {
"enabled": false
},
"maintenancePolicy": {
"window": {}
},
"masterAuthorizedNetworksConfig": {
"cidrBlocks": [
{
"cidrBlock": "172.16.0.0/16"
}
],
"enabled": true
},
"name": "gke-cluster",
"network": "projects/network-host-0372/global/networks/production",
"networkConfig": {
"datapathProvider": "ADVANCED_DATAPATH",
"enableIntraNodeVisibility": true
},
"nodePools":[
{
"config":{
"oauthScopes":[
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring"
],
"serviceAccount":"[email protected]"
},
"initialNodeCount":1,
"name":"default-pool"
}
],
"privateClusterConfig": {
"enablePrivateEndpoint": true,
"enablePrivateNodes": true,
"masterGlobalAccessConfig": {
"enabled": true
},
"masterIpv4CidrBlock": "10.128.65.0/28"
},
"shieldedNodes": {
"enabled": true
},
"subnetwork": "projects/network-host-0372/regions/europe-west3/subnetworks/node-1"
}
}
However, Terraform generates the following payload:
{
"cluster": {
"autopilot": {
"enabled": true
},
"binaryAuthorization": {
"enabled": false
},
"ipAllocationPolicy": {
"clusterSecondaryRangeName": "cluster-1",
"servicesSecondaryRangeName": "service-1",
"useIpAliases": true
},
"legacyAbac": {
"enabled": false
},
"maintenancePolicy": {
"window": {}
},
"masterAuthorizedNetworksConfig": {
"cidrBlocks": [
{
"cidrBlock": "172.16.0.0/16"
}
],
"enabled": true
},
"name": "gke-cluster",
"network": "projects/network-host-0372/global/networks/production",
"networkConfig": {
"datapathProvider": "ADVANCED_DATAPATH",
"enableIntraNodeVisibility": true
},
"nodeConfig": {
"oauthScopes": [
"https://www.googleapis.com/auth/monitoring",
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write"
],
"serviceAccount": "[email protected]"
},
"privateClusterConfig": {
"enablePrivateEndpoint": true,
"enablePrivateNodes": true,
"masterGlobalAccessConfig": {
"enabled": true
},
"masterIpv4CidrBlock": "10.128.65.0/28"
},
"shieldedNodes": {
"enabled": true
},
"subnetwork": "projects/network-host-0372/regions/europe-west3/subnetworks/node-1"
}
}
The difference between these two is that the former uses the nodePools[].config property, while the latter uses the deprecated nodeConfig property. Apparently Autopilot does not recognise the deprecated property, although this is not documented.
Perhaps the Terraform provider should move away from the deprecated property to avoid not only this but also any future issues @slevenick. There is already a TODO item here for that :)
Thinking about this a little bit more, I believe the API should not simply ignore the field even though it is deprecated. I have also created an issue: https://issuetracker.google.com/issues/219237911. Impacted people may consider starring it.
@slevenick: Updating assignment because I think this has gone inactive, please correct this if you're still working on it!
Perhaps the Terraform provider should move away from the deprecated property to avoid not only this but also any future issues @slevenick. There is already a TODO item here for that :)
The TODO in that file was for another tool that the MM generator used to be used for - Terraform's implementation is handwritten. https://github.com/hashicorp/terraform-provider-google/issues/7185 and https://github.com/hashicorp/terraform-provider-google/issues/4963 (roughly) track potential removal of the field. We haven't gone forward with it because of the projected impact - requiring users to rewrite configs, and recreating their clusters if they get it wrong - and the lack of signal from the API that they'll actually remove the field.
The API respecting the service account in one case and not the other is confusing and frustrating, as both those messages should have created the same cluster - thanks for filing upstream. I think there's a workaround in the provider today, luckily, as you should be able to create clusters with node_pools set. We're passing the message directly on to the API, and the transformation to config highlighted in https://github.com/hashicorp/terraform-provider-google/issues/4963#issuecomment-557268286 should make it possible to produce a working payload, as sketched below.
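For illustration, a minimal sketch of that workaround. The resource, project, and service account names are hypothetical, and note that later comments in this thread report the provider schema marks node_pool as conflicting with enable_autopilot, so this shape may require a patched provider:
resource "google_container_cluster" "autopilot" {
  name             = "autopilot-cluster" # hypothetical name
  location         = "europe-west3"
  enable_autopilot = true

  # Define the default pool explicitly so the create request carries
  # nodePools[0].config.serviceAccount instead of the deprecated nodeConfig.
  node_pool {
    name               = "default-pool"
    initial_node_count = 1

    node_config {
      service_account = "gke-nodes@my-project.iam.gserviceaccount.com" # hypothetical custom SA
      oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]
    }
  }
}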
Hi all, the underlying API issue seems to be resolved, according to this:
https://issuetracker.google.com/issues/219237911#comment3
If someone can confirm that this also fixed the issue on the Terraform side, then this one can be closed.
Hi all, I'm new to this community. It seems that the bug has been fixed. Could you tell me the Terraform release version or google provider version to use in order to test with a custom SA for GKE Autopilot?
Regards
Nils
Hi, I still have the default service account attached to the GKE Cluster with these versions:
Terraform v1.1.7 on linux_amd64
- provider registry.terraform.io/hashicorp/google v4.15.0
- provider registry.terraform.io/hashicorp/google-beta v4.15.0
with the following terraform block:
resource "google_container_cluster" "private" {
name = "XXXXX"
location = var.region
network = google_compute_network.xxxx.id
subnetwork = google_compute_subnetwork.xxxx.id
node_config {
service_account = google_service_account.yyy.email
oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform"
]
}
private_cluster_config {
enable_private_endpoint = true
enable_private_nodes = true
master_ipv4_cidr_block = "XXX.XXX.XXX.XXX/28"
}
master_authorized_networks_config {
cidr_blocks {
cidr_block = "XXX.XXX.XXX.XXX/24"
display_name = "xxxx"
}
cidr_blocks {
cidr_block = "XXX.XXX.XXX.XXX/16"
display_name = "xxxx"
}
}
# Enable Autopilot for this cluster
enable_autopilot = true
vertical_pod_autoscaling {
enabled = true
}
# Configuration of cluster IP allocation for VPC-native clusters
ip_allocation_policy {
cluster_ipv4_cidr_block = "XXX.XXX.XXX.XXX/16"
services_ipv4_cidr_block = "XXX.XXX.XXX.XXX/24"
}
# Configuration options for the Release channel feature, which provide more control over automatic upgrades of your GKE clusters.
release_channel {
channel = "REGULAR"
}
}
Do you need any additional information?
Nils
If you feel the issue was not fixed, please drop a comment to https://issuetracker.google.com/issues/219237911#comment3
I've recently run into this issue myself. Below are my findings.
Terraform v1.1.5 on darwin_amd64
- provider registry.terraform.io/hashicorp/external v2.2.2
- provider registry.terraform.io/hashicorp/google v4.22.0
- provider registry.terraform.io/hashicorp/google-beta v4.22.0
- provider registry.terraform.io/hashicorp/kubernetes v2.11.0
- provider registry.terraform.io/hashicorp/null v3.1.1
- provider registry.terraform.io/hashicorp/random v3.2.0
Like in https://github.com/hashicorp/terraform-provider-google/issues/9505#issuecomment-1039029610 I noticed the payload that was being generated for a new autopilot cluster was the following:
POST /v1beta1/projects/{project_id}/locations/us-west1/clusters?alt=json&prettyPrint=false HTTP/1.1
Host: container.googleapis.com
...
{
"cluster": {
"addonsConfig": {
"horizontalPodAutoscaling": {
"disabled": false
},
"httpLoadBalancing": {
"disabled": false
}
},
"autopilot": {
"enabled": true
},
"binaryAuthorization": {
"enabled": false
},
"ipAllocationPolicy": {
"clusterSecondaryRangeName": "network-pods",
"servicesSecondaryRangeName": "network-services",
"useIpAliases": true
},
"legacyAbac": {
"enabled": false
},
"locations": [
"us-west1-a",
"us-west1-b",
"us-west1-c"
],
"loggingService": "logging.googleapis.com/kubernetes",
"maintenancePolicy": {
"window": {
"dailyMaintenanceWindow": {
"startTime": "05:00"
}
}
},
"masterAuth": {
"clientCertificateConfig": {}
},
"masterAuthorizedNetworksConfig": {},
"monitoringService": "monitoring.googleapis.com/kubernetes",
"name": "us-west1-dev-autopilot-test",
"network": "projects/{project_id}/global/networks/anthos-network",
"networkConfig": {
"defaultSnatStatus": {
"disabled": false
},
"enableIntraNodeVisibility": true
},
"nodeConfig": {
"oauthScopes": [
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
"https://www.googleapis.com/auth/service.management.readonly",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/trace.append"
]
},
"notificationConfig": {
"pubsub": {}
},
"releaseChannel": {
"channel": "REGULAR"
},
"shieldedNodes": {
"enabled": true
},
"subnetwork": "projects/{project_id}/regions/us-west1/subnetworks/anthos-subnet",
"verticalPodAutoscaling": {
"enabled": true
}
}
}
Looking at the documentation for creating a cluster at [1], it lists the command to be used as
gcloud container clusters create-auto CLUSTER_NAME \
--region REGION \
--project=PROJECT_ID
So that means that the Terraform provider is using the default cluster creation API [2], which doesn't list any flags to specify Autopilot, when it should be using [3] instead. I've verified that using the following command will create an Autopilot cluster with the correct service account.
gcloud container --project {project_id} clusters create-auto autopilot-test \
--region=us-west1 \
--release-channel=regular \
--service-account=cluster-admin@{project_id}.iam.gserviceaccount.com \
--network=test-network \
--subnetwork=test-subnet \
--cluster-secondary-range-name=network-pods \
--services-secondary-range-name=network-services
While I see that there is discussion of a deprecation at [4], it seems like a quicker solution may be to use the API specified in [3], which currently works.
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/creating-an-autopilot-cluster#gcloud
[2] https://cloud.google.com/sdk/gcloud/reference/container/clusters/create
[3] https://cloud.google.com/sdk/gcloud/reference/container/clusters/create-auto
[4] https://issuetracker.google.com/issues/219237911?pli=1
Any update on this?
@rileykarson - happy to take this on ... I think I have the answer
Why don't we revert https://github.com/GoogleCloudPlatform/magic-modules/pull/4894 but do the same workaround for node_pool instead? I have already tested this approach, and it works. I know it is not optimal, but it at least makes things work. A PR is in progress. Based on the excellent comment https://github.com/hashicorp/terraform-provider-google/issues/9505#issuecomment-1039029610:
node_pool {
name = "default-pool"
initial_node_count = 1
node_config {
service_account = "[email protected]"
oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform"
]
}
}
After struggling with this some time today, I believe I've found the key. The request must define one node pool with name "default-pool". Using any other name results in getting a different "default-pool" configured with autopilot defaults.
Other findings:
- Setting autoscaling.autoprovisioning_node_pool_defaults.service_account is allowed by the API but seems to do nothing.
- The above method (define a single nodepool named "default-pool") is sufficient to affect the resulting cluster's settings for all three of autoscaling.autoprovisioning_node_pool_defaults.service_account, the deprecated node_config.service_account, and (unsurprisingly) node_pools[0].config.service_account.
- The GKE API rejects setting autoscaling_profile for Autopilot clusters only if it's not set to BALANCED.
This took me forever to find because I'm using the google-cloud-provided gke module that uses "default-node-pool" as the name of the singular default nodepool.
I was able to terraform an autopiloted GKE cluster by removing the conflict between enable_autopilot and node_pools. Empirically, the resource_limits and autoscaling_profile subsettings of node_pools do in fact conflict with enable_autopilot, so I pushed the conflict down to those.
This is the resulting patch: https://github.com/bukzor-sentryio/terraform-provider-google-beta/commit/pr-9505-autopilot-with-nodepools
While that "worked" the resulting diff behavior is entirely borked. I have currently two clusters terraformed, and terraform-plan wants to tear down the one that has the correct service account (because node_pools[0].metadata changed) and it believes the other cluster with the wrong service account needs no changes.
Unfortunately, the Google issue was closed as Won't Fix. It seems the only way is to fix this on the Terraform side.
Hey all, the correct solution here is to pass cluster_autoscaling.auto_provisioning_defaults.service_account, which is the Autopilot-friendly way to pass service accounts.
You don't have control over node pools in Autopilot (and there may not even be one at first), so passing via node_pools no longer makes sense. The API does partially support passing via the default pool for legacy reasons, as this group has discovered, but it's not a great approach and won't work nicely with Terraform.
I'll take the fix @rileykarson @mastersingh24
@JeremyOT - yeah - I've had something in the works, but was waiting to see how things played out. Here's a draft PR: https://github.com/GoogleCloudPlatform/magic-modules/pull/6732
@mastersingh24 Ok cool - I did something similar but didn't add conflicts on the CA subfields, and defaulted CA.enabled=true when autopilot is enabled and no value is supplied. Both work, I have no real preference. Trying to get it out there before kubecon kicks off
GoogleCloudPlatform/magic-modules#6733
Let's push yours through @JeremyOT .. looks good and I was going to add the defaults as well.
@JeremyOT, @mastersingh24 Fair enough, your approach looks better. If @JeremyOT's PR works, setting cluster_autoscaling.auto_provisioning_defaults.service_account makes much more sense than messing with node_pools, which I did in https://github.com/GoogleCloudPlatform/magic-modules/pull/6611. Not that I'm a contributor, but I have added a few comments on https://github.com/GoogleCloudPlatform/magic-modules/pull/6733 though.
I don't think this is fixed. I've built the provider with #13024 and am trying to provision an Autopilot cluster. We'd previously deleted the default GCE SA from the project entirely, and get
Error: googleapi: Error 400: Service account "[email protected]" does not exist., badRequest
even when specifying a custom SA.