terraform-provider-google GKE AutoPilot Failure For Node Count

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v1.1.0

Affected Resource(s)

google_container_cluster

Terraform Configuration Files

provider "google" {
  project = var.project_id
  region  = var.region
}

resource "google_container_cluster" "primary" {
  name             = "${var.project_id}-gke"
  location         = var.region
  enable_autopilot = true
}

Debug Output

Panic Output

Expected Behavior

GKE AutoPilot cluster should spin up correctly

Actual Behavior

Terraform throws the following error:

│ Error: googleapi: Error 400: Max pods constraint on node pools for Autopilot clusters should be 32., badRequest
│ 
│   with module.gke-cluster.google_container_cluster.primary,
│   on gke-cluster/main.tf line 10, in resource "google_container_cluster" "primary":
│   10: resource "google_container_cluster" "primary" {

Steps to Reproduce

terraform apply

Important Factoids

Provider version 4.3.0 works as expected, but I couldn't see anything obvious when glancing at the diff. Seems likely to be related to max_pods_constraint, but all that looks to the untrained eye like Azure or AWS stuff, somehow.

Dec 21 '21 19:12 kylekurz

Could you provide some debug logs? It's not exactly clear from my perspective what the issue is.

You can get these by setting the environment variable TF_LOG to debug. In particular, it would be useful to see what we are using to call the API.

Is the configuration you provided complete?

Dec 21 '21 23:12 ScottSuarez

Logs are here: https://gist.github.com/kylekurz/45d872721ed58e2b6d4ff70f76b26e0c

The configuration provided above is all that is needed to trigger this, if you're on provider version 4.5.0. If I back the provider down to 4.3.0, it works as expected.

Dec 22 '21 14:12 kylekurz

I just tested 4.4.0 too, and it hits the same error. So something in the upgrade from 4.3.0 -> 4.4.0 breaks this; it's not new in 4.5.0.

Dec 22 '21 14:12 kylekurz

Having the same issue. Downgrading to 4.3.0 works as a workaround.

Dec 23 '21 11:12 sashokbg

We are aware of the issue, and there is a related pull request in the works:

https://github.com/GoogleCloudPlatform/magic-modules/pull/5540

Dec 28 '21 18:12 ScottSuarez

Is there any workaround in the meantime while the PR is merged & released?

Jan 27 '22 20:01 Kukunin

> Is there any workaround in the meantime while the PR is merged & released?

The workaround is to pin the provider version so that you don't pick up the latest releases.

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.3.0" # last release reported to work in this thread
    }
  }
}

Jan 28 '22 09:01 grrywlsn

An alternative workaround is to set ip_allocation_policy. The block can even be empty, like so:

resource "google_container_cluster" "primary" {
  name             = "${var.project_id}-gke"
  location         = var.region
  ip_allocation_policy {
  }
  enable_autopilot = true
}

Jan 28 '22 18:01 c2thorn

> An alternative workaround is to set the ip_allocation_policy. Could even be empty like so:
> resource "google_container_cluster" "primary" {
>   name             = "${var.project_id}-gke"
>   location         = var.region
>   ip_allocation_policy {
>   }
>   enable_autopilot = true
> }

Works with Pulumi too:

  ipAllocationPolicy: {},

Thanks!

Mar 28 '22 05:03 mzavaletavargas

+1

Jun 01 '22 11:06 loeffel-io

Even a minor version increase fails with the error posted above.

I just tried to build an Autopilot cluster with the providers at v4.36.0.

terraform {
  required_version = "~> 1.2.9"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 4.36.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 4.36.0"
    }
  }
}

Dialed it back to v4.3.0 and it works.

Sep 19 '22 02:09 todd-dsm

Ran into this problem too as soon as I started testing with Autopilot. Surprised it hasn't been fixed after so long.

Oct 24 '22 12:10 meteatamel

Is this still not fixed?

Jan 19 '23 19:01 arueth

Adding this inside the google_container_cluster resource fixed it for our team:

resource "google_container_cluster" "foo" {
  ...
  
  ip_allocation_policy {
    cluster_secondary_range_name  = "pod-range"
    services_secondary_range_name = "service-range"
  }
}

Jan 25 '23 21:01 NFollett89

An even simpler workaround is to set the networks to their defaults:

  ip_allocation_policy {
    cluster_ipv4_cidr_block  = ""
    services_ipv4_cidr_block = ""
  }

Jan 31 '23 15:01 jonaseck2

I can confirm that on 4.56.0 you still need a workaround. I'm currently using an empty ip_allocation_policy block, as suggested above.

Mar 13 '23 13:03 kylekurz

> I can confirm that on 4.56.0, you still need to use a workaround, currently using an empty ip_allocation_policy block as suggested above.

Also, as of 4.59.0 the issue persists and is fixed by an empty ip_allocation_policy block.

Mar 31 '23 14:03 siikanen

The issue is still present in v4.60.x. I am hitting it as well.

Apr 08 '23 04:04 muthukumars

To be more precise: 4.60.2.

Apr 08 '23 05:04 muthukumars

> Even simpler workaround to set networks to defaults:
>   ip_allocation_policy {
>     cluster_ipv4_cidr_block  = ""
>     services_ipv4_cidr_block = ""
>   }

Ran into this issue earlier; this workaround worked for me. Strange, but congratulations on finding it.

Apr 19 '23 13:04 AeroNotix

Version 4.63 still needs the workaround. Why is it taking so long?

May 02 '23 20:05 greenozon

Hey folks, a fix has just been committed for this issue. Thanks for your patience!!

The change will be included in the 4.72.0 provider release, barring any reverts or speed bumps.
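
For anyone who wants to drop the workaround once that release is out, a minimal sketch of the configuration might look like the following. It assumes the fix really does ship in 4.72.0 as planned (this thread does not confirm that), and it reuses the resource and variable names from the original report.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      # Assumption: the fix lands in 4.72.0 as announced; adjust if it slips.
      version = ">= 4.72.0"
    }
  }
}

# Same cluster as in the original report, with no ip_allocation_policy workaround.
resource "google_container_cluster" "primary" {
  name             = "${var.project_id}-gke"
  location         = var.region
  enable_autopilot = true
}
```

If the error still appears on that version, the empty ip_allocation_policy block described earlier in the thread remains a fallback.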

Jun 26 '23 19:06 ScottSuarez

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

Jul 27 '23 02:07 github-actions[bot]
