terraform-google-kubernetes-engine

Invalid count argument

Status: Open · tvvignesh opened this issue 4 years ago · 19 comments

Hi. I tried setting up a GKE private cluster (safer-cluster-update-variant), and whenever I make a mistake (accidentally giving the wrong image name or machine type, and so on), the apply fails (the error is not detected at plan time), which is understandable.

But if I fix the issue and run plan and apply again, I get this:

[screenshot: the "Invalid count argument" error]

It has been discussed here: https://github.com/hashicorp/terraform/issues/21450 https://github.com/hashicorp/terraform/issues/12570

but I am not able to understand how to get over this.

I do understand that it is happening because Terraform is not able to find any node pool in the cluster from which it can determine the count. If I go to .terraform/modules/global_gke.gke.gcloud_wait_for_cluster/main.tf, I can see this block, which is where the issue is:

resource "null_resource" "module_depends_on" {
  count = length(var.module_depends_on) > 0 ? 1 : 0

  triggers = {
    value = length(var.module_depends_on)
  }
}
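
For context, the variable above is fed by the parent module roughly like this (a sketch based on the module path; the names and the exact expression may differ from the real source):

module "gcloud_wait_for_cluster" {
  source = "terraform-google-modules/gcloud/google"
  # ... other arguments ...

  # The wait is keyed off the node pool names. After a failed apply the node
  # pools cannot be resolved at plan time, so length(var.module_depends_on),
  # and therefore count, is unknown.
  module_depends_on = [for pool in google_container_node_pool.pools : pool.name]
}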

Currently what I am doing is deleting the cluster every time and re-creating it from scratch. May I know how I can avoid doing that and just fix this issue? Thanks.

tvvignesh avatar Sep 26 '20 13:09 tvvignesh

Hi @tvvignesh. Could you let me know which versions of TF and the GKE module you're using? If the module is not at version = "~> 11.1.0", could you try that?

bharathkkb avatar Sep 26 '20 20:09 bharathkkb

@bharathkkb Hi. I'm running TF v0.13.2, GKE 1.18.6-gke.4801, and the latest version of this module.

tvvignesh avatar Sep 27 '20 10:09 tvvignesh

@tvvignesh could you provide your config? I can try to reproduce it.

bharathkkb avatar Sep 27 '20 23:09 bharathkkb

@bharathkkb Sure. This would be the relevant portion of the config. Kindly replace the vars where necessary.

module "global_gke" {
  source = "../modules/safer-cluster-update-variant"

  description                     = "My Cluster"
  project_id                      = module.global_enabled_google_apis.project_id
  name                            = var.global_cluster_name
  region                          = var.global_region
  network                         = module.global_vpc.network_name
  subnetwork                      = module.global_vpc.subnets_names[0]
  horizontal_pod_autoscaling      = true
  enable_vertical_pod_autoscaling = true
  enable_pod_security_policy      = true
  http_load_balancing             = true
  gce_pd_csi_driver               = true
  monitoring_service              = "none"
  logging_service                 = "none"
  release_channel                 = "RAPID"
  enable_shielded_nodes           = true
  ip_range_pods                   = module.global_vpc.subnets_secondary_ranges[0].*.range_name[0]
  ip_range_services               = module.global_vpc.subnets_secondary_ranges[0].*.range_name[1]
  master_authorized_networks = [{
    cidr_block   = "${module.global_bastion.ip_address}/32"
    display_name = "Global Bastion Host"
  }]
  grant_registry_access = true
  node_pools = [
    {
      name            = "global-pool-1"
      machine_type    = "n1-standard-4"
      min_count       = 1
      max_count       = 20
      local_ssd_count = 0
      disk_size_gb    = 30
      disk_type       = "pd-ssd"
      image_type      = "UBUNTU_CONTAINERD"
      auto_repair     = true
      auto_upgrade    = true
      node_metadata   = "GKE_METADATA_SERVER"
      service_account = "${var.global_sa}"
      preemptible     = false
    }
  ]
}

tvvignesh avatar Sep 28 '20 00:09 tvvignesh

Having the exact same issue as well. Seems to only happen when you've made an error, and once it gets in this state you can't terraform destroy to start again either.

halkyon avatar Sep 29 '20 07:09 halkyon

@halkyon What was the error you made? Reproducing this will likely require us to see your broken config.

morgante avatar Sep 29 '20 12:09 morgante

@morgante Here you go: https://github.com/halkyon/gke-beta-private-cluster-example

Using Terraform v0.13.4.

Change the values in terraform.tfvars to your liking, and do a terraform init && terraform apply to provision a new cluster. Now change the machine_type value in the node_pools variable in terraform.tfvars to something invalid, then terraform apply again, and you'll get an error as expected. Now fix that back up to e2-medium or another valid type, and terraform apply again. This error is shown:

Error: Invalid count argument

  on .terraform/modules/gke.gcloud_delete_default_kube_dns_configmap/main.tf line 63, in resource "null_resource" "module_depends_on":
  63:   count = length(var.module_depends_on) > 0 ? 1 : 0

The "count" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the count depends on.
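
(Condensed, the same steps as shell commands; the terraform.tfvars edits are done by hand as described above.)

terraform init
terraform apply   # provisions the cluster successfully
# edit terraform.tfvars: set machine_type in node_pools to an invalid value
terraform apply   # fails during apply, as expected
# edit terraform.tfvars: set machine_type back to e2-medium (or another valid type)
terraform apply   # now fails immediately with "Invalid count argument"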

Hope this helps!

halkyon avatar Oct 01 '20 09:10 halkyon

Exact same issue here.

mspinassi-medallia avatar Oct 06 '20 21:10 mspinassi-medallia

I was able to reproduce this with 0.13.4; it seems that after the node pool config errors out, TF is unable to resolve [for pool in google_container_node_pool.pools : pool.name] at plan time. I'll do some more digging for a fix and see if it's just 0.13.4 or all of 0.13.x.

Works as intended with 0.12.29.
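
For anyone curious, the error class itself is easy to reproduce in isolation. This generic sketch (not the module's exact situation, which only breaks after a partial apply) triggers the same message on a fresh plan:

resource "null_resource" "first" {}

resource "null_resource" "second" {
  # null_resource.first.id is only known after apply, so Terraform cannot
  # resolve this count at plan time and reports "Invalid count argument".
  count = length(null_resource.first.id) > 0 ? 1 : 0
}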

bharathkkb avatar Oct 07 '20 03:10 bharathkkb

Any updates? This happens to me too with 0.13.4, after upgrading the node pool.

innovia avatar Oct 08 '20 23:10 innovia

What's the status of this? It's easy to replicate: if you put in an invalid machine type (for example e2-medium-2), the module fails and then hits this error on every subsequent run, as if it's in a bad state.

Can you please fix this?

innovia avatar Oct 14 '20 01:10 innovia

Since this is working in Terraform 0.12.x but not in 0.13.x, I'm inclined to believe this is a Terraform Core issue. We can attempt to work around it, but it's not a high priority when Core should be fixing it.

morgante avatar Oct 14 '20 02:10 morgante

I was able to create a light repro which works with 0.12.x and not with 0.13.4. I will open an issue in core. A workaround seems to be to use terraform apply -refresh=false, which bypasses the initial refresh that throws this error.
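
For example (the -target address is illustrative and depends on how the module is nested in your config):

# Skip the initial refresh that triggers the error:
terraform apply -refresh=false

# Or, as the error message suggests, apply the node pools first, then the rest:
terraform apply -target=module.gke.google_container_node_pool.pools
terraform apply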

bharathkkb avatar Oct 14 '20 03:10 bharathkkb

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Jan 05 '21 23:01 github-actions[bot]

I'm getting this issue with terraform:0.14.7, during the tf plan phase:

Error: Invalid count argument

  on .terraform/modules/config_sync.configsync_operator.k8sop_manifest/main.tf line 57, in resource "random_id" "cache":
  57:   count = (! local.skip_download) ? 1 : 0

The "count" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the count depends on.

Any suggestions on the workaround?

AlexBulankou avatar Mar 16 '21 21:03 AlexBulankou

@AlexBulankou Is this for a fresh deploy? What does your module configuration look like?

morgante avatar Mar 16 '21 22:03 morgante

Yes, this is a fresh deploy: module config.

AlexBulankou avatar Mar 24 '21 21:03 AlexBulankou

To follow up, the workaround for me was to go back to terraform:0.12.29.
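
(In case it helps anyone else, a minimal sketch of pinning that in the configuration; adjust as needed.)

terraform {
  # Stay on the 0.12.x series until the 0.13+ behaviour is resolved.
  required_version = "~> 0.12.29"
}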

AlexBulankou avatar Apr 02 '21 23:04 AlexBulankou