terraform-provider-google

google_cloud_run_v2_service: Apply always fails the first time, succeeds the second time

Open MaxDaten opened this issue 1 year ago • 1 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

$ Terraform v1.6.1-dev
on darwin_arm64
+ provider registry.terraform.io/hashicorp/google v5.3.0

Affected Resource(s)

  • google_cloud_run_v2_service

Terraform Configuration Files

resource "google_cloud_run_v2_service" "default" {
  name     = var.image-name
  location = var.location
  ingress  = "INGRESS_TRAFFIC_ALL"

  template {
    timeout                          = "300s"
    max_instance_request_concurrency = 80
    execution_environment            = "EXECUTION_ENVIRONMENT_GEN1" # Faster cold starts, worse network performance
    service_account                  = google_service_account.default.email

    containers {
      image = "${var.artifact-registry}/${var.image-name}:${var.image-tag}"
      ports {
        name           = "h2c"
        container_port = 8080
      }

      resources {
        limits = {
          cpu    = "1000m"
          memory = "128Mi"
        }
        cpu_idle          = true
        startup_cpu_boost = true
      }
    }

    scaling {
      min_instance_count = 0
      max_instance_count = 2
    }
  }

  timeouts {
    create = "10m"
    update = "10m"
  }
}

data "google_service_account" "github-actions-sa" {
  account_id = "github-actions"
}

# Service Account for this Service

resource "google_service_account" "default" {
  account_id   = "${var.image-name}-sa"
  display_name = "Service Account for ${var.image-name}"
}


# actAs for github-actions-sa

resource "google_service_account_iam_member" "github-actions-sa-actAs" {
  service_account_id = google_service_account.default.name
  role               = "roles/iam.serviceAccountTokenCreator"
  member             = data.google_service_account.github-actions-sa.member
}

Var-File:

{"artifact-registry":"europe-west3-docker.pkg.dev/xxxxx/docker","image-name":"service","image-tag":"ynyyljjanwagh41jmlfrih0bgjlk5jgn","location":"europe-west3","project_id":"xxxxx"}

Debug Output

https://gist.github.com/MaxDaten/fb51af1741953acfed1505fbb63d4a4b

Expected Behavior

Apply should succeed on the first run.

Actual Behavior

Fails on first apply, succeeds on second run:

google_cloud_run_v2_service.default: Modifying... [id=projects/xxxxx/locations/europe-west3/services/service]
╷
│ Error: Error waiting for Updating Service: error while retrieving operation: googleapi: Error 403: Permission 'run.operations.get' denied on resource 'projects/xxxxx/locations/europe-west3/operations/14cecd3f-34e3-4718-bb52-5a8c844acd85' (or resource may not exist).
│ 
│   with google_cloud_run_v2_service.default,
│   on main.tf line 2, in resource "google_cloud_run_v2_service" "default":
│    2: resource "google_cloud_run_v2_service" "default" {
│ 
╵
Error: Process completed with exit code 1.

Steps to Reproduce

  1. terraform apply -var-file=vars.json (fails)
  2. terraform apply -var-file=vars.json again (succeeds)

Important Factoids

Using a service account configured like this:


resource "google_cloud_run_v2_service_iam_member" "github-actions-sa" {
  project  = local.project_id
  location = local.location
  name     = var.service
  role     = "roles/run.admin"
  member   = google_service_account.github-actions-sa.member
}

b/307707059

MaxDaten avatar Oct 24 '23 19:10 MaxDaten

While IAM eventual consistency will still make this fail 80+% of the time, one trick we use is to add a depends_on from the service to the actAs IAM grant.

e.g.

resource "google_cloud_run_v2_service" "default" {
  depends_on = [google_service_account_iam_member.github-actions-sa-actAs]
  ...
}

Technically, without this, Terraform could apply the actAs grant AFTER it attempts to create the service, even though the grant is really a prerequisite.

mattmoor avatar Feb 10 '24 15:02 mattmoor

Did you find any way to avoid this permission failure?

alvadorn avatar Apr 17 '24 18:04 alvadorn

As @mattmoor already pointed out, the actAs is a pre-requisite in this case. The dependency needs to be defined.

@melinath I think we can close this.

yanweiguo avatar Jun 03 '24 22:06 yanweiguo

Accounting for IAM propagation delays may also require dependencies on time_sleep resources.
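
A minimal sketch of what a time_sleep dependency could look like, assuming the hashicorp/time provider is available. The resource name wait_for_iam and the 60s duration are illustrative assumptions, not values from this thread, and no specific delay is guaranteed to cover IAM propagation:

```hcl
# Sketch only: requires the hashicorp/time provider.
# "wait_for_iam" and "60s" are assumed, illustrative values.
resource "time_sleep" "wait_for_iam" {
  # Start the timer only after the actAs grant has been created.
  depends_on = [google_service_account_iam_member.github-actions-sa-actAs]

  create_duration = "60s"
}

resource "google_cloud_run_v2_service" "default" {
  # Touch the service only after the wait has elapsed.
  depends_on = [time_sleep.wait_for_iam]

  # Remaining arguments as in the configuration from the original report.
}
```

Note that create_duration only delays when the time_sleep resource itself is created, so this helps on the apply that introduces the IAM grant rather than on every later update.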

melinath avatar Jun 03 '24 22:06 melinath

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Jul 07 '24 02:07 github-actions[bot]