Requirement to specify `IAP_BACKEND_SERVICE_ID` creates a circular dependency

Open aebrahim opened this issue 1 year ago • 3 comments

I do not see a good way to create the Cloud Run service using an infrastructure-as-code solution like Terraform, because the requirement is circular: IAP_BACKEND_SERVICE_ID has to be specified on the Cloud Run service itself, but the Cloud Run service needs to be created before the backend service can be created.

This is a sample failing Terraform configuration:

resource "google_service_account" "jit_access" {
  account_id   = "jit-access"
  project      = var.project_id
  display_name = "Grants just-in-time access to IAM roles"
}

resource "google_project_iam_member" "jit_access_iam_admin" {
  project = var.project_id
  role    = "roles/resourcemanager.projectIamAdmin"
  member  = google_service_account.jit_access.member
}

resource "google_project_iam_member" "jit_access_cloudasset_viewer" {
  project = var.project_id
  role    = "roles/cloudasset.viewer"
  member  = google_service_account.jit_access.member
}

resource "google_project_iam_member" "jit_access_token_creator" {
  project = var.project_id
  member  = google_service_account.jit_access.member
  role    = "roles/iam.serviceAccountTokenCreator"
}

resource "google_cloud_run_v2_service" "jit_access" {
  name     = "jit-access"
  location = var.region
  project  = var.project_id
  ingress  = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
  traffic {
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
    percent = 100
  }
  template {
    containers {
      image = "${var.region}-docker.pkg.dev/${var.project_id}/containers/jit-access:latest"
      name  = "jit-access"
      env {
        name  = "RESOURCE_SCOPE"
        value = "projects/${var.project_id}"
      }
      env {
        name  = "RESOURCE_CUSTOMER_ID"
        value = var.customer_id
      }
      env {
        name  = "IAP_BACKEND_SERVICE_ID"
        value = google_compute_backend_service.jit_access.generated_id
      }
      resources {
        startup_cpu_boost = true
        cpu_idle          = true
      }
    }

    service_account       = google_service_account.jit_access.email
    execution_environment = "EXECUTION_ENVIRONMENT_GEN2"
  }
}


data "google_iam_policy" "jit_access_iap" {
  binding {
    role = "roles/iap.httpsResourceAccessor"
    members = ["group:[email protected]"]
  }
}

resource "google_iap_web_backend_service_iam_policy" "jit_access" {
  project             = var.project_id
  web_backend_service = google_compute_backend_service.jit_access.name
  policy_data         = data.google_iam_policy.jit_access_iap.policy_data
}

resource "google_compute_region_network_endpoint_group" "jit_access" {
  name                  = "jit-access"
  network_endpoint_type = "SERVERLESS"
  region                = var.region

  cloud_run {
    service = google_cloud_run_v2_service.jit_access.name
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_compute_backend_service" "jit_access" {
  name = "jit-access"

  backend {
    group = google_compute_region_network_endpoint_group.jit_access.id
  }
  iap {
    oauth2_client_id     = google_iap_client.my_iap_client.client_id
    oauth2_client_secret = google_iap_client.my_iap_client.secret
  }

  custom_response_headers = local.security_headers
  log_config {
    enable = true
  }
}

Can the required environment variable IAP_BACKEND_SERVICE_ID be removed from the Cloud Run service? From the docs, I am not clear on why the Cloud Run service needs to be aware of the IAP environment feeding traffic to it.

aebrahim avatar Mar 07 '24 22:03 aebrahim

JIT Access completely relies on IAP for authentication and authorization. To verify that (a) IAP is enabled at all and (b) that a request has indeed been vetted by IAP, JIT Access verifies the x-goog-iap-jwt-assertion header. For that, it needs to know what audience to expect in the header.

On AppEngine, the expected audience can be derived from the project ID. On Cloud Run, there's no way for the application to determine the audience automatically, hence the need for the IAP_BACKEND_SERVICE_ID variable.
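
For reference, a minimal sketch of how that audience relates to the backend service, written in Terraform terms. For a load balancer backend, IAP uses an audience of the form /projects/PROJECT_NUMBER/global/backendServices/SERVICE_ID. The data source name and the local `expected_iap_audience` are hypothetical and only illustrate the shape of the value; the snippet assumes the `google_compute_backend_service.jit_access` resource and `var.project_id` from the configuration in the issue description.

```hcl
# The audience contains the project number, not the project ID, so look it up.
data "google_project" "jit_access" {
  project_id = var.project_id
}

locals {
  # Hypothetical local, for illustration only: the audience expected in the
  # x-goog-iap-jwt-assertion header for requests routed through this backend.
  # The last path segment is the numeric backend service ID, i.e. the value
  # passed to the application as IAP_BACKEND_SERVICE_ID.
  expected_iap_audience = "/projects/${data.google_project.jit_access.number}/global/backendServices/${google_compute_backend_service.jit_access.generated_id}"
}
```

The numeric ID only exists once the backend service has been created, which is where the ordering constraint comes from.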

To break the cyclic dependency between the load balancer and Cloud Run, you can deploy things in the following order:

  1. Load balancer backend
  2. Cloud Run
  3. Other load balancer components

That's the sequence followed by the manual setup instructions. Maybe you can get Terraform to follow this sequence by using depends_on?

jpassing avatar Mar 11 '24 22:03 jpassing

Thanks @jpassing for the response!

Unfortunately, to the best of my knowledge, depends_on doesn't resolve circular dependencies. It only pulls an otherwise unconnected dependency into the dependency graph for the current node.

Is there a way we can rely on the terraform configuration to ensure that all traffic flowing in to the jit-access service has already been authenticated and authorized by IAP, and use a new flag to skip re-validation in jit-access? If this is possible, IMHO, the requirement to break infrastructure as code and require a superuser to perform manual steps negates any security improvement from a re-validation of an already validated token.

aebrahim avatar Mar 12 '24 18:03 aebrahim

What should work is to...

  1. Manually create the backend:
gcloud compute backend-services create jitaccess-backend \
  --load-balancing-scheme=EXTERNAL \
  --global
  2. Let the TF module use the existing backend (either by using a data source or passing the backend ID as parameter)

I know it's not great, but (1) is a one-time thing, so any subsequent deployments could be fully automatic.
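
A minimal sketch of the data source variant, assuming the backend was created once with the gcloud command above and is named jitaccess-backend. The `google_compute_backend_service` data source should expose the same `generated_id` attribute as the resource; the resource labels, `var.region`, `var.project_id`, and the image path are taken from the configuration in the issue description and are otherwise assumptions.

```hcl
# Assumption: this backend already exists and is not created by this module.
data "google_compute_backend_service" "jit_access" {
  name    = "jitaccess-backend"
  project = var.project_id
}

resource "google_cloud_run_v2_service" "jit_access" {
  name     = "jit-access"
  location = var.region
  project  = var.project_id

  template {
    containers {
      image = "${var.region}-docker.pkg.dev/${var.project_id}/containers/jit-access:latest"
      env {
        name = "IAP_BACKEND_SERVICE_ID"
        # Read from the data source, so the Cloud Run service no longer
        # depends on a backend service resource that in turn depends on it.
        value = data.google_compute_backend_service.jit_access.generated_id
      }
    }
  }
}
```

The trade-off is that a backend read through a data source is not managed by this configuration, so its NEG attachment and IAP settings would have to be handled elsewhere.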

jpassing avatar Mar 14 '24 03:03 jpassing

I think this should be queryable at runtime in cloud run:

The network endpoint group can be found by grabbing self_link from

curl 'https://compute.googleapis.com/compute/v1/projects/{projectId}/regions/us-central1/networkEndpointGroups?filter=cloudRun.service={env.K_SERVICE} AND networkEndpointType=serverless'

docs for K_SERVICE

We can then use that to query for the backend service:

curl 'https://compute.googleapis.com/compute/v1/projects/{projectId}/global/backendServices?filter=iap.enabled=true AND kind=compute#backendService'

Theoretically we would be able to filter in this query by adding backends:group = {neg_link}, but this doesn't seem to work in this query, so we would have to filter for a result that has a backend group set to the neg self_link in code.

aebrahim avatar Jul 24 '24 20:07 aebrahim

Querying the API to determine the backend ID is an interesting idea, and this approach would avoid the potential security risks of accepting all audiences. We could do that lookup during startup.

However, there are a few downsides:

  1. This requires the Compute Engine API, which typically isn't enabled in these projects.
  2. Listing backend services requires the compute.backendServices.list permission, which the application service account doesn't have by default.
  3. It requires an extra Maven dependency.

So maybe it's easier to go ahead with your PR.

jpassing avatar Jul 26 '24 06:07 jpassing

Thanks @jpassing for considering this idea, and for your hard work on this project in general.

I agree that the approach here is a lot more work than #449, so we should start with that for now and keep this idea as a longer-term discussion if that's ok with you. I have some thoughts on the downsides you listed.

  1. I disagree on this one. If you are using Cloud Run, you need to set up a compute backend service, which would already require enabling the Compute Engine API AFAIK.
  2. I agree that one is definitely a downside in that it is more work/complexity. I don't think compute.backendServices.list is inherently the problem from a security standpoint, because that is a lot less dangerous than the roles/resourcemanager.projectIamAdmin and roles/cloudasset.viewer that are already needed. However, it would need a new custom role which is a little annoying. Maybe this would be a fallback behavior if IAP_BACKEND_SERVICE_ID is unspecified, and is just a little bit more work if you don't want to specify it? For manual CLI creators, specifying IAP_BACKEND_SERVICE_ID will be a lot easier, and for terraform users a new custom role isn't too bad.
  3. It looks like RuntimeEnvironment.getMetadata already makes REST calls - could that same method be used?
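
For Terraform users, the custom role from point 2 could look roughly like the sketch below. The role ID, title, and resource labels are hypothetical; only the compute.backendServices.list permission is named in this thread, and looking up the network endpoint group as well would need a further permission that is not covered here. The snippet assumes the `google_service_account.jit_access` resource from the issue description.

```hcl
# Hypothetical custom role that only allows listing backend services.
resource "google_project_iam_custom_role" "jit_access_backend_lookup" {
  project     = var.project_id
  role_id     = "jitAccessBackendLookup"
  title       = "JIT Access backend service lookup"
  permissions = ["compute.backendServices.list"]
}

# Grant the custom role to the application service account.
resource "google_project_iam_member" "jit_access_backend_lookup" {
  project = var.project_id
  role    = google_project_iam_custom_role.jit_access_backend_lookup.id
  member  = google_service_account.jit_access.member
}
```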

aebrahim avatar Jul 26 '24 20:07 aebrahim

FWIW: we had the same cycle issue and "solved" it by ensuring we know the Cloud Run name beforehand, which removes the dependency on the pending creation. We did this by simply creating a local variable holding the name, and passing that as the name to both the Cloud Run resource and the network endpoint group that references it. In theory this could lead to race conditions, because the network endpoint group no longer has a direct dependency on the actual Cloud Run service when it is created, but in practice this has not caused us any issues. We've done tens of complete destroy/create cycles of our JIT module and never had a problem.

Pseudocode with what we effectively did based on your initial issue creation:

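locals {
  # Fix the Cloud Run service name up front so nothing has to wait for the
  # service to exist in order to learn its name.
  cloud_run_name = "demo"
}

resource "google_cloud_run_v2_service" "jit_access" {
  name = local.cloud_run_name

  template {
    containers {
      env {
        name  = "IAP_BACKEND_SERVICE_ID"
        value = google_compute_backend_service.jit_access.generated_id
      }
    }
  }
}

resource "google_compute_region_network_endpoint_group" "jit_access" {
  name = "jit-access"

  cloud_run {
    # Use the local instead of google_cloud_run_v2_service.jit_access.name:
    # this is what removes the edge that closed the cycle.
    service = local.cloud_run_name
  }
}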

fancybear-dev avatar Jul 30 '24 06:07 fancybear-dev

Release 1.8.1 now lets you control whether JIT Access should verify the audience of IAP assertions by using the IAP_VERIFY_AUDIENCE configuration option, so I'll close this issue.
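
A rough sketch of what this could look like in the Terraform configuration from the issue description. The option name comes from the release note above; the value "false" and the assumption that IAP_BACKEND_SERVICE_ID can then be omitted are not confirmed in this thread, so check the release documentation before relying on them.

```hcl
resource "google_cloud_run_v2_service" "jit_access" {
  name     = "jit-access"
  location = var.region
  project  = var.project_id
  # Only accept traffic that arrives through the load balancer.
  ingress  = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"

  template {
    containers {
      image = "${var.region}-docker.pkg.dev/${var.project_id}/containers/jit-access:latest"
      env {
        name = "IAP_VERIFY_AUDIENCE"
        # Assumption: "false" turns off the audience check.
        value = "false"
      }
      # Assumption: with the audience check off, no IAP_BACKEND_SERVICE_ID
      # entry is needed, so this resource no longer references the backend
      # service and the cycle disappears.
    }
  }
}
```

With the audience check disabled, the ingress restriction is what keeps requests from bypassing IAP, so it should stay in place.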

jpassing avatar Aug 07 '24 23:08 jpassing