terraform-provider-databricks

[ISSUE] Group privileges don't propagate to members while creating `databricks_external_location`

Open camilo-s opened this issue 1 year ago • 4 comments

When a service principal creates an external location that it owns only indirectly through group membership (i.e. the group is the owner and the service principal is a member of that group), Terraform fails during creation with the following error:

Error: cannot read external location: User does not have CREATE CONNECTION on External Location '[REDACTED]'.

It seems the group membership doesn't propagate in time while the external location's creation is being polled, so the service principal running Terraform can't read the creation status, and the apply ultimately fails.

Configuration

terraform {
  required_providers {
    databricks = {
      configuration_aliases = [databricks.account]
      source                = "databricks/databricks"
    }
  }
}

resource "databricks_group" "this" {
  provider     = databricks.account
  display_name = "debug-group"
}

data "databricks_service_principal" "this" {
  provider       = databricks.account
  application_id = var.application_id
}

resource "databricks_group_member" "this" {
  provider = databricks.account

  group_id  = databricks_group.this.id
  member_id = data.databricks_service_principal.this.id
}

resource "databricks_grant" "metastore" {
  metastore  = var.metastore_id
  principal  = data.databricks_service_principal.this.application_id
  privileges = ["CREATE_EXTERNAL_LOCATION"]
}

resource "databricks_grant" "storage_credential" {
  storage_credential = var.storage_credential_name
  principal          = data.databricks_service_principal.this.application_id
  privileges         = ["CREATE_EXTERNAL_LOCATION"]
}

resource "databricks_external_location" "this" {
  name            = "test_external_location"
  url             = var.container_uri
  credential_name = var.storage_credential_name
  owner           = databricks_group.this.display_name

  depends_on = [
    databricks_group_member.this,
    databricks_grant.metastore,
    databricks_grant.storage_credential
  ]
}

Expected Behavior

External location creation is successful.

Actual Behavior

Creation fails with the following error:

Error: cannot read external location: User does not have CREATE CONNECTION on External Location '[REDACTED]'.

Steps to Reproduce

Terraform v1.7.4
databricks/databricks v1.37.1

Is it a regression?

It had been working properly up to now. I suspect there was some change in the Databricks REST API backend.

Debug Output

See attached gist

Important Factoids

Running a new plan immediately after the first failure fails again, presumably because the group membership hasn't propagated yet, so the service principal lacks permission to read the state of the created external location. After a couple of tries, though, the plan succeeds and Terraform is able to identify the external location (as if the membership were now available). I've also seen cases where the external location isn't created properly and gets marked as tainted: [Screenshot 2024-03-27 at 13 53 06]

I've tried adding a time_sleep between group membership creation and external location creation to no avail.
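
A minimal sketch of such a delay, assuming the `time_sleep` resource from the hashicorp/time provider. The resource name `wait_for_group_membership` and the 60s duration are illustrative assumptions, not the values actually used in the failing run:

```hcl
# Hypothetical delay between group membership and external location creation.
# Name and duration are illustrative; requires the hashicorp/time provider.
resource "time_sleep" "wait_for_group_membership" {
  depends_on      = [databricks_group_member.this]
  create_duration = "60s"
}

resource "databricks_external_location" "this" {
  name            = "test_external_location"
  url             = var.container_uri
  credential_name = var.storage_credential_name
  owner           = databricks_group.this.display_name

  # Wait on the sleep instead of directly on the group membership.
  depends_on = [
    time_sleep.wait_for_group_membership,
    databricks_grant.metastore,
    databricks_grant.storage_credential
  ]
}
```

A delay like this only postpones the create call. As noted above, it did not resolve the error.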

Would you like to implement a fix?

I'd love to but I don't have the time beside my day job.

camilo-s avatar Feb 27 '24 17:02 camilo-s

I just hit the same problem again while creating an external location! Terraform started creating it and after some time it failed with the error:

Error: cannot read external location: User does not have CREATE CONNECTION on External Location '[REDACTED]'.

As if it hadn't been able to check the state once created.

My next Terraform plan indicated that the external location resource was tainted and had to be replaced, which it was able to do on apply.

camilo-s avatar Mar 05 '24 17:03 camilo-s

The same problem on Azure. Are there any updates?

vitalyu avatar Mar 23 '24 17:03 vitalyu

I looked at the gist, but I'm a bit confused by the log output. It seems like there may be some lines missing from the log. Do you have any idea why this might be?

For example: these lines come from the 01-apply.log file:

2024-03-27T16:41:19.8216138Z 2024-03-27T16:41:19.821Z [DEBUG] provider.terraform-provider-databricks_v1.38.0: POST /api/2.1/unity-catalog/external-locations
2024-03-27T16:41:19.8220906Z < }: tf_provider_addr=registry.terraform.io/databricks/databricks tf_rpc=ApplyResourceChange @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/logger/logger.go:33 @module=databricks tf_req_id=b00c8037-facb-a840-26ab-7ff224ff5416 tf_resource_type=databricks_external_location timestamp=2024-03-27T16:41:19.821Z
2024-03-27T16:41:20.2395508Z 2024-03-27T16:41:20.239Z [DEBUG] provider.terraform-provider-databricks_v1.38.0: PATCH /api/2.1/unity-catalog/external-locations/test_external_location
2024-03-27T16:41:20.2403958Z < }: tf_rpc=ApplyResourceChange tf_provider_addr=registry.terraform.io/databricks/databricks tf_req_id=b00c8037-facb-a840-26ab-7ff224ff5416 tf_resource_type=databricks_external_location @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/logger/logger.go:33 @module=databricks timestamp=2024-03-27T16:41:20.239Z
2024-03-27T16:41:20.3143044Z 2024-03-27T16:41:20.313Z [DEBUG] provider.terraform-provider-databricks_v1.38.0: non-retriable error: User does not have CREATE CONNECTION on External Location 'test_external_location'.: @module=databricks tf_provider_addr=registry.terraform.io/databricks/databricks tf_resource_type=databricks_external_location tf_rpc=ApplyResourceChange @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/logger/logger.go:33 tf_req_id=b00c8037-facb-a840-26ab-7ff224ff5416 timestamp=2024-03-27T16:41:20.312Z

Debug logging should include the request and response bodies, but those seem to be missing here. We only see the first and last line of each log message. Is it possible there is something causing the logs to be partially dropped?

mgyucht avatar May 02 '24 07:05 mgyucht

Sorry for my belated reply @mgyucht. The debug logs got cut off when piped as suggested by your GitHub issue template: [Screenshot 2024-05-24 at 13 20 52]

I haven't been able to consistently reproduce the above-mentioned apply error recently. However, a similar error surfaces during a subsequent terraform plan execution. It seems Databricks doesn't recognize the service principal's indirect permissions granted through its group membership. A second execution of terraform plan then succeeds, as if the privilege had finally propagated in the Databricks backend.

Please have a look at the updated gist: https://gist.github.com/camilo-s/48182d8940298b5912514c75268e6387 (Apologies if I filtered out too much content again; it's usually challenging to scrub the whole debug log of sensitive data.)

camilo-s avatar May 24 '24 11:05 camilo-s