terraform-provider-databricks
[ISSUE] Group privileges don't propagate to members while creating `databricks_external_location`
When trying to create an external location with a service principal, which is set indirectly as owner of the external location through a group membership (i.e. the group is the owner and the service principal is a member of the group), Terraform fails during creation with the following error:
Error: cannot read external location: User does not have CREATE CONNECTION on External Location '[REDACTED]'.
It seems the group membership does not propagate in time while the external location's creation is being polled, so the service principal running Terraform cannot read the newly created resource and the apply ultimately fails.
Configuration
terraform {
  required_providers {
    databricks = {
      configuration_aliases = [databricks.account]
      source                = "databricks/databricks"
    }
  }
}

resource "databricks_group" "this" {
  provider     = databricks.account
  display_name = "debug-group"
}

data "databricks_service_principal" "this" {
  provider       = databricks.account
  application_id = var.application_id
}

resource "databricks_group_member" "this" {
  provider  = databricks.account
  group_id  = databricks_group.this.id
  member_id = data.databricks_service_principal.this.id
}

resource "databricks_grant" "metastore" {
  metastore  = var.metastore_id
  principal  = data.databricks_service_principal.this.application_id
  privileges = ["CREATE_EXTERNAL_LOCATION"]
}

resource "databricks_grant" "storage_credential" {
  storage_credential = var.storage_credential_name
  principal          = data.databricks_service_principal.this.application_id
  privileges         = ["CREATE_EXTERNAL_LOCATION"]
}

resource "databricks_external_location" "this" {
  name            = "test_external_location"
  url             = var.container_uri
  credential_name = var.storage_credential_name
  owner           = databricks_group.this.display_name
  depends_on = [
    databricks_group_member.this,
    databricks_grant.metastore,
    databricks_grant.storage_credential
  ]
}
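The configuration references four input variables whose declarations are not part of the report. A minimal sketch of what they might look like, assuming all four are plain strings (the types and descriptions are assumptions, not taken from the original setup):

```hcl
# Assumed declarations for the variables referenced in the configuration.
# Types and descriptions are guesses; only the names come from the report.
variable "application_id" {
  type        = string
  description = "Application ID of the service principal running Terraform."
}

variable "metastore_id" {
  type        = string
  description = "ID of the Unity Catalog metastore."
}

variable "storage_credential_name" {
  type        = string
  description = "Name of an existing storage credential."
}

variable "container_uri" {
  type        = string
  description = "URI of the storage container backing the external location."
}
```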
Expected Behavior
External location creation is successful.
Actual Behavior
Creation fails with the following error:
Error: cannot read external location: User does not have CREATE CONNECTION on External Location '[REDACTED]'.
Steps to Reproduce
Terraform v1.7.4, databricks/databricks provider v1.37.1
Is it a regression?
It worked properly until recently. I suspect there was some change in the Databricks REST API backend.
Debug Output
See attached gist
Important Factoids
Running a new plan immediately after the first failure fails again, presumably because the group membership hasn't propagated yet, so the service principal lacks permission to read the state of the created external location. After a couple of tries, though, the plan succeeds and Terraform is able to identify the external location (as if the membership claim were now available). I've also seen cases where the external location isn't created properly and gets marked as tainted.
I've tried adding a `time_sleep` between group membership creation and external location creation, to no avail.
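The report does not include the delay that was tried. A minimal sketch of what such a delay could look like, assuming the `hashicorp/time` provider is added to `required_providers`; the resource name `wait_for_membership` and the 60-second duration are hypothetical:

```hcl
# Hypothetical delay between group membership and external location creation.
# Requires the hashicorp/time provider; name and duration are assumptions.
resource "time_sleep" "wait_for_membership" {
  # Start the timer only once the group membership exists.
  depends_on = [databricks_group_member.this]

  # How long to wait before dependent resources are created.
  create_duration = "60s"
}
```

With this in place, `databricks_external_location.this` would list `time_sleep.wait_for_membership` in its `depends_on` instead of `databricks_group_member.this`. As noted above, a delay of this kind did not resolve the error.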
Would you like to implement a fix?
I'd love to, but I don't have the time alongside my day job.
I just ran into the same problem while creating an external location! Terraform started creating it and after some time it failed with the error:
Error: cannot read external location: User does not have CREATE CONNECTION on External Location '[REDACTED]'.
As if it hadn't been able to check the state once created.
My next Terraform plan indicated that the external location resource was tainted and had to be replaced, which it was able to do on apply.
Same problem on Azure. Are there any updates?
I looked at the gist, but I'm a bit confused by the log output. It seems like there may be some lines missing from the log. Do you have any idea why this might be?
For example, these lines come from the `01-apply.log` file:
2024-03-27T16:41:19.8216138Z 2024-03-27T16:41:19.821Z [DEBUG] provider.terraform-provider-databricks_v1.38.0: POST /api/2.1/unity-catalog/external-locations
2024-03-27T16:41:19.8220906Z < }: tf_provider_addr=registry.terraform.io/databricks/databricks tf_rpc=ApplyResourceChange @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/logger/logger.go:33 @module=databricks tf_req_id=b00c8037-facb-a840-26ab-7ff224ff5416 tf_resource_type=databricks_external_location timestamp=2024-03-27T16:41:19.821Z
2024-03-27T16:41:20.2395508Z 2024-03-27T16:41:20.239Z [DEBUG] provider.terraform-provider-databricks_v1.38.0: PATCH /api/2.1/unity-catalog/external-locations/test_external_location
2024-03-27T16:41:20.2403958Z < }: tf_rpc=ApplyResourceChange tf_provider_addr=registry.terraform.io/databricks/databricks tf_req_id=b00c8037-facb-a840-26ab-7ff224ff5416 tf_resource_type=databricks_external_location @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/logger/logger.go:33 @module=databricks timestamp=2024-03-27T16:41:20.239Z
2024-03-27T16:41:20.3143044Z 2024-03-27T16:41:20.313Z [DEBUG] provider.terraform-provider-databricks_v1.38.0: non-retriable error: User does not have CREATE CONNECTION on External Location 'test_external_location'.: @module=databricks tf_provider_addr=registry.terraform.io/databricks/databricks tf_resource_type=databricks_external_location tf_rpc=ApplyResourceChange @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/logger/logger.go:33 tf_req_id=b00c8037-facb-a840-26ab-7ff224ff5416 timestamp=2024-03-27T16:41:20.312Z
Debug logging should include the request and response bodies, but those seem to be missing here. We only see the first and last line of each log message. Is it possible there is something causing the logs to be partially dropped?
Sorry for my belated reply @mgyucht. The debug logs got cut when piped as suggested by your GitHub issue template.
I haven't been able to consistently reproduce the above-mentioned apply error recently. However, a similar error surfaces during a subsequent `terraform plan` execution. It seems Databricks doesn't recognize the service principal's indirect permissions through its group membership. A second execution of `terraform plan` then succeeds, as if the privilege had finally propagated in the Databricks backend.
Please have a look at the updated gist: https://gist.github.com/camilo-s/48182d8940298b5912514c75268e6387 (Apologies if I filtered out too much content again; it's usually challenging to scrub a whole debug log of sensitive data.)