terraform-provider-databricks
terraform-provider-databricks copied to clipboard
[ISSUE] non-admin account: destroying a cluster with `access_control` leaves it in a broken state
Hi there,
We are in a situation where a different team creates and grands our developers access to a Azure Databricks instance. Due to compliance reasons our team is not allowed to have the admin permissions. However, we want to use Terraform to manage the "content" (clusters, repos, DBFS files, libraries, ...) in our Databricks instance., thinking that as long as the user used can do our desired tasks via the UI it should be able to do the same via Terraform.
Configuration
terraform {
required_providers {
databricks = {
source = "databrickslabs/databricks"
version = "0.5.0"
}
}
}
provider "databricks" {
}
data "databricks_node_type" "smallest" {
local_disk = true
}
data "databricks_spark_version" "latest_lts" {
long_term_support = true
}
resource "databricks_cluster" "some_cluster" {
cluster_name = "terraform-test"
spark_version = [data.databricks_spark_version.latest_lts.id](http://data.databricks_spark_version.latest_lts.id/)
node_type_id = [data.databricks_node_type.smallest.id](http://data.databricks_node_type.smallest.id/)
autotermination_minutes = 10
spark_conf = {
# Single Node
"spark.databricks.cluster.profile" : "singleNode"
"spark.master" : "local[*]"
}
custom_tags = {
"ResourceClass" = "SingleNode"
}
}
resource "databricks_permissions" "cluster_usage" {
cluster_id = databricks_cluster.some_cluster.cluster_id
access_control {
group_name = "group-name"
permission_level = "CAN_MANAGE"
}
}
Expected Behavior
Delete the defined cluster.
Actual Behavior
The operation fails with this error:
Error: <account-name> does not have Owner permissions on <cluster-id>. Please contact the owner or an administrator for access.
Additionally, the cluster is left in a state where only administrators can interact with it (destroy, start, stop, manage permissions, ...).
Steps to Reproduce
Please list the steps required to reproduce the issue, for example: 0. use a non-admin Databricks account
terraform applyterraform destroy
Terraform and provider versions
Terraform v1.1.6 on windows_amd64
- provider registry.terraform.io/databrickslabs/databricks v0.5.0
Important Factoids
The account used is not an admin account.
@m-theisen is the cluster created by the same user?
@nfx yes it is. It is created in the terraform apply step (with the same user).
My suspicion of what happens is that on the destroy command terraform removes the cluster_usage resource, which removes its own cluster manage permission. What I understand from the documentation this should not happen because the cluster owner should retain/get back its permissions.
@m-theisen Please provide debug logs with API traffic for further investigation.
Primary use of terraform is for the admin level privileges
@nfx sorry for not providing the log from the start but here it is now: https://gist.github.com/m-theisen/1b9cfed16e687ef6810aac9c5d7c133f
@m-theisen okay, I see now from logs what the problem you're facing. Currently core provider team is busy on other priorities, and would be glad if you submit a Pull Request with a fix concerning your situation. Take a look ar permissions/resource_permissions.go, specifically for jobs handling, and add a corner case for "removing permissions" on cluster, by setting the CAN_MANAGE level to creator_user_name from ClusterInfo. And a unit test, of course.
@nfx thanks for the quick feedback! I have to see if I get the time to create this PR myself. I'll come back to you once I do.
Contributing.md has local dev env instructions, so probably start there
@m-theisen any updates on the PR for this?
@nfx actually yes. I was looking into what needs changing and as you said it isn't so much.
But while investigating this I noticed something else (you are probably already aware of). We will run into this particular "destroy issue" not only with clusters but any other resource to which we attach a "non-default" permission. Since destroying the permission object means overwriting the existing list with the "admin-only" permission list, which prevents our non-admin user from modifying the resource further.
In the case of clusters there's the solution of assigning the cluster owner CAN_MANAGE permissions upon destruction (which I think also makes sense in the context of the Databricks access policies, considering that these are the default permissions). As far as I could see most other resources don't have something like an owner (which makes sense as well).
So currently we are re-evaluating if we can use terraform for what we have in mind or not. Currently I don't see a solution for our permission "problem" apart from giving the current user CAN_MANAGE permissions for each resource where the attached permission gets destroyed. But I don't really think this is a viable solution which should be implemented here.
The conclusion is: I think adding the corner case for the cluster permissions makes sense, but it became a lower priority for us/me.
In the case of clusters there's the solution of assigning the cluster owner CAN_MANAGE permissions upon destruction
@m-theisen that's what i've meant
In the case of clusters there's the solution of assigning the cluster owner CAN_MANAGE permissions upon destruction
@m-theisen that's what i've meant
Yes, that's how I understood you. But while looking into how I would implement this I noticed the other "problems".
@m-theisen
This is not terraform best practice, but I believe it might be a workaround for the issue you are encountering.
Try: terraform apply terraform state list terraform state rm databricks_permissions.cluster_usage terraform destroy
I'm seeing an error that I believe is related to this issue. terraform version=0.15.5 terragrunt version=0.27.1 Databricks provider version=0.3.7
We're creating an all purpose cluster using a service principal. We want to also grant access to a Databricks group, using the following code:
resource "databricks_permissions" "cluster_usage_group" {
cluster_id = databricks_cluster.aqua_demo.cluster_id
access_control {
group_name = var.team_name
permission_level = "CAN_RESTART"
}
}
Doing this creates the cluster and grants this access to the team. However, it replaces the owner permissions from the service principal. When we don't include this block, the service principal retains its permissions. Our deployment succeeds but the service principal doesn't have access to delete the cluster later since it's no longer an owner. We also tried to get the cluster ID as an output from the deployment, and this failed because the service principal no longer had access to the cluster to get it's ID. I think the root of the issue that @m-theisen faced. When adding cluster usage permissions, it removed manage access from the creator of the cluster
Following up - is this issue still relevant?
@nfx we would still very much like to manage our resources in Databricks via terraform with the limited non-admin user. We are, however, currently not investing time to solve this cluster permission issue, without having a clear way forward how we could also solve the problem for the other resource types.
@m-theisen have you tested this with the latest version of the provider? there was a fix for MLflow model and iirc it applies to all databricks_permissions resource
Thanks @nkvuong for pointing this out. That sounds promising. We have not tested it yet, though. I won't have time this week but I'll hopefully be able to get back to you next week.
@nkvuong I was finally able to do some testing using the databricks/databricks v1.2.1 and unfortunately the issue seems to persist. I had a look at the PR you linked and if I am not mistaken (I'm not at all fluent in go) this line
for _, prefix := range [...]string{"/registered-models/"} {
explicitly restricts the new feature to only apply to the ML Models and not to other resources, like clusters.