terraform-provider-databricks
[ISSUE] `databricks_file` resource does not store md5 in state
Configuration
terraform {
  required_providers {
    databricks = {
      source  = "databricks/databricks"
      version = "1.37.0"
    }
  }
}
provider "databricks" {}
data "databricks_current_user" "me" {}
resource "databricks_file" "example" {
source = "${path.module}/hello.sh"
# Assuming this volume already exists.
path = "/Volume/example/default/hello.sh"
}
resource "databricks_workspace_file" "example" {
source = "${path.module}/hello.sh"
path = "${data.databricks_current_user.me.home}/hello.sh"
}
Expected Behavior
The md5 attribute should be populated with the md5 hash of the source file for both resources.
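For reference, the expected value matches what Terraform's built-in filemd5 function returns for the source file. A minimal sketch for inspecting it, assuming hello.sh sits next to the configuration:

output "expected_md5" {
  value = filemd5("${path.module}/hello.sh")
}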
Actual Behavior
It is only populated for databricks_workspace_file.
On the first apply, both md5 attributes show the default value "different".
After changing the file and re-applying, databricks_workspace_file shows a change on the md5 attribute (from the stored hash to "different"), but databricks_file does not detect any change.
Both resources log the md5 correctly, but only databricks_workspace_file saves the hash to state; that is why only it shows "different" in the diff.
Steps to Reproduce
- Create an example file, e.g. printf '#!/bin/bash\n\necho "Hello World"\n' > hello.sh
- Update the example paths as needed / create a Volume
- Run terraform apply
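To confirm what actually got stored, compare terraform state show databricks_file.example with terraform state show databricks_workspace_file.example after the first apply; per the behavior above, only the latter ends up with the computed hash.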
Terraform and provider versions
Terraform v1.7.4 on linux_amd64
- provider registry.terraform.io/databricks/databricks v1.37.0
Is it a regression?
No. The databricks_file resource was only introduced in the latest version (v1.37.0).
Debug Output
2024-02-22T18:19:51.712+1000 [DEBUG] refresh: databricks_file.example: no state, so not refreshing
data.databricks_current_user.me: Reading...
2024-02-22T18:19:51.714+1000 [INFO] provider.terraform-provider-databricks_v1.37.0: Reading /home/terry/example/hello.sh: timestamp="2024-02-22T18:19:51.714+1000"
2024-02-22T18:19:51.714+1000 [INFO] provider.terraform-provider-databricks_v1.37.0: Setting file content hash to 9eb1ab7f9045cd04748e5798ea9e0cb6: timestamp="2024-02-22T18:19:51.714+1000"
2024-02-22T18:19:51.714+1000 [INFO] provider.terraform-provider-databricks_v1.37.0: Suppressing diff: false: timestamp="2024-02-22T18:19:51.714+1000"
2024-02-22T18:19:51.715+1000 [WARN] Provider "registry.terraform.io/databricks/databricks" produced an invalid plan for databricks_file.example, but we are tolerating it because it is using the legacy plugin SDK.
2024-02-22T18:19:52.197+1000 [DEBUG] refresh: databricks_workspace_file.example: no state, so not refreshing
2024-02-22T18:19:52.201+1000 [INFO] provider.terraform-provider-databricks_v1.37.0: Reading /home/terry/example/hello.sh: timestamp="2024-02-22T18:19:52.201+1000"
2024-02-22T18:19:52.201+1000 [INFO] provider.terraform-provider-databricks_v1.37.0: Setting file content hash to 9eb1ab7f9045cd04748e5798ea9e0cb6: timestamp="2024-02-22T18:19:52.201+1000"
2024-02-22T18:19:52.201+1000 [INFO] provider.terraform-provider-databricks_v1.37.0: Suppressing diff: false: timestamp="2024-02-22T18:19:52.201+1000"
2024-02-22T18:19:52.203+1000 [WARN] Provider "registry.terraform.io/databricks/databricks" produced an invalid plan for databricks_workspace_file.example, but we are tolerating it because it is using the legacy plugin SDK.
path=.terraform/providers/registry.terraform.io/databricks/databricks/1.37.0/linux_amd64/terraform-provider-databricks_v1.37.0 pid=2756954
  # databricks_file.example will be created
  + resource "databricks_file" "example" {
      + file_size         = (known after apply)
      + id                = (known after apply)
      + md5               = "different"
      + modification_time = (known after apply)
      + path              = "/Volumes/example/default/hello.sh"
      + source            = "/home/terry/example/hello.sh"
    }

  # databricks_workspace_file.example will be created
  + resource "databricks_workspace_file" "example" {
      + id             = (known after apply)
      + md5            = "different"
      + object_id      = (known after apply)
      + path           = "/Users/[email protected]/hello.sh"
      + source         = "/home/terry/example/hello.sh"
      + url            = (known after apply)
      + workspace_path = (known after apply)
    }
Plan: 2 to add, 0 to change, 0 to destroy.
After changing the file and re-applying:
  # databricks_workspace_file.example will be updated in-place
  ~ resource "databricks_workspace_file" "example" {
        id  = "/Users/[email protected]/hello.sh"
      ~ md5 = "9eb1ab7f9045cd04748e5798ea9e0cb6" -> "different"
        # (5 unchanged attributes hidden)
    }
Plan: 0 to add, 1 to change, 0 to destroy.
Important Factoids
- #3265
- databricks_workspace_file read function: workspace/file_resource.go#L36
- databricks_file read function: storage/resource_file.go#L16
For now, you can work around this by setting the md5 attribute manually, though the attribute is not documented:
locals {
  file = "${path.module}/hello.sh"
}

resource "databricks_file" "example" {
  source = local.file
  md5    = filemd5(local.file)
  path   = "/Volumes/example/default/hello.sh"
}
Hi @terrymunro, thanks for reaching out. We will take a look.
We are also having this issue. databricks_file does not detect when the source file has changed.
Thank you for your help on this issue :)
It looks like this might be fixed with https://github.com/databricks/terraform-provider-databricks/pull/3662
This is also an issue with the databricks_global_init_script resource, but I don't think the above PR will fix it. The md5 attribute trick mentioned above works around the issue, though.
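For the init-script variant, a minimal sketch of the same trick, assuming an init.sh next to the configuration and that databricks_global_init_script accepts the same undocumented md5 argument (the script name here is made up):

resource "databricks_global_init_script" "example" {
  name   = "example-init"
  source = "${path.module}/init.sh"
  # Assumed undocumented md5 argument, mirroring the databricks_file workaround above.
  md5    = filemd5("${path.module}/init.sh")
}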