terraform-provider-databricks
[ISSUE] Issue with `databricks_sql_table` resource during table schema evolution
When a new column is added at the end of the column configuration, the databricks_sql_table resource recreates the table instead of altering it to add the new column. The table was first created with the same configuration shown below, except that the name column was not present.
Configuration
resource "databricks_sql_table" "test_managed" {
name = "test_managed"
catalog_name = "my_catalog"
schema_name = "my_schema"
table_type = "MANAGED"
data_source_format = "DELTA"
column {
name = "id"
type = "int"
}
column {
name = "sid"
type = "string"
}
column {
name = "name"
type = "string"
}
properties = {
"delta.autoOptimize.autoCompact" = true,
"delta.autoOptimize.optimizeWrite" = true,
"delta.enableChangeDataFeed" = true
}
}
Expected Behavior
A new column added at the end should be treated as an ALTER of the table: the column is added without destroying and recreating the table. Recreation deletes the data in the table, which is undesirable. Performing the same operation through the SQL interface (on a SQL warehouse) simply ALTERs the table, appends the new column, and retains the data. Terraform should behave the same way.
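For reference, the equivalent operation on a SQL warehouse is a single in-place ALTER (a minimal sketch using the catalog, schema, and table names from this report):
-- Adds the column in place; existing rows are retained and get NULL for the new column.
ALTER TABLE my_catalog.my_schema.test_managed ADD COLUMNS (name STRING);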
Actual Behavior
The table gets destroyed and recreated, and the data in the table is lost.
Steps to Reproduce
- Create a table with two columns, id and sid:
resource "databricks_sql_table" "test_managed" {
name = "test_managed"
catalog_name = "my_catalog"
schema_name = "my_schema"
table_type = "MANAGED"
data_source_format = "DELTA"
column {
name = "id"
type = "int"
}
column {
name = "sid"
type = "string"
}
properties = {
"delta.autoOptimize.autoCompact" = true,
"delta.autoOptimize.optimizeWrite" = true,
"delta.enableChangeDataFeed" = true
}
}
- Insert sample rows:
insert into table my_catalog.my_schema.test_managed
values
  (1, "sid_1"),
  (2, "sid_2");
- Alter the table by adding a new column, name, at the end:
resource "databricks_sql_table" "test_managed" {
name = "test_managed"
catalog_name = "my_catalog"
schema_name = "my_schema"
table_type = "MANAGED"
data_source_format = "DELTA"
column {
name = "id"
type = "int"
}
column {
name = "sid"
type = "string"
}
column {
name = "name"
type = "string"
}
properties = {
"delta.autoOptimize.autoCompact" = true,
"delta.autoOptimize.optimizeWrite" = true,
"delta.enableChangeDataFeed" = true
}
}
- The plan shows # forces replacement due to the column addition:
21:44:55 # databricks_sql_table.test_managed must be replaced
21:44:55 -/+ resource "databricks_sql_table" "test_managed" {
21:44:55 + cluster_id = (known after apply)
21:44:55 ~ id = "my_catalog.my_schema.test_managed" -> (known after apply)
21:44:55 name = "test_managed"
21:44:55 ~ properties = {
21:44:55 - "delta.lastCommitTimestamp" = "1701920126000" -> null
21:44:55 - "delta.lastUpdateVersion" = "0" -> null
21:44:55 - "delta.minReaderVersion" = "1" -> null
21:44:55 - "delta.minWriterVersion" = "4" -> null
21:44:55 # (3 unchanged elements hidden)
21:44:55 }
21:44:55 - storage_location = "<redacted>" -> null
21:44:55 # (4 unchanged attributes hidden)
21:44:55
21:44:55 + column { # forces replacement
21:44:55 + name = "name"
21:44:55 + nullable = true
21:44:55 + type = "string"
21:44:55 }
21:44:55
21:44:55 # (2 unchanged blocks hidden)
21:44:55 }
- Upon apply, the table is destroyed and a new table is created under a new s3 prefix (see the DESCRIBE DETAIL sketch after these steps).
- The data is lost as well; the query below returns 0 rows:
select * from my_catalog.my_schema.test_managed;
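One way to confirm that the replacement moved the table is Delta's DESCRIBE DETAIL, whose output includes the table's current storage location (a sketch using the same table name as above):
-- The `location` column changes to the new s3 prefix after the replacement.
DESCRIBE DETAIL my_catalog.my_schema.test_managed;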
NOTE: When the same change is made to an EXTERNAL table, the table is also destroyed, but the apply then fails with the error below:
21:48:18 │ Error: cannot create sql table: cannot execute CREATE EXTERNAL TABLE `my_catalog`.`my_schema`.`test_external` (id int, sid string, name string)
21:48:18 │ USING DELTA
21:48:18 │ TBLPROPERTIES ('delta.autoOptimize.autoCompact'='true', 'delta.autoOptimize.optimizeWrite'='true', 'delta.enableChangeDataFeed'='true')
21:48:18 │ LOCATION '<REDACTED>';: The specified schema does not match the existing schema at <REDACTED>.
21:48:18 │
21:48:18 │ with databricks_sql_table.test_external,
21:48:18 │ on table-test.tf line 28, in resource "databricks_sql_table" "test_external":
21:48:18 │ 28: resource "databricks_sql_table" "test_external" {
Terraform and provider versions
Terraform v1.5.7
provider registry.terraform.io/databricks/databricks v1.31.0
provider registry.terraform.io/hashicorp/aws v4.67.0
provider registry.terraform.io/hashicorp/local v2.4.0
Is it a regression?
Unknown
@vsluc yes, this is a gap in the databricks_sql_table resource at the moment: changes to columns are handled as re-creation.
Additionally, when creating an external table, the Databricks Runtime checks that the schema specified in the CTAS statement matches the existing schema at the location, and throws an error otherwise. One way to handle this is to provide a parameter to not send the schema to the runtime.
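Until column-level ALTER support ships, one defensive option is Terraform's prevent_destroy lifecycle guard (a minimal sketch, not an official workaround: it only turns the destructive plan into an error, so the new column still has to be added outside of Terraform):
resource "databricks_sql_table" "test_managed" {
  name               = "test_managed"
  catalog_name       = "my_catalog"
  schema_name        = "my_schema"
  table_type         = "MANAGED"
  data_source_format = "DELTA"

  column {
    name = "id"
    type = "int"
  }
  column {
    name = "sid"
    type = "string"
  }

  # Refuse any plan that would destroy this table, so a column change
  # surfaces as an error instead of silently deleting the data.
  lifecycle {
    prevent_destroy = true
  }
}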
+1
Having the same issue here.
Please fix this
It would be nice to have this fixed
Please fix this issue
Hello @nkvuong, any news on this issue? It is a major blocker for us. Thanks
The issue has been fixed with the following PRs: https://github.com/databricks/terraform-provider-databricks/pull/3359 https://github.com/databricks/terraform-provider-databricks/pull/3381
They will go out with the next release.
Thanks @edwardfeng-db. A much anticipated change.