terraform-provider-databricks
[FEATURE] Issue with `databricks_cluster` resource - `runtime_engine` field not supported
Configuration
We are trying to spin up a cluster on Graviton instances (instance type m6gd.*) with Photon enabled. According to the Databricks documentation, the cluster should be configured with something like:
{
  "cluster_name": "my-cluster",
  "spark_version": "10.2.x-scala2.12",
  "node_type_id": "m6gd.large",
  "num_workers": 2,
  "runtime_engine": "PHOTON"
}
But it looks like the runtime_engine field is not available in the Databricks Terraform Provider.
Expected Behavior
Having a configuration like this:
resource "databricks_cluster" "my_cluster" {
  cluster_name   = "my-cluster"
  spark_version  = "10.2.x-scala2.12"
  node_type_id   = "m6gd.large"
  runtime_engine = "PHOTON"
}
A new cluster should be created.
Actual Behavior
The apply step fails, since the databricks_cluster resource does not accept a runtime_engine argument.
Steps to Reproduce
terraform apply
Terraform and provider versions
Debug Output
Important Factoids
In Terraform, a workaround is possible with the databricks_node_type (doc) and databricks_spark_version (doc) data sources:
data "databricks_node_type" "photon" {
  photon_worker_capable = true
  graviton              = true
}

data "databricks_spark_version" "photon_lts" {
  long_term_support = true
  graviton          = true
  photon            = true
}
We ended up using spark_version = "10.4.x-aarch64-photon-scala2.12", but I think the provider should expose the runtime_engine field, as it would make things simpler.
What is wrong with the data sources?
It makes more sense for the databricks_cluster resource to mirror the documentation and the manual configuration that people can do in the UI (e.g. selecting a specific runtime).
Going through the databricks_node_type or databricks_spark_version data sources is an unnecessary complication.