terraform-provider-databricks
terraform-provider-databricks copied to clipboard
[FEATURE] Provider issue with MLflow Model import with tags
Configuration
resource "databricks_mlflow_model" "model" {
name = "model"
description = "MLflow registered model"
}
Run command: terraform import databricks_mlflow_model.model "model"
Expected Behavior
Model imports successfully with no data being deleted on subsequent terraform apply.
Actual Behavior
If model with tags is imported and then terraform apply is run, the model's tags are deleted.
Steps to Reproduce
Please list the steps required to reproduce the issue, for example:
- Create model on Databricks with some tags.
- Create configuration file as above and run import command mentioned above.
- Do
terraform applyand it will delete the tags on the model.
Terraform and provider versions
Terraform v1.1.6
on darwin_amd64
+ provider registry.terraform.io/databrickslabs/databricks v0.5.4
Isn't this how Terraform reconciles its state, when you're running terraform import
The model definition here does not have any tags, but the target model has some tags. Terraform correctly identifies the configuration drift and readjust accordingly.
The missing feature here is supporting MLflow objects in the exporter tool
@nkvuong Ideally we want to set some TF attribute like computed or suppress_diff so that it wouldn't delete existing tags.
@arpitjasa-db yes, it has to be added on type struct level
Thank you for the feature request! Currently, the team operates in a limited capacity, carefully prioritizing, and we cannot provide a timeline to implement this feature. Please make a Pull Request if you'd like to see this feature sooner, and we'll guide you through the journey.
@nkvuong one of the other issues with MLflow models is that there are many system-generated tags. This makes me think we should add a tf:"computed" attribute to its definition, but I'm not sure how that would behave if someone were to define tags in their configuration file. This makes me lean towards the tf:"suppress_diff" attribute. Do you know what would happen if we did both, e.g. tf:"computed, suppress_diff"? Is that even allowed?
@nfx is there any documentation that details the behavior of different tf attributes? All I could find was this but that doesn't even list the default suppress_diff attribute.
@arpitjasa-db suppress_diff is annotation within our provider to simplify schema definition. https://github.com/databricks/terraform-provider-databricks/blob/master/common/reflect_resource.go#L128
we basically need to implement https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema#SchemaDiffSuppressFunc for databricks_mlflow_model
@arpitjasa-db how should the PATCH API be constructed to keep all the tags? This is my WIP branch - https://github.com/databricks/terraform-provider-databricks/tree/feature/suppress_diff_model, but the API call does not get all the correct tags registered.
@nkvuong your branch looks fine but one thing I noticed was shouldn't we merge in the new tags as well (and dedup as necessary)?
@arpitjasa-db For the update operation, I add all the old tags to the resource (which has the new tags from HCL), dedup and sort before sending it to the API. The API does not set the tags correctly though, so could you advise on what it should be
@nkvuong oh I see the issue. I don't think that API call accepts tags in the request: https://mlflow.org/docs/latest/rest-api.html#update-registeredmodel. AFAIK this is the only way to set a model tag: https://mlflow.org/docs/latest/rest-api.html#set-registered-model-tag
@nkvuong also even if the update API could receive tags, I just realized that users basically would not be able to delete tags from their Terraform configuration right? Since we would treat that as an "old" tag and add it back in?
Is there a way to differentiate system-generated tags vs Terraform-config generated tags?
Is there a way to differentiate system-generated tags vs Terraform-config generated tags?
From an MLflow perspective, system-generated tags always start with the mlflow. prefix. I'm a little wary of any solution to mark tags for models via supress_diff, computed, etc since it'll make them behave differently from tags on jobs, clusters, etc.
I also agree with @nkvuong in https://github.com/databricks/terraform-provider-databricks/issues/1225#issuecomment-1106415778 that if user-defined tags are deleted after import, that's expected behavior. We could consider broadly throwing on import if the config of the object being imported doesn't match the config in .tf files, but that doesn't seem to be the pattern in Terraform
To avoid issues with system-generated tags, is it possible to just filter out tags that start with the mlflow. prefix when managing model tags via the registered model resource? These system-generated tags correspond to external features like model monitoring. Later, when we add model monitoring as its own terraform resource/attribute on the model resource, we can make the TF provider aware of these system tags
Following up - is this issue still relevant?
@nfx yeah currently updating tags via Terraform is broken. Model tags can only be created upon creation, but cannot be updated.
Thank you for the feature request! Currently, the team operates in a limited capacity, carefully prioritizing, and we cannot provide a timeline to implement this feature. Please make a Pull Request if you'd like to see this feature sooner, and we'll guide you through the journey.