terraform-provider-databricks
terraform-provider-databricks copied to clipboard
Make notebook extensions more obvious
Changes
databricks workspace notebook resource modified to allow inclusion of extension in module call.
Previous behaviour:
Following resource config would create notebook.py.py on the workspace, but databricks UI masks the extension for notebooks so it's not apparent.
resource "databricks_notebook" "some_notebook" {
source = "./notebook.py"
path = "/notebook.py"
}
New behaviour
Both of these configurations will now create only notebook.py on the workspace
resource "databricks_notebook" "some_notebook" {
source = "./notebook.py"
path = "/notebook.py"
}
resource "databricks_notebook" "some_notebook" {
source = "./notebook.py"
path = "/notebook"
}
Tests
- [x]
make testrun locally - [x] covered with integration tests in
internal/acceptance - [x] relevant acceptance tests are passing
- [x] using Go SDK
- [x] confirmed behaviour and no regression with delv terraform apply
- [x] added additional unit test
@nfx @alexott This is a really clean fix to a super annoying problem for users.
There's 3 things I like about it. Firstly it still allows you to omit the Language and pick it up based on the source extension, second it uses Ext which is super clean and will work whether you use an extension in the path or not. Third it includes updates to the tests.
The only thing this won't let you do anymore is end up with broken notebook.py.py files that are masked in the UI and look like they are called notebook.py which then makes our data scientists scratch their heads when they try to do %run notebook magic commands which don't work.
To be honest, the UI is being overly helpful (i.e. painful) by masking file extensions, and why importing a notebook adds an extra file extension is beyond me.
Codecov Report
Merging #2994 (361769f) into master (754aad4) will decrease coverage by
3.90%. Report is 216 commits behind head on master. The diff coverage is73.57%.
Additional details and impacted files
@@ Coverage Diff @@
## master #2994 +/- ##
==========================================
- Coverage 88.21% 84.32% -3.90%
==========================================
Files 145 157 +12
Lines 11989 13728 +1739
==========================================
+ Hits 10576 11576 +1000
- Misses 930 1510 +580
- Partials 483 642 +159
| Files | Coverage Δ | |
|---|---|---|
| access/resource_ip_access_list.go | 80.95% <100.00%> (ø) |
|
| access/resource_permission_assignment.go | 95.23% <100.00%> (-0.12%) |
:arrow_down: |
| aws/resource_service_principal_role.go | 73.33% <100.00%> (ø) |
|
| catalog/data_metastores.go | 84.61% <100.00%> (ø) |
|
| catalog/resource_share.go | 91.39% <100.00%> (+0.18%) |
:arrow_up: |
| catalog/resource_table.go | 100.00% <100.00%> (ø) |
|
| catalog/resource_volume.go | 84.00% <ø> (ø) |
|
| clusters/data_spark_version.go | 94.73% <100.00%> (+0.19%) |
:arrow_up: |
| clusters/resource_cluster.go | 85.71% <100.00%> (+0.60%) |
:arrow_up: |
| clusters/resource_library.go | 82.35% <100.00%> (-2.50%) |
:arrow_down: |
| ... and 67 more |
This is a breaking change, so, imho, you need to add a flag that will make the new behavior explicit (maybe it could be a provider-level flag). The
pathargument is always used as-is, so when you set the file extension, then it will be set in UI as well, and the second extension will be added only during the export command.We also need the doc that documents that behavior.
Thanks Alex, yeah I just tried os.listdir('/Workspace') and it indeed only returns .py when extension is specified in path; as opposed to .py.py; so the double extension is limited to export only in GUI.
Ideally then, this should be fixed in the export API rather than playing with the path. For now though, I can add a flag and call it trim_export_extension:
- defaulting to
falsewhich makes it usepathas-is - setting to
truewill make it trim extension and prevent double extension on export
I think it would be sufficient to just have a resource level flag for now rather than provider level; cause this behaviour is not seen for stuff created with workspace_files resource.
And yes, I'll add documentation as well to capture behaviour
@alexott updated as per CR but flag set to resource level 👍
hi @alexott do you have any other reservations for merge?
This might seem like a very round about way to fix the export bug .py.py but I'll explain briefly why we want it (even when not using GUI to export source file).
background
- Our data science workloads are written as
notebooks. Source of truth is agit repo. Notebooks are stored as.pyingit. Job tasks sometimes call%run somenote_book.py. - Our MLOps pipeline deploys these notebooks as
Workspace Notebookwith associatedJobconfig intostagingandprodenvironments usingterraform. This is done with custom modules callingdatabricks_notebookresource. - Development workflow sometimes require data scientists to pull a git branch into
Databricks Workspace Repoand run notebooks from there.
problem
The default behaviour of notebooks in Databricks Workspace and Databricks Workspace Repo differ.
Workspace Notebooksnames are defined bypathunmodifed.Repo Notebooksalways ommit extension.
That's the problem for data scientists; they cannot mix and run notebooks created by these 2 different methods.
We could fix it within our custom module by always stripping extension when calling this resource; but IMO it should be fixed upstream in the provider. Please let us know what you think.
We need to discuss about it...