terraform-provider-databricks icon indicating copy to clipboard operation
terraform-provider-databricks copied to clipboard

Make notebook extensions more obvious

Open crankswagon opened this issue 1 year ago • 6 comments

Changes

databricks workspace notebook resource modified to allow inclusion of extension in module call.

Previous behaviour:

Following resource config would create notebook.py.py on the workspace, but databricks UI masks the extension for notebooks so it's not apparent.


resource "databricks_notebook" "some_notebook" {
source =  "./notebook.py"
path = "/notebook.py"
}

New behaviour

Both of these configurations will now create only notebook.py on the workspace


resource "databricks_notebook" "some_notebook" {
source =  "./notebook.py"
path = "/notebook.py"
}

resource "databricks_notebook" "some_notebook" {
source =  "./notebook.py"
path = "/notebook"
}

Tests

  • [x] make test run locally
  • [x] covered with integration tests in internal/acceptance
  • [x] relevant acceptance tests are passing
  • [x] using Go SDK
  • [x] confirmed behaviour and no regression with delv terraform apply
  • [x] added additional unit test

crankswagon avatar Dec 04 '23 23:12 crankswagon

@nfx @alexott This is a really clean fix to a super annoying problem for users.

There's 3 things I like about it. Firstly it still allows you to omit the Language and pick it up based on the source extension, second it uses Ext which is super clean and will work whether you use an extension in the path or not. Third it includes updates to the tests.

The only thing this won't let you do anymore is end up with broken notebook.py.py files that are masked in the UI and look like they are called notebook.py which then makes our data scientists scratch their heads when they try to do %run notebook magic commands which don't work.

To be honest, the UI is being overly helpful (i.e. painful) by masking file extensions, and why importing a notebook adds an extra file extension is beyond me.

davehowell avatar Dec 05 '23 00:12 davehowell

Codecov Report

Merging #2994 (361769f) into master (754aad4) will decrease coverage by 3.90%. Report is 216 commits behind head on master. The diff coverage is 73.57%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2994      +/-   ##
==========================================
- Coverage   88.21%   84.32%   -3.90%     
==========================================
  Files         145      157      +12     
  Lines       11989    13728    +1739     
==========================================
+ Hits        10576    11576    +1000     
- Misses        930     1510     +580     
- Partials      483      642     +159     
Files Coverage Δ
access/resource_ip_access_list.go 80.95% <100.00%> (ø)
access/resource_permission_assignment.go 95.23% <100.00%> (-0.12%) :arrow_down:
aws/resource_service_principal_role.go 73.33% <100.00%> (ø)
catalog/data_metastores.go 84.61% <100.00%> (ø)
catalog/resource_share.go 91.39% <100.00%> (+0.18%) :arrow_up:
catalog/resource_table.go 100.00% <100.00%> (ø)
catalog/resource_volume.go 84.00% <ø> (ø)
clusters/data_spark_version.go 94.73% <100.00%> (+0.19%) :arrow_up:
clusters/resource_cluster.go 85.71% <100.00%> (+0.60%) :arrow_up:
clusters/resource_library.go 82.35% <100.00%> (-2.50%) :arrow_down:
... and 67 more

... and 1 file with indirect coverage changes

codecov-commenter avatar Dec 05 '23 07:12 codecov-commenter

This is a breaking change, so, imho, you need to add a flag that will make the new behavior explicit (maybe it could be a provider-level flag). The path argument is always used as-is, so when you set the file extension, then it will be set in UI as well, and the second extension will be added only during the export command.

We also need the doc that documents that behavior.

Thanks Alex, yeah I just tried os.listdir('/Workspace') and it indeed only returns .py when extension is specified in path; as opposed to .py.py; so the double extension is limited to export only in GUI.

Ideally then, this should be fixed in the export API rather than playing with the path. For now though, I can add a flag and call it trim_export_extension:

  • defaulting to false which makes it use path as-is
  • setting to true will make it trim extension and prevent double extension on export

I think it would be sufficient to just have a resource level flag for now rather than provider level; cause this behaviour is not seen for stuff created with workspace_files resource.

And yes, I'll add documentation as well to capture behaviour

crankswagon avatar Dec 05 '23 09:12 crankswagon

@alexott updated as per CR but flag set to resource level 👍

crankswagon avatar Dec 05 '23 11:12 crankswagon

hi @alexott do you have any other reservations for merge?

This might seem like a very round about way to fix the export bug .py.py but I'll explain briefly why we want it (even when not using GUI to export source file).

background

  • Our data science workloads are written as notebooks. Source of truth is a git repo. Notebooks are stored as .py in git. Job tasks sometimes call %run somenote_book.py.
  • Our MLOps pipeline deploys these notebooks as Workspace Notebook with associated Job config into staging and prod environments using terraform. This is done with custom modules calling databricks_notebook resource.
  • Development workflow sometimes require data scientists to pull a git branch into Databricks Workspace Repo and run notebooks from there.

problem

The default behaviour of notebooks in Databricks Workspace and Databricks Workspace Repo differ.

  • Workspace Notebooks names are defined by path unmodifed.
  • Repo Notebooks always ommit extension.

That's the problem for data scientists; they cannot mix and run notebooks created by these 2 different methods.

We could fix it within our custom module by always stripping extension when calling this resource; but IMO it should be fixed upstream in the provider. Please let us know what you think.

crankswagon avatar Dec 11 '23 00:12 crankswagon

We need to discuss about it...

alexott avatar Dec 11 '23 15:12 alexott