dbx icon indicating copy to clipboard operation
dbx copied to clipboard

`dbx deploy`: `named_parameters` dictionary gets an extraneous `"named_parameters"` entry when renaming `file:` paths

Open angus-dawson-idexx opened this issue 2 years ago • 1 comments
trafficstars

Expected Behavior

A deployment with dbx deploy [...] --write-specs-to-file=spec.json with a named_parameters definition like so:

named_parameters:
  conf-file: "file:fuse://path/to/some-file.yaml"

Should result in a specs file with the following named_parameters object:

"named_parameters": {
  "conf-file": "/dbfs/[...]/artifacts/path/to/some-file.yaml",
}

And only conf-file should be created as a keyword argument parameter in the workflow task on the web GUI.

Current Behavior

Instead, I get this:

"named_parameters": {
  "conf-file": "/dbfs/[...]/artifacts/path/to/some-file.yaml",
  "named_parameters": "/dbfs/[...]/artifacts/path/to/some-file.yaml"
}

And the workflow task on the web GUI ends up with two keyword argument parameters: conf-file and named_parameters.

Steps to Reproduce (for bugs)

Run a dbx deploy with a workflow that has python_wheel_task tasks with named_parameters, at least one of which has a value that starts with file:// or file:fuse://. Then check the keyword argument parameters of the task in the web GUI.

Context

I've traced the problem to this function: https://github.com/databrickslabs/dbx/blob/34bd186956915d70792621faa1a8bc38f38a9d41/dbx/api/adjuster/adjuster.py#L165-L169

And this part in PropertyAdjuster.traverse(): https://github.com/databrickslabs/dbx/blob/34bd186956915d70792621faa1a8bc38f38a9d41/dbx/api/adjuster/adjuster.py#L43-L48

After yielding the correct tuple for conf-file:

('file:fuse://path/to/config.yaml', {'conf-file': 'file:fuse://path/to/config.yaml'}, 'conf-file')

It then attempts to traverse item with index_in_parent, which is named_parameters, and since item is a string, traverse jumps here and terminates: https://github.com/databrickslabs/dbx/blob/34bd186956915d70792621faa1a8bc38f38a9d41/dbx/api/adjuster/adjuster.py#L66

And yields essentially a duplicate tuple except with the wrong index_in_parent:

('file:fuse://path/to/config.yaml', {'conf-file': 'file:fuse://path/to/config.yaml'}, 'named_parameters')

Your Environment

  • dbx version used: 0.8.18
  • Databricks Runtime version: 12.2.x-scala2.12

angus-dawson-idexx avatar Nov 09 '23 18:11 angus-dawson-idexx

If the call in line 47 is changed to self.traverse(item, _object, key), you still get a duplicate tuple, but at least the semantics of index_in_parent in that generator are correct and you just end up setting the same property twice, which is fine. I'm not sure if that ends up breaking other traversal cases though, or if there's a better fix to avoid visiting that element twice in the first place.

angus-dawson-idexx avatar Nov 09 '23 18:11 angus-dawson-idexx