dbx launch --parameters option expects whole workflow definition instead of just parameters

Open pspeter opened this issue 3 years ago • 2 comments

Expected Behavior

To run a job with custom parameters, the dbx launch help shows the following example (yes, it shows dbx execute in the dbx launch help, but that's not the main issue): dbx execute <workflow_name> --parameters='[{"task_key": "some", "named_parameters": ["--a=1", "--b=2"]}]'

Current Behavior

Running this example, dbx crashes:

[...]
 c:\project\venv\lib\site-packages\dbx\api\launch\runners.py:24 in __init__                        │
│                                                                                                  │
│    21 │   │   self.job = job                                                                     │
│    22 │   │   self.api_client = api_client                                                       │
│    23 │   │   self.environment = environment                                                     │
│ ❱  24 │   │   self._parameters = None if not parameters else self._process_parameters(paramete   │
│    25 │                                                                                          │
│    26 │   def _process_parameters(self, payload: str) -> Union[RunSubmitV2d0ParamInfo, RunSubm   │
│    27 │   │   _payload = json.loads(payload)                                                     │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │        api_client = <databricks_cli.sdk.api_client.ApiClient object at 0x000002027203F040>   │ │
│ │ deployment_run_id = 'b7443fffb12944e0aa8cb83e4ba859e6'                                       │ │
│ │       environment = 'default'                                                                │ │
│ │               job = 'radsatz-lifecycle-integration-test'                                     │ │
│ │        parameters = '[{"task_key": "main", "parameters": ["file:fuse://tests/integration",   │ │
│ │                     "--db=dev"'+3                                                            │ │
│ │              self = <dbx.api.launch.runners.RunSubmitLauncher object at 0x00000202722AF2B0>  │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ c:\project\venv\lib\site-packages\dbx\api\launch\runners.py:30 in _process_parameters            │
│                                                                                                  │
│    27 │   │   _payload = json.loads(payload)                                                     │
│    28 │   │                                                                                      │
│    29 │   │   if self.api_client.jobs_api_version == "2.1":                                      │
│ ❱  30 │   │   │   return RunSubmitV2d1ParamInfo(**_payload)                                      │
│    31 │   │   else:                                                                              │
│    32 │   │   │   return RunSubmitV2d0ParamInfo(**_payload)                                      │
│    33                                                                                            │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ _payload = [                                                                                 │ │
│ │            │   {                                                                             │ │
│ │            │   │   'task_key': 'main',                                                       │ │
│ │            │   │   'parameters': ['file:fuse://tests/integration', '--db=dev']               │ │
│ │            │   }                                                                             │ │
│ │            ]                                                                                 │ │
│ │  payload = '[{"task_key": "main", "parameters": ["file:fuse://tests/integration",            │ │
│ │            "--db=dev"'+3                                                                     │ │
│ │     self = <dbx.api.launch.runners.RunSubmitLauncher object at 0x00000202722AF2B0>           │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: ModelMetaclass object argument after ** must be a mapping, not list

Even if you just provide parameters for a single task, it does not work, as it expects the whole workflow definition including new_cluster.
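The failure mode in the traceback can be reproduced without dbx. The sketch below is only an illustration: `ParamInfo` and `build_param_info` are hypothetical stand-ins (a plain dataclass rather than the pydantic models named in the traceback), and the `parameters` field is an assumption. It shows that a JSON array decodes to a Python list, which cannot be unpacked with `**`, while a JSON object decodes to a dict and can.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParamInfo:
    """Hypothetical stand-in for the model built in _process_parameters."""

    parameters: List[str] = field(default_factory=list)


def build_param_info(payload: str) -> ParamInfo:
    """Mirror the pattern from the traceback: decode JSON, then unpack with **."""
    _payload = json.loads(payload)
    # ** unpacking requires a mapping; a JSON array decodes to a list instead.
    return ParamInfo(**_payload)


# A list of per-task overrides, shaped like the payload in this issue.
list_payload = '[{"task_key": "main", "parameters": ["--db=dev"]}]'
# A single JSON object whose keys match the stand-in model's fields.
dict_payload = '{"parameters": ["--db=dev"]}'

try:
    build_param_info(list_payload)
    list_error = None
except TypeError as exc:
    # Same class of error as in the traceback above.
    list_error = exc

dict_result = build_param_info(dict_payload)
```

Here `list_error` ends up holding a `TypeError`, and `dict_result.parameters` holds the decoded list. Which keys the real dbx models accept is not shown here and should be checked against the dbx source.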

Steps to Reproduce (for bugs)

  1. Open cookie cutter project named 'project-name'
  2. dbx launch project-name-sample-etl --from-assets --trace --parameters='[{"task_key": "main", "parameters": ["--conf-file", "file:fuse://conf/tasks/sample_etl_config.yml"]}]'

Context

Only dbx launch is affected by this; dbx execute works as expected in this regard. The behavior is the same with Jobs API 2.0. I haven't tried 1.x.

Your Environment

  • dbx version used: 0.7.4
  • Databricks Runtime version: 11.1

pspeter • Sep 08 '22 07:09

Hi @pspeter, there is a validation issue here, but not a logical one. dbx launch expects the argument payload as described in the Jobs API. The way to provide the parameters payload is expected to be different for execute and launch (this is by design).

Please refer to the launch docs for examples and to the Jobs API for guidance.

renardeinside • Sep 13 '22 17:09

Thanks renardeinside, but I have tried the examples from the docs. These do not work, as I described in the issue above. I suppose the dbx docs are the problem then, specifically the description of the --parameters option to dbx launch.

It would be nice to be able to only change parts of the payload (e.g. only the parameters of a python-task) and leave the rest as it is in the deployment.yml, without having to specify the whole job payload in a json string.
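To make the request concrete, here is a rough sketch of the kind of partial override I have in mind. It is not dbx code: `apply_task_overrides` is a hypothetical helper, and the job layout (a `tasks` list whose entries carry `task_key` and `spark_python_task.parameters`) is an assumption modeled on a Jobs API 2.1 style definition. It replaces only the parameters of the named tasks and leaves everything else, such as `new_cluster`, untouched.

```python
import copy
import json
from typing import Any, Dict, List


def apply_task_overrides(job: Dict[str, Any], overrides_json: str) -> Dict[str, Any]:
    """Return a copy of `job` with per-task parameters replaced.

    `overrides_json` is a JSON array such as
    '[{"task_key": "main", "parameters": ["--db=dev"]}]'.
    """
    overrides: List[Dict[str, Any]] = json.loads(overrides_json)
    by_key = {item["task_key"]: item for item in overrides}

    merged = copy.deepcopy(job)  # never mutate the deployed definition
    for task in merged.get("tasks", []):
        override = by_key.pop(task["task_key"], None)
        if override is None:
            continue
        # Assumption: the task is a python task with a `parameters` list.
        task["spark_python_task"]["parameters"] = list(override["parameters"])

    if by_key:
        # Fail loudly on overrides that match no task in the definition.
        raise KeyError(f"unknown task_key(s): {sorted(by_key)}")
    return merged


# Hypothetical job definition, standing in for what deployment.yml produces.
job_definition = {
    "name": "sample-etl",
    "tasks": [
        {
            "task_key": "main",
            "new_cluster": {"num_workers": 1},
            "spark_python_task": {
                "python_file": "file://entrypoint.py",
                "parameters": ["--db=prod"],
            },
        }
    ],
}

merged_job = apply_task_overrides(
    job_definition, '[{"task_key": "main", "parameters": ["--db=dev"]}]'
)
```

With something like this, the short per-task list from the help text would be enough, and the cluster settings would still come from the deployment file.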

Also, it is confusing that the option is named --parameters when it does not affect just the parameters of a python-task. Maybe the option should be called --job-payload or something along those lines.

pspeter • Sep 20 '22 12:09