
[ISSUE] Unable to pass notebook params to a sub job(where the job task is a job) from python

Open chaitti opened this issue 2 years ago • 3 comments

Description When creating a Job that contains a task which is itself a job, we use RunJobTask. This class has only two members, job_id and job_parameters, so there is no way to pass notebook params to the sub-job, even though the UI allows this.

Reproduction

jobs.Task(
            libraries=[],
            run_job_task=jobs.RunJobTask(job_id=job_id,
                                         job_parameters=[{"key":"1", "value":"2"}]),
            task_key="MainJob",
            timeout_seconds=300, run_if=jobs.RunIf.ALL_SUCCESS,
            description="This is a sub job"

        )

I don't see the "1":"2" params in the UI; the notebook params are not set.

Expected behavior I should be able to add params on the sub-job inside a main job via python code.

Debug Logs The SDK logs helpful debugging information when debug logging is enabled. Set the log level to debug by adding logging.basicConfig(level=logging.DEBUG) to your program, and include the logs here.

Other Information

  • OS: [e.g. macOS]
  • Version: [e.g. 0.1.0]

Additional context The solution I tried, which worked, is to add that param (notebook_params) to the class. The backend receiving the request already knows how to pick up this value; it seems the member was simply never added to the class.

**********************CODE*************************************************************************
@dataclass
class RunJobTask:
    job_id: int
    job_parameters: Optional[Any] = None
    notebook_params: Optional[Dict[str, str]] = None

    def as_dict(self) -> dict:
        body = {}
        if self.job_id is not None: body['job_id'] = self.job_id
        if self.job_parameters: body['job_parameters'] = self.job_parameters
        if self.notebook_params: body['notebook_params'] = self.notebook_params
        return body

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> 'RunJobTask':
        return cls(job_id=d.get('job_id', None), job_parameters=d.get('job_parameters', None),
                   notebook_params=d.get('notebook_params', None))
**********************CODE*************************************************************************

chaitti avatar Aug 10 '23 15:08 chaitti

jobs.Task(
            libraries=[],
            run_job_task=jobs.RunJobTask(job_id=job_id,
                                         notebook_params={"1":"2"}),
            task_key="MainJob",
            timeout_seconds=300, run_if=jobs.RunIf.ALL_SUCCESS,
            description="This is a sub job"

        )

This works after the SDK code change.

chaitti avatar Aug 10 '23 15:08 chaitti

@chaitti, thanks for reporting this issue! We are deprecating notebook_params and other task-specific parameters in the future and introducing job parameters soon. We do not officially support notebook_params for the "Run Job" task, but as you've noticed, we allow them in the UI and through the API until job parameters become generally available.

Job parameters passed through the Python SDK will soon work as you'd expect, and notebooks will automatically pick up the parameters ({"1":"2"} in your example) without any changes needed in the SDK.

gaborratky-db avatar Aug 17 '23 15:08 gaborratky-db
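For reference, here is a minimal sketch of the request payload such a task could produce once job parameters are used instead of notebook_params. This assumes job_parameters is accepted as a flat string-to-string map; the field names mirror the discussion above and should be verified against your SDK version.

```python
# Sketch: build the JSON payload for a task that triggers another job,
# passing job parameters instead of (deprecated) notebook_params.
# Assumption: job_parameters is a flat string-to-string map.

def run_job_task_payload(task_key: str, job_id: int, job_parameters: dict) -> dict:
    return {
        "task_key": task_key,
        "run_job_task": {
            "job_id": job_id,
            # With job parameters generally available, notebooks in the
            # child job pick these up directly; no notebook_params needed.
            "job_parameters": job_parameters,
        },
        "timeout_seconds": 300,
        "run_if": "ALL_SUCCESS",
    }

payload = run_job_task_payload("MainJob", 123, {"1": "2"})
```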

Hi @chaitti @gaborratky-db, I'm trying to implement something similar to your case but can't figure out the scenario: creating a main job whose tasks run in parallel on success, all using the same notebook but with different parameters. Below is the lineage/task dependency for better understanding.

Main -> subtask1 -> subtask4
Main -> subtask2 -> subtask4
Main -> subtask3 -> subtask4

Each subtask calls the same notebook but passes different parameters based on the code output.

Could you help me with sample code?

shivatharun avatar Jan 11 '24 11:01 shivatharun
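The fan-out/fan-in graph described above (Main feeding three parallel subtasks, all converging on subtask4) can be sketched as plain task dicts wired together with depends_on edges, which is how the Jobs API expresses dependencies. The notebook paths and the "branch" parameter below are hypothetical placeholders.

```python
# Sketch: fan-out/fan-in task graph for a multi-task Databricks job.
# "/Shared/driver", "/Shared/worker" and the "branch" parameter are
# hypothetical; substitute your own notebook paths and parameter names.

def notebook_task(task_key, notebook_path, params, depends_on=()):
    """Build one task dict; depends_on lists upstream task_keys."""
    task = {
        "task_key": task_key,
        "notebook_task": {
            "notebook_path": notebook_path,
            "base_parameters": params,
        },
        "run_if": "ALL_SUCCESS",
    }
    if depends_on:
        task["depends_on"] = [{"task_key": k} for k in depends_on]
    return task

tasks = [
    notebook_task("Main", "/Shared/driver", {}),
    # Fan out: three parallel runs of the same notebook, different parameters.
    *(notebook_task(f"subtask{i}", "/Shared/worker", {"branch": str(i)}, ["Main"])
      for i in (1, 2, 3)),
    # Fan in: runs only after all three branches succeed.
    notebook_task("subtask4", "/Shared/worker", {"branch": "final"},
                  ["subtask1", "subtask2", "subtask3"]),
]
```

The same shapes map one-to-one onto jobs.Task, jobs.NotebookTask, and jobs.TaskDependency in the Python SDK if you prefer the typed classes over raw dicts.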