databricks-sdk-py

[ISSUE] Databricks SDK jobs: how to create dependent tasks / lineage using Python

Open · shivatharun opened this issue 1 year ago · 2 comments

How can I create dependent jobs (lineage) using the Databricks SDK? I only found documentation for creating a single job:

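import time
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
# cluster_id is assumed to hold the ID of an existing cluster
created_job = w.jobs.create(name=f'sdk-{time.time_ns()}',
                            tasks=[
                                jobs.Task(description="test",
                                          existing_cluster_id=cluster_id,
                                          notebook_task=jobs.NotebookTask(notebook_path="test_run"),
                                          task_key="test",
                                          timeout_seconds=0)
                            ])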

Let's say I have a main notebook that creates a job named "test" and passes it the "test_run" notebook to trigger. I want to run the test_run notebook with different parameters. How can I create lineage (dependencies between runs) with the Python SDK? Could you please share any references? I couldn't find any.

shivatharun · Jan 10 '24 20:01
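For the question above, one way to express ordering between tasks is the task-level depends_on field of the Jobs API 2.1, which the SDK exposes as jobs.Task(depends_on=[jobs.TaskDependency(task_key=...)]); per-run notebook parameters go in NotebookTask.base_parameters. The following is a minimal sketch under assumptions, not an official recipe: the helper names build_chained_tasks and create_chained_job, the task keys, and the parameter names are hypothetical, and cluster_id must be the ID of an existing cluster. It builds the task specs as plain dicts in the Jobs API JSON shape and converts them with jobs.Task.from_dict, so each test_run task waits for the previous one.

```python
from typing import Any, Dict, List


def build_chained_tasks(cluster_id: str, notebook_path: str,
                        param_sets: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Build one notebook task per parameter set; each task depends on the previous one."""
    tasks: List[Dict[str, Any]] = []
    for i, params in enumerate(param_sets):
        task: Dict[str, Any] = {
            "task_key": f"test_run_{i}",  # hypothetical naming scheme
            "existing_cluster_id": cluster_id,
            "notebook_task": {"notebook_path": notebook_path,
                              "base_parameters": dict(params)},
        }
        if i > 0:
            # Run this task only after the previous one has finished.
            task["depends_on"] = [{"task_key": f"test_run_{i - 1}"}]
        tasks.append(task)
    return tasks


def create_chained_job(w, name: str, cluster_id: str, notebook_path: str,
                       param_sets: List[Dict[str, str]]):
    """Create the job through the SDK (needs databricks-sdk and workspace credentials)."""
    # Imported lazily so the builder above runs without the SDK installed.
    from databricks.sdk.service import jobs
    specs = build_chained_tasks(cluster_id, notebook_path, param_sets)
    return w.jobs.create(name=name, tasks=[jobs.Task.from_dict(t) for t in specs])
```

Usage would be along the lines of create_chained_job(WorkspaceClient(), f"sdk-{time.time_ns()}", cluster_id, "test_run", [{"run_date": "2024-01-01"}, {"run_date": "2024-01-02"}]), where run_date is a hypothetical widget name. Building dicts first keeps the dependency logic testable without a workspace; constructing jobs.Task and jobs.TaskDependency objects directly would work just as well.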

Hi @shivatharun, lineage isn't currently supported in the SDK. However, you could update the job with different parameters, for example as in https://github.com/databricks/databricks-sdk-py/blob/main/examples/jobs/update_jobs_api_full_integration.py, where you could pass different JobSettings. Does this work for your use case?

tanmay-db · Jan 15 '24 14:01
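A minimal sketch of the update approach suggested above, assuming the job was created with a single notebook task whose task_key is "test": it builds a JobSettings-shaped dict carrying new base_parameters and passes it to w.jobs.update. The helper names build_updated_settings and update_job_parameters, and any parameter names, are hypothetical. It also assumes the Jobs API update behavior where entries of the tasks array are matched and replaced by task_key while other settings are kept; verify that against the Jobs API reference before relying on it.

```python
from typing import Any, Dict


def build_updated_settings(cluster_id: str, notebook_path: str,
                           params: Dict[str, str],
                           task_key: str = "test") -> Dict[str, Any]:
    """Return a JobSettings-shaped dict that re-points one notebook task at new parameters."""
    return {
        "tasks": [{
            "task_key": task_key,  # must match the key of the existing task
            "existing_cluster_id": cluster_id,
            "notebook_task": {"notebook_path": notebook_path,
                              "base_parameters": dict(params)},
        }]
    }


def update_job_parameters(w, job_id: int, cluster_id: str, notebook_path: str,
                          params: Dict[str, str]) -> None:
    """Apply the new settings through the SDK (needs databricks-sdk and credentials)."""
    # Imported lazily so the builder above runs without the SDK installed.
    from databricks.sdk.service import jobs
    settings = jobs.JobSettings.from_dict(
        build_updated_settings(cluster_id, notebook_path, params))
    w.jobs.update(job_id=job_id, new_settings=settings)
```

Calling update_job_parameters(w, created_job.job_id, cluster_id, "test_run", {...}) before each run would change which parameters the next run of the "test" task receives.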

Hi @tanmay-db, how can tasks run in parallel within a job, without any dependencies between them? Is there a limit on the number of tasks?

# task1, task2, task3, ... are jobs.Task objects that have no depends_on
created_job = w.jobs.create(name=f'sdk-{time.time_ns()}',
                            tasks=[task1, task2, task3, ...])

shivatharun · Jan 17 '24 13:01
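On the parallel-task question: in the Jobs API, tasks of the same job that have no depends_on entries are not ordered relative to each other, so they can be started concurrently. The following is a minimal sketch under assumptions: the helper names build_parallel_tasks and create_parallel_job and the task keys are hypothetical, and the optional final task shows a fan-in that waits for every parallel task. It does not answer the question about a maximum task count.

```python
from typing import Any, Dict, List, Optional


def build_parallel_tasks(cluster_id: str, notebook_path: str,
                         param_sets: List[Dict[str, str]],
                         final_notebook: Optional[str] = None) -> List[Dict[str, Any]]:
    """Independent tasks (no depends_on) plus an optional task that waits for all of them."""
    tasks: List[Dict[str, Any]] = [
        {
            "task_key": f"parallel_{i}",  # hypothetical naming scheme
            "existing_cluster_id": cluster_id,
            "notebook_task": {"notebook_path": notebook_path,
                              "base_parameters": dict(params)},
        }
        for i, params in enumerate(param_sets)
    ]
    if final_notebook is not None:
        tasks.append({
            "task_key": "fan_in",
            "existing_cluster_id": cluster_id,
            "notebook_task": {"notebook_path": final_notebook},
            # Fan-in: starts only after every parallel task has finished.
            "depends_on": [{"task_key": t["task_key"]} for t in tasks],
        })
    return tasks


def create_parallel_job(w, name: str, cluster_id: str, notebook_path: str,
                        param_sets: List[Dict[str, str]],
                        final_notebook: Optional[str] = None):
    """Create the job through the SDK (needs databricks-sdk and workspace credentials)."""
    # Imported lazily so the builder above runs without the SDK installed.
    from databricks.sdk.service import jobs
    specs = build_parallel_tasks(cluster_id, notebook_path, param_sets, final_notebook)
    return w.jobs.create(name=name, tasks=[jobs.Task.from_dict(t) for t in specs])
```

The depends_on list for the fan-in task is computed before that task is appended, so it lists only the parallel tasks. Dropping final_notebook gives a job made purely of independent tasks.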