databricks-sdk-py
[ISSUE] databricks sdk: depends_on=['task1'] raises AttributeError: 'str' object has no attribute 'as_dict'
Below is the code. It breaks at depends_on=['task1'], where task2 should run after task1 completes.
Error: AttributeError: 'str' object has no attribute 'as_dict'. Please correct me on how to declare a dependency between tasks.
import os
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
notebook_path = f'/Users/{w.current_user.me().user_name}/sdk-{time.time_ns()}'

# wait for the cluster to be up before creating the job
w.clusters.ensure_cluster_is_running(os.environ["DATABRICKS_CLUSTER_ID"])
cluster_id = os.environ["DATABRICKS_CLUSTER_ID"]

created_job = w.jobs.create(
    name=f'sdk-{time.time_ns()}',
    tasks=[
        jobs.Task(description="test",
                  existing_cluster_id=cluster_id,
                  notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
                  task_key="task1",
                  timeout_seconds=0),
        jobs.Task(description="test",
                  existing_cluster_id=cluster_id,
                  notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
                  task_key="task2",
                  depends_on=['task1'],  # <-- fails here: a plain string is passed
                  timeout_seconds=0),
    ])
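The error comes from serialization: when building the request body, the SDK calls .as_dict() on every entry in depends_on, and a plain string has no such method. A minimal stand-in (the Dependency class below is hypothetical, not the SDK's actual type) reproduces the failure mode without needing a workspace:

```python
# Hypothetical stand-in for jobs.TaskDependency, only to illustrate
# why a bare string fails during serialization.
class Dependency:
    def __init__(self, task_key):
        self.task_key = task_key

    def as_dict(self):
        return {"task_key": self.task_key}


def serialize_depends_on(deps):
    # Mirrors the pattern of turning dependency objects into request JSON.
    return [d.as_dict() for d in deps]


print(serialize_depends_on([Dependency("task1")]))  # [{'task_key': 'task1'}]

try:
    serialize_depends_on(["task1"])  # a bare string has no .as_dict()
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'as_dict'
```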
Hi @shivatharun. The depends_on field of jobs.Task expects a list of jobs.TaskDependency objects (Docs), e.g.:
jobs.TaskDependency(task_key="task1")
Can you give that a try?
import os
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs
w = WorkspaceClient()
notebook_path = f"/Users/{w.current_user.me().user_name}/sdk-{time.time_ns()}"
# waiting for the cluster to start
w.clusters.ensure_cluster_is_running(os.environ["DATABRICKS_CLUSTER_ID"])
cluster_id = os.environ["DATABRICKS_CLUSTER_ID"]
first_task = jobs.Task(
description="test",
existing_cluster_id=cluster_id,
notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
task_key="task1",
timeout_seconds=0,
)
second_task = jobs.Task(
description="test",
existing_cluster_id=cluster_id,
notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
task_key="task2",
depends_on=[jobs.TaskDependency(task_key="task1")],
timeout_seconds=0,
)
created_job = w.jobs.create(
name=f"sdk-{time.time_ns()}", tasks=[first_task, second_task])
Hi @shivatharun, thanks for reaching out. Can you please tell us if the solution proposed by @kimberlyma is working?
Hi @tanmay-db @mgyucht ,
Actually there is no TaskDependency class. Even after I upgraded the databricks sdk, it shows/references TaskDependenciesItem instead; however, if I pass
jobs.TaskDependenciesItem(task_key="task1")
it returns AttributeError: module 'databricks.sdk.service.jobs' has no attribute 'TaskDependenciesItem'
Hi @shivatharun - to clarify, you upgraded the SDK in your development environment, and you are getting the error in that same environment? Can you verify your version of the SDK with pip show databricks-sdk? Running pip install databricks-sdk --upgrade will get you the latest. It should be TaskDependency rather than TaskDependenciesItem since the v0.1.12 release (#205).
Thanks for the solution provided. It worked for me smoothly !!
Thanks for the solution, it works for a single dependency. How do I specify multiple task dependencies, and how do I set the "Run if dependencies" option?
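For what it's worth: multiple dependencies are just additional TaskDependency entries in the list, and recent SDK versions expose a run_if parameter on jobs.Task taking a jobs.RunIf value (e.g. ALL_SUCCESS, ALL_DONE, NONE_FAILED) that maps to the "Run if dependencies" setting in the UI. The exact names are worth double-checking against your installed SDK version; a sketch under those assumptions:

```python
import os
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
notebook_path = f"/Users/{w.current_user.me().user_name}/sdk-{time.time_ns()}"
cluster_id = os.environ["DATABRICKS_CLUSTER_ID"]

third_task = jobs.Task(
    description="runs after task1 AND task2",
    existing_cluster_id=cluster_id,
    notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
    task_key="task3",
    # multiple dependencies: one TaskDependency per upstream task
    depends_on=[
        jobs.TaskDependency(task_key="task1"),
        jobs.TaskDependency(task_key="task2"),
    ],
    # "Run if dependencies" in the UI; see jobs.RunIf for the other values
    run_if=jobs.RunIf.ALL_SUCCESS,
    timeout_seconds=0,
)
```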