jupyter-scheduler
Multi-task jobs
Problem
No support for multi-task jobs
Proposed Solution
Provide ability to run multiple tasks as one job, expressed as a DAG. This task depends on Dask/Ray backend implementation (#410).
Since the other issue was closed as a duplicate, I'm porting some of the discussion over here. cc @akshaychitneni
There is some helpful REST API design—borrowing from Elyra's similar functionality—that I don't think we should lose here.
Akshay has been thinking a lot about this topic recently. Let's use this thread to collaborate on this feature.
Problem
Jupyter Scheduler currently enables users to create and manage background jobs that execute a notebook file. We would like to extend jobs to support multiple notebook tasks, where each task executes a notebook file, and to allow creating dependencies between tasks. We want to initiate a discussion on extending Jupyter Scheduler so users can create and manage notebook workflows and their associated runs in the Jupyter workspace. It would also require UX for users to easily create tasks and their associated dependencies using a DAG editor.
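To make the dependency model concrete, here is a minimal sketch of how a scheduler could validate a multi-task job and derive a run order from the task dependencies. The `Task` dataclass and `execution_order` helper are hypothetical names for illustration, not part of jupyter-scheduler's API; the sketch uses the standard library's `graphlib` for topological sorting.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # stdlib, Python 3.9+


@dataclass
class Task:
    # Hypothetical task model, loosely mirroring the fields discussed below.
    name: str
    input_filename: str
    depends_on: list = field(default_factory=list)


def execution_order(tasks):
    """Return task names in an order that respects `depends_on` edges.

    TopologicalSorter raises graphlib.CycleError for cyclic dependencies,
    which is one way a scheduler could reject an invalid DAG at
    job-definition time rather than at run time.
    """
    graph = {t.name: set(t.depends_on) for t in tasks}
    return list(TopologicalSorter(graph).static_order())


tasks = [
    Task("task1", "Untitled1.ipynb"),
    Task("task2", "Untitled2.ipynb", depends_on=["task1"]),
]
print(execution_order(tasks))  # ['task1', 'task2']
```

A real implementation would likely execute independent tasks concurrently (e.g. on a Dask or Ray backend, per #410) rather than serially, but the validation step is the same.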
Proposed Solution
Tentative model:
- DescribeJobDefinition API Response
```json
{
  "name": "test1",
  "tags": null,
  "output_filename_template": "{{input_filename}}-{{create_time}}",
  "schedule": "0 0 * * MON-FRI",
  "timezone": "America/Los_Angeles",
  "job_definition_id": "b5c6099c-9bec-4a04-968f-37e9c23c0f9b",
  "create_time": 1701124758318,
  "update_time": 1701124758317,
  "active": true,
  "tasks": [
    {
      "name": "task1",
      "input_filename": "Untitled1.ipynb",
      "parameters": null,
      "runtimeProperties": {},
      "runtime_environment_name": "anaconda3",
      "runtime_environment_parameters": null,
      "output_formats": ["ipynb", "html"],
      "compute_type": null,
      "trigger_rule": null,
      "dependsOn": []
    },
    {
      "name": "task2",
      "input_filename": "Untitled2.ipynb",
      "parameters": null,
      "runtimeProperties": {},
      "runtime_environment_name": "anaconda3",
      "runtime_environment_parameters": null,
      "output_formats": ["ipynb", "html"],
      "compute_type": null,
      "trigger_rule": "all_success",
      "dependsOn": ["task1"]
    }
  ]
}
```
- DescribeJob API Response:
```json
{
  "name": "job1",
  "tags": null,
  "output_filename_template": "{{input_filename}}-{{create_time}}",
  "job_id": "27d8a6ae-47d0-4ed3-9e28-5411d21a0e03",
  "url": "/jobs/27d8a6ae-47d0-4ed3-9e28-5411d21a0e03",
  "create_time": 1696264966089,
  "update_time": 1696264968551,
  "start_time": 1696264967241,
  "end_time": 1696264968550,
  "status": "COMPLETED",
  "status_message": null,
  "tasks": [
    {
      "input_filename": "Untitled2.ipynb",
      "runtime_environment_name": "anaconda3",
      "runtime_environment_parameters": null,
      "output_formats": ["ipynb", "html"],
      "parameters": null,
      "name": "task1",
      "job_files": [
        { "display_name": "HTML", "file_format": "html", "file_path": null },
        { "display_name": "Input", "file_format": "input", "file_path": null }
      ],
      "create_time": 1696264966089,
      "update_time": 1696264968551,
      "start_time": 1696264967241,
      "end_time": 1696264968550,
      "trigger_rule": null,
      "dependsOn": [],
      "status": "COMPLETED",
      "status_message": null,
      "downloaded": false
    },
    {
      "input_filename": "Untitled2.ipynb",
      "runtime_environment_name": "anaconda3",
      "runtime_environment_parameters": null,
      "output_formats": ["ipynb", "html"],
      "parameters": null,
      "name": "task2",
      "job_files": [
        { "display_name": "HTML", "file_format": "html", "file_path": null },
        { "display_name": "Input", "file_format": "input", "file_path": null }
      ],
      "create_time": 1696264966089,
      "update_time": 1696264968551,
      "start_time": 1696264967241,
      "end_time": 1696264968550,
      "trigger_rule": "all_success",
      "dependsOn": ["task1"],
      "status": "COMPLETED",
      "status_message": null,
      "downloaded": false
    }
  ]
}
```
Providing such an interface would allow users to extend the scheduler to integrate with external orchestrators or schedulers like Airflow to schedule and run notebook DAGs.
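The `trigger_rule` and `dependsOn` fields in the model above imply gating logic when the scheduler decides whether a task may start. Here is a minimal sketch of how that check could work; the `should_run` helper and the `"all_done"` rule are illustrative assumptions (the rule names loosely mirror Airflow's trigger rules, which may or may not be the intended semantics here).

```python
def should_run(task, statuses):
    """Decide whether a task may start, given upstream task statuses.

    `statuses` maps task name -> status string ("COMPLETED", "FAILED", ...).
    With "all_success" (assumed here to be the default when trigger_rule is
    null), every upstream task must have COMPLETED; "all_done" only requires
    that upstream tasks have finished, regardless of outcome.
    """
    upstream = [statuses[name] for name in task.get("dependsOn", [])]
    rule = task.get("trigger_rule") or "all_success"
    if rule == "all_success":
        return all(s == "COMPLETED" for s in upstream)
    if rule == "all_done":
        return all(s in ("COMPLETED", "FAILED") for s in upstream)
    raise ValueError(f"unknown trigger_rule: {rule}")


task2 = {"name": "task2", "dependsOn": ["task1"], "trigger_rule": "all_success"}
print(should_run(task2, {"task1": "COMPLETED"}))  # True
print(should_run(task2, {"task1": "FAILED"}))     # False
```

An external orchestrator like Airflow already implements richer trigger rules, which is one argument for the integration path mentioned above rather than reimplementing this logic in the scheduler.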
Additional context
- Elyra provides similar functionality with a DAG editor - https://elyra.readthedocs.io/en/latest/user_guide/pipelines.html#generic-pipelines
@3coins @JasonWeill @andrii-i Would you be able to attend the jupyter server meeting this week or next to start discussion on this work? I work with @Zsailer on the same team and would like to start collaborating with you all on this feature. Thanks
@akshaychitneni I will join this week, let’s discuss more. Before we start this task, it might be useful to move to a Dask based backend for the Scheduler, which will make running workflows much simpler.
Woohoo 🎉 I'm looking forward to hanging out with all of you cool people in the Jupyter Server meeting.
I will join the meeting as well.
Hi all, as hinted at by #517, our team has been developing UX around scheduling a DAG of notebooks using jupyter-scheduler as the starting place. We've already discussed this with your team, but wanted to share a "sneak peek" preview of the feature in the open source and begin collaborating openly here to get this work upstreamed (assuming folks would benefit).
Here is a video demonstrating the UX we built:
https://github.com/jupyter-server/jupyter-scheduler/assets/2791223/e1337903-1e51-40bf-9aa0-aa2157b85f82
In short, this is essentially a rewrite of the frontend and would replace the current scheduler UI with a broader editor for scheduling a DAG of notebooks. @sathishlxg led this work and can work with y'all here to make the transition smooth.
We've also made significant changes to the REST API models to handle individual tasks. @akshaychitneni and @nsingl00 led this work, so they will work with you here to make the appropriate changes.
We recognize that this is a pretty disruptive change to the package. We're willing and able to help with the merging, releasing, and long-term maintenance of this work.
We discussed hosting a regular meeting to get things open-sourced as soon as possible. We can use this thread (at least to start) to discuss next steps.
Thanks all!
@Zsailer, @sathishlxg, @akshaychitneni, @nsingl00 thank you for open-sourcing your work. I'm excited to work on this with you. Other than scheduling a meeting, a good next step would be to open a PR or a branch with code, even if it would not work as-is. I would be happy to then work to make the necessary changes and integrate the new functionality.
Awesome work, look forward to learning more!