jupyter-scheduler Log notebooks with MLFlow

Open andrii-i opened this issue 1 year ago • 0 comments

Prototype of MLFlow - Scheduler integration:

Add a "Log with MLFlow" option to job / job definition creation. When it is enabled, the input notebook of a "Run now" / single job is logged as a single experiment with a single run, and a "Scheduled notebooks" / job definition is logged as a single experiment with one run for every job it creates
Modify the models, scheduler, and executor logic to accommodate this logging
Add an "Open in MLFlow" button to the detail view that opens the run (for a job) or the experiment (for a job definition) in the MLFlow Tracker UI, where logged artifacts (input notebook, outputs) can be previewed
Notebook cells tagged with mlflow_log (tags can be added via the metadata editor) are logged as separate artifacts in the appropriate format (image, pdf, text, html, markdown) when the notebook is run with MLFlow logging enabled. The mlflow_log tag logs both the input (cell content) and the output (result of running the cell), the mlflow_log_input tag logs the input only, and the mlflow_log_output tag logs the output only. Note that the MLFlow UI can only preview image, pdf, text, and html files.
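
To make the tag semantics and the experiment / run mapping above concrete, here is a minimal sketch. It is not the code from the PR: the function names cells_to_log and log_tagged_cells and the artifact paths are made up for illustration, the notebook is read as plain .ipynb JSON (tags live under each cell's metadata.tags), and only cell inputs are logged. Output handling by MIME type is left out.

```python
import json

# Tag names are taken from the issue text; everything else is illustrative.
TAG_BOTH = "mlflow_log"
TAG_INPUT = "mlflow_log_input"
TAG_OUTPUT = "mlflow_log_output"


def cells_to_log(notebook: dict) -> list:
    """Return one entry per tagged cell saying which parts should be logged."""
    plan = []
    for index, cell in enumerate(notebook.get("cells", [])):
        tags = set(cell.get("metadata", {}).get("tags", []))
        log_input = bool(tags & {TAG_BOTH, TAG_INPUT})
        log_output = bool(tags & {TAG_BOTH, TAG_OUTPUT})
        if log_input or log_output:
            plan.append({"index": index, "input": log_input, "output": log_output})
    return plan


def log_tagged_cells(notebook_path: str, experiment_name: str, run_name: str) -> None:
    """Hypothetical helper: log a notebook and its tagged cell inputs to MLflow."""
    import mlflow  # lazy import so cells_to_log works without MLflow installed

    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)

    # One experiment per job / job definition, one run per job.
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run(run_name=run_name):
        mlflow.log_artifact(notebook_path)  # the input notebook itself
        for entry in cells_to_log(notebook):
            if entry["input"]:
                source = notebook["cells"][entry["index"]]["source"]
                # In .ipynb JSON, source is either a string or a list of lines.
                text = "".join(source) if isinstance(source, list) else source
                mlflow.log_text(text, f"cells/{entry['index']}_input.txt")
```

With this sketch, a scheduled job definition would call something like log_tagged_cells("report.ipynb", "nightly-report", "job-1") once per job, reusing the experiment name so every job becomes a new run in the same experiment. The file, experiment, and run names here are placeholders.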

To install and use, clone this PR/andrii-i:mlflow branch and follow the development install steps from the jupyter-scheduler readthedocs.

Beyond the scope of this prototype:

MLFlow could replace the Scheduler's existing Job files management functionality
MLFlow could run notebooks itself: MLFlow has a python_function abstraction that tells it how to run models (essentially, bundles of files). This could potentially be leveraged to package notebooks and their dependencies as models and run them directly in MLFlow via the python_function abstraction.

Feb 27 '24 09:02 andrii-i