jupyter-scheduler
Log notebooks with MLFlow
Prototype of MLFlow - Scheduler integration:
- Add a "Log with MLFlow" option to job and job definition creation. When it is enabled, a "Run now" notebook (a single job) is logged as one experiment with a single run, and a scheduled notebook (a job definition) is logged as one experiment with a run for every job it creates
- Modify the models, scheduler, and executor logic to support this logging
- Add an "Open in MLFlow" button to the detail view that opens the run (for a job) or the experiment (for a job definition) in the MLFlow Tracking UI, where the logged artifacts (input notebook, outputs) can be previewed
- Notebook cells tagged with `mlflow_log` (tags can be added via the metadata editor) are logged as separate artifacts of the appropriate format (image, pdf, text, html, markdown) when the notebook is run with MLFlow logging enabled. The `mlflow_log` tag logs both the input (cell content) and the output (result of running the cell), the `mlflow_log_input` tag logs the input only, and the `mlflow_log_output` tag logs the output only. Note that the MLFlow UI can only preview image, pdf, text, and html files.
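The tag rules above can be sketched in a few lines of Python. This is only an illustration, not the code in this PR: the three tag names come from the description, while the function name `cells_to_log` and the shape of the returned plan are hypothetical. It assumes the notebook is already loaded as a plain dict in the standard notebook JSON layout, where tags live under `cell["metadata"]["tags"]`.

```python
# Tag names are taken from the PR description; everything else is illustrative.
LOG_BOTH = "mlflow_log"
LOG_INPUT = "mlflow_log_input"
LOG_OUTPUT = "mlflow_log_output"


def cells_to_log(notebook):
    """Return one entry per tagged cell saying what should be logged for it.

    Hypothetical helper: `notebook` is a dict in notebook JSON format.
    """
    plan = []
    for index, cell in enumerate(notebook.get("cells", [])):
        tags = set(cell.get("metadata", {}).get("tags", []))
        # mlflow_log implies both input and output; the other tags pick one side.
        log_input = bool(tags & {LOG_BOTH, LOG_INPUT})
        log_output = bool(tags & {LOG_BOTH, LOG_OUTPUT})
        if log_input or log_output:
            plan.append({"index": index, "input": log_input, "output": log_output})
    return plan


example_notebook = {
    "cells": [
        {"cell_type": "code", "source": "plot()", "metadata": {"tags": ["mlflow_log"]}},
        {"cell_type": "code", "source": "x = 1", "metadata": {}},
        {"cell_type": "code", "source": "df.head()", "metadata": {"tags": ["mlflow_log_output"]}},
    ]
}
example_plan = cells_to_log(example_notebook)
```

For the example notebook, the plan contains the first cell (input and output) and the third cell (output only); the untagged second cell is skipped.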
To install and use, check out this PR (the andrii-i:mlflow branch) and follow the development install steps from the jupyter-scheduler documentation on Read the Docs.
Beyond the scope of this prototype:
- MLFlow could be used to replace existing Job files management functionality of the Scheduler
- MLFlow could potentially run notebooks itself: it has a python_function abstraction that tells it how to run models (essentially, bundles of files). This could be leveraged to package notebooks and their dependencies as models and run them directly in MLFlow.
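As a rough illustration of the last idea, a notebook could be wrapped in a custom python_function model along these lines. This is a speculative sketch, not part of the prototype: `build_notebook_model`, `NotebookModel`, and the `"notebook"` artifact key are hypothetical names, and using papermill to execute the notebook is an assumption made for brevity rather than what the Scheduler does.

```python
import os
import tempfile


def build_notebook_model():
    """Return a pyfunc model whose predict() executes a bundled notebook."""
    # Imported lazily so this module loads even without MLflow or papermill installed.
    import mlflow.pyfunc
    import papermill

    class NotebookModel(mlflow.pyfunc.PythonModel):
        def predict(self, context, model_input, params=None):
            # context.artifacts maps artifact names to local file paths;
            # "notebook" is a hypothetical key chosen when the model is logged.
            input_path = context.artifacts["notebook"]
            output_path = os.path.join(tempfile.mkdtemp(), "output.ipynb")
            # Run the notebook, passing any params through as notebook parameters.
            papermill.execute_notebook(input_path, output_path, parameters=params or {})
            return output_path

    return NotebookModel()
```

Such a model would be stored with `mlflow.pyfunc.log_model`, passing the instance as `python_model` and the notebook file under `artifacts={"notebook": ...}`, so that the notebook travels with the model as one bundle.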