
Allow Running kedro-mlflow projects with an MLflow orchestrator

Open takikadiri opened this issue 2 years ago • 1 comments

Description

Kedro projects are mostly executed with kedro, and kedro-mlflow is responsible for starting a new MLflow run/session with a given configuration. There are some scenarios where the kedro project could be executed by some sort of orchestrator, such as an MLflow project or an Airflow pipeline. These orchestrators can start an MLflow run themselves to take control of the overall session. For example:

  • An MLflow project that starts an MLflow run in which it stores all the execution context before running the kedro project
  • An Airflow job that starts a run, executes the kedro project, then gets the results from the run to register or deploy the model

Context

We want to use MLflow projects so we can run the kedro project from a remote repo (for reproducibility) and package the Python environment alongside the fitted model (for accurate code dependencies).

This feature can also enable the integration of kedro-mlflow with more upstream tools

Possible Implementation

Maybe we can check here whether mlflow already has an active run; if so, we can reuse it when starting the kedro-mlflow run.
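A minimal sketch of that check, assuming the only thing needed is the ID of the run the orchestrator opened. `find_orchestrator_run_id` is a hypothetical helper, not part of kedro-mlflow; it takes the `mlflow` module as an argument so it can be exercised without a tracking server. `mlflow.active_run()` and `run.info.run_id` are public MLflow API.

```python
from typing import Any, Optional


def find_orchestrator_run_id(mlflow_module: Any) -> Optional[str]:
    """Return the ID of a run that is already active, or None.

    `mlflow_module` is expected to be the imported `mlflow` package
    (or any object exposing the same `active_run()` function).
    """
    # mlflow.active_run() returns None when no run is active in this process.
    active = mlflow_module.active_run()
    if active is None:
        return None
    return active.info.run_id
```

With the real library this would be called as `find_orchestrator_run_id(mlflow)` after `import mlflow`. Note that `mlflow.active_run()` only reports runs started in the current Python process, so a run opened by a parent process may not be visible this way.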

takikadiri avatar Sep 13 '22 19:09 takikadiri

In such a situation, what is the expected behaviour at the end of the pipeline? Do we expect the run to be closed? The other problem is that if mlflow is not properly configured by the orchestrator, the active run may be located in a different tracking_uri from the one specified in the configuration, hence raising a mlflow.exceptions.MlflowException: Run 'xxx' not found error.

The easiest way to inject this behaviour would be to pass the tracking.run.id to the configuration, but that requires the orchestrator to modify the config...
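One way to avoid editing the config file could be to let the orchestrator hand over its settings through environment variables. The sketch below is an assumption about how such precedence might look, not kedro-mlflow's actual behaviour: `resolve_tracking_target` is a hypothetical helper, while `MLFLOW_TRACKING_URI` and `MLFLOW_RUN_ID` are environment variables that MLflow itself reads.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_tracking_target(
    config_uri: Optional[str],
    config_run_id: Optional[str],
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Pick the tracking URI and run ID, preferring orchestrator-provided values.

    Assumed precedence (hypothetical): environment first, then mlflow.yml.
    """
    env = os.environ if environ is None else environ
    # Empty strings are treated as "not set" and fall back to the config values.
    tracking_uri = env.get("MLFLOW_TRACKING_URI") or config_uri
    run_id = env.get("MLFLOW_RUN_ID") or config_run_id
    return tracking_uri, run_id
```

Resolving the URI and the run ID from the same source is what avoids the "Run 'xxx' not found" mismatch: a run ID is only meaningful together with the tracking server that issued it.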

Galileo-Galilei avatar Sep 20 '22 20:09 Galileo-Galilei

So the final decision is:

  • if an active mlflow run exists, we ignore all configuration in mlflow.yml and use the configuration from the environment
  • the pipeline logs in this active run
  • the mlflow run is NOT closed at the end of the kedro run
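The three rules above can be sketched as a small context manager. This is an illustration under stated assumptions, not the code that was merged: `logging_run` is a hypothetical name, and the `mlflow` module is passed in as a parameter so the ownership logic can be tested with a stand-in. `active_run()`, `start_run(run_name=...)` and `end_run()` are public MLflow functions.

```python
from contextlib import contextmanager
from typing import Any, Iterator, Optional


@contextmanager
def logging_run(mlflow_module: Any, run_name: Optional[str] = None) -> Iterator[Any]:
    """Yield the run the pipeline should log to.

    If a run is already active, reuse it and leave it open on exit.
    Otherwise start a new run and close it when the block finishes.
    """
    existing = mlflow_module.active_run()
    if existing is not None:
        # The orchestrator owns this run: log into it, never end it here.
        yield existing
        return

    # No orchestrator run: we own the run's whole lifecycle.
    run = mlflow_module.start_run(run_name=run_name)
    try:
        yield run
    finally:
        mlflow_module.end_run()
```

The design choice is that whoever starts the run is the only one allowed to end it, which matches delegating the session to the entity that created the run.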

Galileo-Galilei avatar Oct 02 '22 21:10 Galileo-Galilei

That looks good to me. It makes sense to delegate the entire session to the entity that created the run in the first place.

takikadiri avatar Jan 08 '23 15:01 takikadiri