kedro-mlflow
Allow Running kedro-mlflow projects with an MLflow orchestrator
Description
Kedro projects are mostly executed with kedro, and kedro-mlflow is responsible for starting a new MLflow run/session with a given configuration. In some scenarios, however, the kedro project may be executed by an orchestrator, such as an MLflow project or an Airflow pipeline. These orchestrators can start an MLflow run themselves to take control of the overall session. For example:
- An MLflow project that starts an MLflow run and puts the whole execution context in it before running the kedro project
- An Airflow job that starts a run, executes the kedro project, then gets the results from the run to register or deploy the model
Context
We want to use MLflow projects so that we can run the kedro project from a remote repo (for reproducibility) and package the Python environment alongside the fitted model (for accurate code dependencies).
This feature can also enable the integration of kedro-mlflow with more upstream tools
Possible Implementation
Maybe we can check whether mlflow already has an active run; if it does, we can use it when starting the kedro-mlflow run.
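A minimal sketch of such a check, assuming it relies on `mlflow.active_run()`, which returns `None` when no run is open in the current process. The helper name `find_active_run_id` is hypothetical, and the `mlflow` module is passed in as the `tracking` argument only so that the helper can be tried with a stub; this is an illustration, not the plugin's actual code.

```python
from typing import Any, Optional


def find_active_run_id(tracking: Any) -> Optional[str]:
    """Return the id of a run already opened by an orchestrator, or None.

    `tracking` is expected to be the imported `mlflow` module (assumption:
    it is injected here rather than imported, to keep the helper testable).
    """
    # mlflow.active_run() returns None when no run is currently active.
    active_run = tracking.active_run()
    if active_run is None:
        return None
    return active_run.info.run_id
```

With the real library this would be called as `find_active_run_id(mlflow)` after `import mlflow`. Note that the check only sees runs that are active in the current Python process.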
In such a situation, what is the expected behaviour at the end of the pipeline? Do we expect the run to be closed? The other problem is that if mlflow is not properly configured by the orchestrator, the active run may be located in another tracking_uri than the one specified in the configuration, hence raising a mlflow.exceptions.MlflowException: Run 'xxx' not found error.
The easiest way to inject behaviour would be to pass the tracking.run.id to the configuration, but it requires the orchestrator modifying the config...
So the final decision is:
- if an active mlflow run exists, we ignore all configuration in mlflow.yml and use the configuration from the environment; the pipeline logs in this active run
- the mlflow run is NOT closed at the end of the kedro run
That looks good to me. It makes sense to delegate the entire session to the entity that created the run in the first place.
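The decision above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual kedro-mlflow implementation: `pipeline_run` and `configured_uri` are hypothetical names, `tracking` stands for the imported `mlflow` module (injected so the sketch can be exercised with a stub), and only `active_run()`, `set_tracking_uri()`, `start_run()` and `end_run()` are used from it.

```python
from contextlib import contextmanager
from typing import Any, Iterator, Optional


@contextmanager
def pipeline_run(tracking: Any, configured_uri: Optional[str] = None) -> Iterator[Any]:
    """Yield the MLflow run the kedro pipeline should log to.

    `tracking` is expected to be the `mlflow` module; `configured_uri` stands
    for the tracking URI read from mlflow.yml (hypothetical parameter).
    """
    active_run = tracking.active_run()
    if active_run is not None:
        # An orchestrator owns this run: ignore mlflow.yml entirely,
        # log into the existing run, and do NOT close it afterwards.
        yield active_run
        return

    # No active run: apply our own configuration and own the run's lifecycle.
    if configured_uri is not None:
        tracking.set_tracking_uri(configured_uri)
    run = tracking.start_run()
    try:
        yield run
    finally:
        # We opened the run, so we are responsible for closing it.
        tracking.end_run()
```

A pipeline hook could then wrap the execution in `with pipeline_run(mlflow, uri): ...`, so that closing the run stays with whichever entity opened it.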