kedro-mlflow
Detailed Documentation and Examples for `MlflowModelRegistryDataSet` Usage
Description
Seeking clarification and examples for the use of `MlflowModelRegistryDataSet` within the Kedro-Mlflow integration for logging and managing models in MLflow's model registry. Specifically, I need clarification on how to save a model to a specific version or state directly (e.g., staging) and how to retrieve a specific version of a model, like 'staging version 6'. The documentation provides parameters but lacks practical examples, especially for scenarios like logging a model directly to a specific stage like 'staging'.
Context
This change is crucial for efficiently managing model versions and stages using the Kedro-Mlflow integration. The ability to directly save and retrieve specific model versions and stages would streamline the workflow and enhance the overall usability of the integration. This functionality would not only benefit my current projects but also provide a clearer path for other users working with model versioning and staging in MLflow.
My journey began with successfully implementing `MlflowModelLoggerDataSet` as per the documentation. However, confusion arose with the `MlflowModelRegistryDataSet`. My initial setup was:
my_transformer_model:
  type: kedro_mlflow.io.models.MlflowModelRegistryDataSet
  flavor: mlflow.transformers
  model_name: my_transformer_model_name
  stage_or_version: staging
This configuration led to a `DatasetError` when trying to save a model, indicating the absence of a 'save' method for `MlflowModelRegistryDataSet`. The documentation, while detailing parameters, falls short in providing practical examples for saving and registering models.
Workaround
A solution I found for logging models involved using `MlflowModelLoggerDataSet`, but this does not directly address the issue of staging/versioning through the API or retrieving specific versions:
my_transformer_model:
  type: kedro_mlflow.io.models.MlflowModelLoggerDataSet
  flavor: mlflow.transformers
  save_args:
    registered_model_name: "my_transformer_model_name"
This approach saves and loads the model in MLflow, although it is not documented as such. It offers no direct control over versioning or staging and no clear path for retrieving specific versions; only `MlflowModelRegistryDataSet` allows one to load such named models.
Specific Concerns and Clarifications Needed
- Loading Specific Versions: While `kedro_mlflow.io.models.MlflowModelRegistryDataSet` is necessary for loading specific versions of a model, the process for saving a model to a specific version or state (e.g., logging a model directly to staging) is unclear.
- Retrieving Specific Model Versions: The methodology for retrieving a specific version of a model, such as 'staging version 6', is not clearly documented.
- Direct Versioning/Staging Through API: Guidance is needed on how to stage or version a model directly through the API, as opposed to using the MLflow UI.
- Viewing Associated Metrics: Instructions on how to view associated metrics with the model training run in the MLflow model UI are needed to effectively promote the best model to staging.
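To make the last two points concrete, here is a minimal sketch of what I imagine the API route could look like, written directly against MLflow's `MlflowClient` rather than kedro-mlflow. The helper name `promote_version` is hypothetical. It assumes the version was registered from a tracked run (for example through `registered_model_name` in `save_args`), and note that newer MLflow releases deprecate stages in favor of aliases, so this may not be the recommended path going forward.

```python
from typing import Any, Dict


def promote_version(
    client: Any, model_name: str, version: str, stage: str = "Staging"
) -> Dict[str, float]:
    """Return the metrics of the run behind a model version, then move it to `stage`.

    `client` is expected to behave like `mlflow.tracking.MlflowClient`.
    """
    # Look up the registered version to find the run that produced it.
    mv = client.get_model_version(name=model_name, version=version)

    # A version registered outside of a run has no run_id, hence no metrics.
    metrics: Dict[str, float] = {}
    if mv.run_id:
        metrics = dict(client.get_run(mv.run_id).data.metrics)

    # Move the version to the requested stage without going through the UI.
    client.transition_model_version_stage(
        name=model_name, version=version, stage=stage
    )
    return metrics


# Hypothetical usage against a real tracking server:
# from mlflow.tracking import MlflowClient
# print(promote_version(MlflowClient(), "my_transformer_model_name", "6"))
```

Returning the metrics before promoting is just one way to keep the "compare, then promote" decision in code; the two steps could equally be separate functions.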
Possible Implementation
- Update the documentation to include explicit examples of using `MlflowModelRegistryDataSet` for saving models directly to a specific version or stage (e.g., staging).
- Provide examples for retrieving specific model versions, such as how to fetch 'staging version 6' of a model.
- An example implementation might look something like this in the catalog.yml:

my_model:
  type: kedro_mlflow.io.models.MlflowModelRegistryDataSet
  flavor: mlflow.sklearn
  model_name: my_model_name
  stage_or_version: "staging:6"  # How to specify direct logging to this stage?
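For reference, my understanding is that MLflow's own `models:/` URIs accept either a version number or a stage name in the last segment, not both, so a combined value like "staging:6" has no direct equivalent there. The sketch below shows the URI shapes I would expect to be able to load; it assumes (unconfirmed) that `stage_or_version` is forwarded into such a URI, and the helper `registry_uri` is hypothetical.

```python
def registry_uri(model_name: str, stage_or_version: str = "latest") -> str:
    """Build an MLflow model registry URI from a name and a version OR a stage."""
    # A "stage:version" pair is not a valid URI segment in this scheme.
    if ":" in stage_or_version or "/" in stage_or_version:
        raise ValueError(
            f"Expected a single version or stage, got {stage_or_version!r}"
        )
    return f"models:/{model_name}/{stage_or_version}"


# Hypothetical usage (requires mlflow and a populated registry):
# import mlflow
# model_v6 = mlflow.pyfunc.load_model(registry_uri("my_model_name", "6"))
# staged = mlflow.pyfunc.load_model(registry_uri("my_model_name", "Staging"))
```

If that reading is right, 'staging version 6' would be fetched simply as version 6, with the stage being a property of that version rather than part of the lookup key.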