kedro-mlflow

Detailed Documentation and Examples for `MlflowModelRegistryDataSet` Usage

Open hugocool opened this issue 6 months ago • 0 comments

Description

I am looking for clarification and examples on using MlflowModelRegistryDataSet in the Kedro-Mlflow integration to log and manage models in MLflow's model registry. Specifically, how do I save a model directly to a given version or stage (e.g., staging), and how do I retrieve a specific version of a model, such as 'staging version 6'? The documentation lists the parameters but gives no practical examples for these scenarios.

Context

Managing model versions and stages efficiently is central to using the Kedro-Mlflow integration. Being able to save and retrieve specific model versions and stages directly would streamline the workflow, both for my current projects and for other users working with model versioning and staging in MLflow.

I started by successfully implementing MlflowModelLoggerDataSet as described in the documentation. The confusion arose with MlflowModelRegistryDataSet. My initial setup was:

my_transformer_model:
  type: kedro_mlflow.io.models.MlflowModelRegistryDataSet
  flavor: mlflow.transformers
  model_name: my_transformer_model_name
  stage_or_version: staging

This configuration raised a DatasetError when trying to save a model, indicating that MlflowModelRegistryDataSet has no 'save' method. The documentation details the parameters but gives no practical examples for saving and registering models.

Workaround

A solution I found for logging models involved using MlflowModelLoggerDataSet, but this does not directly address the issue of staging/versioning through the API or retrieving specific versions:

my_transformer_model:
    type: kedro_mlflow.io.models.MlflowModelLoggerDataSet
    flavor: mlflow.transformers
    save_args:
        registered_model_name: "my_transformer_model_name"

This approach saves and loads the model in MLflow, although it is not documented as a way to register models. It gives no direct control over versioning or staging, and it offers no clear path for retrieving specific versions: only MlflowModelRegistryDataSet allows one to load such named models.
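
Until the dataset itself supports staging, one way to stage the version created by the workaround above is to call the MLflow client after the pipeline run. The sketch below is an illustration, not part of kedro-mlflow: `promote_newest_version` is a hypothetical helper, and the client is passed in as an argument so the function can be exercised without a tracking server. It only relies on `MlflowClient.search_model_versions` and `MlflowClient.transition_model_version_stage`. Note that recent MLflow releases deprecate stages in favour of aliases, so this applies to setups that still use stages.

```python
from typing import Any


def promote_newest_version(client: Any, model_name: str, stage: str = "Staging") -> Any:
    """Move the highest-numbered version of a registered model into `stage`.

    `client` is assumed to behave like `mlflow.tracking.MlflowClient`.
    This helper is hypothetical and is not provided by kedro-mlflow.
    """
    # All versions registered under this name, whatever their current stage.
    versions = client.search_model_versions(f"name='{model_name}'")
    if not versions:
        raise ValueError(f"No registered versions found for model '{model_name}'")

    # MLflow stores version numbers as strings, so compare them as integers.
    newest = max(versions, key=lambda mv: int(mv.version))

    return client.transition_model_version_stage(
        name=model_name,
        version=newest.version,
        stage=stage,
    )
```

With a real tracking server this would be called as `promote_newest_version(MlflowClient(), "my_transformer_model_name")`, for example from a script or hook that runs after the node that saves `my_transformer_model`.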

Specific Concerns and Clarifications Needed

  • Loading Specific Versions: While kedro_mlflow.io.models.MlflowModelRegistryDataSet is necessary for loading specific versions of a model, the process for saving a model to a specific version or state (e.g., logging a model directly to staging) is unclear.
  • Retrieving Specific Model Versions: The methodology for retrieving a specific version of a model, such as 'staging version 6', is not clearly documented.
  • Direct Versioning/Staging Through API: Guidance is needed on how to stage or version a model directly through the API, as opposed to using the MLflow UI.
  • Viewing Associated Metrics: Instructions on how to view associated metrics with the model training run in the MLflow model UI are needed to effectively promote the best model to staging.
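
On the metrics point, a registered model version records the run that produced it, so the metrics can also be looked up programmatically before deciding which version to promote. The following is a minimal sketch under that assumption; `metrics_for_model_version` is a hypothetical helper, not a kedro-mlflow function, and it uses only `MlflowClient.get_model_version` and `MlflowClient.get_run`.

```python
from typing import Any, Dict


def metrics_for_model_version(client: Any, model_name: str, version: int) -> Dict[str, float]:
    """Return the metrics logged by the run that produced a model version.

    `client` is assumed to behave like `mlflow.tracking.MlflowClient`.
    """
    # The registry entry stores the ID of the run that logged the model.
    model_version = client.get_model_version(name=model_name, version=str(version))

    # The run holds the latest value of each logged metric in `run.data.metrics`.
    run = client.get_run(model_version.run_id)
    return dict(run.data.metrics)
```

Comparing the returned dictionaries across versions gives a scripted alternative to inspecting each run in the MLflow UI.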

Possible Implementation

  • Update the documentation to include explicit examples of using MlflowModelRegistryDataSet for saving models directly to a specific version or stage (e.g., staging).

  • Provide examples for retrieving specific model versions, such as how to fetch 'staging version 6' of a model.

  • An example implementation might look something like this in the catalog.yml:

    my_model:
      type: kedro_mlflow.io.models.MlflowModelRegistryDataSet
      flavor: mlflow.sklearn
      model_name: my_model_name
      stage_or_version: "staging:6"  # How to specify direct logging to this stage?
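
For context on the `stage_or_version` value above: MLflow addresses registered models with URIs of the form `models:/<name>/<version>`, `models:/<name>/<stage>` or `models:/<name>/latest`, so a selector is either a version number or a stage, not both. A combined value such as "staging:6" does not match any of these forms. The sketch below assumes the dataset builds its URI from `model_name` and `stage_or_version` in this way, which should be checked against the kedro-mlflow source. `registry_model_uri` is a hypothetical helper, and its validation is deliberately stricter than anything MLflow itself enforces.

```python
from typing import Union

# Stage names accepted by this sketch; an assumption made for illustration.
KNOWN_STAGES = {"staging", "production", "archived"}


def registry_model_uri(model_name: str, stage_or_version: Union[str, int] = "latest") -> str:
    """Build a `models:/` URI from a model name and a single selector."""
    selector = str(stage_or_version)
    is_version = selector.isdigit()
    is_stage = selector.lower() in KNOWN_STAGES

    if not (is_version or is_stage or selector == "latest"):
        raise ValueError(
            f"'{selector}' is neither a version number, a known stage, nor 'latest'"
        )
    return f"models:/{model_name}/{selector}"


uri_for_version = registry_model_uri("my_model_name", 6)
uri_for_stage = registry_model_uri("my_model_name", "staging")
```

Either URI can then be passed to a loader such as `mlflow.pyfunc.load_model`. Under this reading, 'staging version 6' would be requested simply as version 6, since a version number already identifies one model version regardless of its stage.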
    

hugocool, Dec 13 '23 12:12