
[FR] Define Metric Threshold for Model Validation


Willingness to contribute

Yes. I would be willing to contribute this feature with guidance from the MLflow community.

Proposal Summary

Hi everyone! We are designing different options for users to define validation criteria (thresholds that metrics must satisfy) and pass them to the mlflow.evaluate API for model validation. We are proposing the following three options and would love to hear which one is more user-friendly for our data scientist/MLE users.

Motivation

What is the use case for this feature?

Allow users to define metric thresholds, which is required for adding model validation functionality to the mlflow.evaluate API.

Why is this use case valuable to support for MLflow users in general?

Allow users to perform validation checks after developing and training a model; this can also serve as a pre-deployment check in their ML production pipeline.

Why is this use case valuable to support for your project(s) or organization?

N/A

Why is it currently difficult to achieve this use case?

N/A

Details

Option 1: Provide users with a class called MetricThreshold to define the validation threshold for each metric. Users call mlflow.evaluate and pass an array of MetricThreshold instances.

class MetricThreshold:
  """
  Validation threshold for a single metric.

  :param name: Name of the metric
  :param lower_bound: (Optional) Lower bound of the metric value
  :param upper_bound: (Optional) Upper bound of the metric value
  :param min_improvement: (Optional) Minimum absolute improvement threshold compared to the baseline model
  :param min_relative_improvement: (Optional) Minimum relative improvement threshold compared to the baseline model
  """
# Example Usage
mlflow.evaluate(
  ...,
  validation_criteria=[
    MetricThreshold(name="log_loss", upper_bound=0.3, min_improvement=0.01, min_relative_improvement=0.02),
    MetricThreshold(name="f1_score", lower_bound=0.8, min_improvement=0.01, min_relative_improvement=0.05),
  ],
)
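
To make the Option 1 semantics concrete, here is a minimal sketch (purely illustrative, not the actual MLflow implementation) of how a list of MetricThreshold instances could be checked against candidate and baseline metric values. The validate_thresholds helper and the assumption that higher metric values are better are ours for illustration; a real implementation would need a per-metric greater_is_better flag. One advantage of the class-based API is that misspelled keyword arguments raise a TypeError in the constructor, rather than being silently ignored.

# Illustrative sketch only -- not the proposed implementation.
from typing import Optional

class MetricThreshold:
    def __init__(self, name: str, lower_bound: Optional[float] = None,
                 upper_bound: Optional[float] = None,
                 min_improvement: Optional[float] = None,
                 min_relative_improvement: Optional[float] = None):
        self.name = name
        self.lower_bound = lower_bound
        self.upper_bound = upper_bound
        self.min_improvement = min_improvement
        self.min_relative_improvement = min_relative_improvement

def validate_thresholds(thresholds, candidate_metrics, baseline_metrics=None):
    """Return a list of failure messages; an empty list means all checks passed."""
    failures = []
    for t in thresholds:
        value = candidate_metrics[t.name]
        if t.lower_bound is not None and value < t.lower_bound:
            failures.append(f"{t.name}={value} is below lower_bound={t.lower_bound}")
        if t.upper_bound is not None and value > t.upper_bound:
            failures.append(f"{t.name}={value} is above upper_bound={t.upper_bound}")
        if baseline_metrics and t.name in baseline_metrics:
            baseline = baseline_metrics[t.name]
            improvement = value - baseline  # assumes higher is better
            if t.min_improvement is not None and improvement < t.min_improvement:
                failures.append(f"{t.name} improved by only {improvement:.4f} over baseline")
            if (t.min_relative_improvement is not None
                    and improvement < t.min_relative_improvement * abs(baseline)):
                failures.append(f"{t.name} relative improvement is below {t.min_relative_improvement}")
    return failures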

Option 2: Let users define each validation threshold as a dictionary and pass an array of dictionaries to mlflow.evaluate.

# Example
{
  "metric_name": "log_loss",
  "lower_bound": 0.5,
  "upper_bound": 0.9,
  "min_abs_change": 0.05,
  "min_relative_change": 0.1,
}
# Example Usage
mlflow.evaluate(
  ...,
  validation_criteria=[
    {
      "metric_name": "log_loss",
      "lower_bound": 0.5,
      "upper_bound": 0.9,
      "min_abs_change": 0.05,
      "min_relative_change": 0.1,
    },
    {
      "metric_name": "f1_score",
      "lower_bound": 0.5,
      "upper_bound": 0.9,
      "min_abs_change": 0.05,
      "min_relative_change": 0.1,
    },
  ],
)
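
Because dictionary keys are free-form strings, an API following Option 2 would presumably need runtime schema validation. The sketch below is illustrative only; ALLOWED_KEYS and check_criteria_dicts are hypothetical names, not existing MLflow APIs, but they show the kind of check involved (typos in keys can only be caught at runtime, unlike misspelled constructor arguments in Option 1).

# Illustrative sketch only -- ALLOWED_KEYS and check_criteria_dicts are hypothetical.
ALLOWED_KEYS = {
    "metric_name",
    "lower_bound",
    "upper_bound",
    "min_abs_change",
    "min_relative_change",
}

def check_criteria_dicts(criteria):
    """Raise ValueError on missing metric names or unrecognized keys."""
    for criterion in criteria:
        if "metric_name" not in criterion:
            raise ValueError(f"Validation criterion is missing 'metric_name': {criterion}")
        unknown = set(criterion) - ALLOWED_KEYS
        if unknown:
            raise ValueError(
                f"Unknown keys {sorted(unknown)} in criterion for {criterion['metric_name']!r}"
            )

# Example usage
check_criteria_dicts([
    {"metric_name": "log_loss", "upper_bound": 0.9, "min_abs_change": 0.05},
    {"metric_name": "f1_score", "lower_bound": 0.8},
])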

Option 3: Let users define thresholds as SQL expression strings.

# Example
mlflow.evaluate(
  ...,
  validation_criteria=[
    "log_loss >= 0.7",
    "log_loss <= baseline - 0.01",
    "log_loss <= baseline * 0.98",
  ],
)
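
Option 3 would require parsing the criterion strings. The sketch below is illustrative only; check_expression is a hypothetical helper, not an MLflow API, and it uses a regex for the comparison plus eval() for the right-hand side purely for brevity. A real implementation would want a proper, safe expression parser.

import operator
import re

# Illustrative sketch only -- check_expression is hypothetical; eval() is not safe
# for untrusted input and is used here only to keep the example short.
_OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}
_PATTERN = re.compile(r"^\s*(\w+)\s*(<=|>=|<|>)\s*(.+)$")

def check_expression(expr, candidate_metrics, baseline_metrics=None):
    """Evaluate one criterion string, e.g. "log_loss <= baseline * 0.98"."""
    match = _PATTERN.match(expr)
    if match is None:
        raise ValueError(f"Cannot parse validation criterion: {expr!r}")
    metric, op, rhs = match.groups()
    value = candidate_metrics[metric]
    # "baseline" on the right-hand side refers to the baseline model's value of the same metric.
    namespace = {"baseline": baseline_metrics[metric]} if baseline_metrics else {}
    threshold = eval(rhs.strip(), {"__builtins__": {}}, namespace)
    return _OPS[op](value, threshold)

# Example: check_expression("log_loss <= baseline * 0.98", {"log_loss": 0.25}, {"log_loss": 0.30}) -> True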

From a user's perspective, we want to know which option is most convenient, requires the least learning, and is most intuitive. We really appreciate your feedback!

What component(s) does this bug affect?

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [X] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [ ] area/server-infra: MLflow Tracking server backend
  • [ ] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

What language(s) does this bug affect?

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

zhe-db · Jul 21 '22 20:07

@BenWilson2 @dbczumar @harupy @WeichenXu123 Please assign a maintainer and start triaging this issue.

mlflow-automation · Jul 29 '22 00:07

Closing this out now that it's been implemented. Thanks @zhe-db !

dbczumar · Sep 29 '22 00:09