
Support to detect Custom Evaluator in pyspark for auto logging

Open mayuri-kulkarni opened this issue 10 months ago • 2 comments

Willingness to contribute

Yes. I can contribute this feature independently.

Proposal Summary

Objective: Extend PySpark's auto logging to detect and log metrics from custom evaluators, enabling seamless integration of user-defined evaluation logic.

Key Features:

  • Custom Evaluator Detection: Automatically detect usage of custom evaluators in machine learning pipelines (a minimal example of such an evaluator is sketched below).
  • Metric Logging: Capture evaluation metrics from custom evaluators alongside standard metrics.
  • Compatibility: Ensure compatibility with existing PySpark APIs and evaluation frameworks.
  • Flexibility: Support various custom evaluation logic, metrics, and domain-specific criteria.
  • Documentation and Examples: Provide clear documentation and examples for using custom evaluators with auto logging.
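
For concreteness, here is a minimal sketch of the kind of user-defined evaluator the detection would target. The class name, cost weights, and column names are hypothetical; subclassing `pyspark.ml.evaluation.Evaluator` and implementing `_evaluate()` / `isLargerBetter()` is the existing PySpark mechanism for custom evaluation logic.

```python
# Hypothetical custom evaluator; the class name, cost weights, and column
# names are illustrative only.
from pyspark.ml.evaluation import Evaluator
from pyspark.sql import functions as F


class BusinessCostEvaluator(Evaluator):
    """Scores binary predictions with a domain-specific cost function."""

    def _evaluate(self, dataset):
        # Weight false negatives 5x more heavily than false positives.
        cost = (
            F.when((F.col("label") == 1) & (F.col("prediction") == 0), 5.0)
            .when((F.col("label") == 0) & (F.col("prediction") == 1), 1.0)
            .otherwise(0.0)
        )
        return dataset.select(F.avg(cost).alias("avg_cost")).first()["avg_cost"]

    def isLargerBetter(self):
        # Lower cost is better, so tuning utilities should minimize this metric.
        return False
```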

Motivation

What is the use case for this feature?

To be able to log metrics from a custom evaluator class without much hassle when working with PySpark ML.

Why is this use case valuable to support for MLflow users in general?

Use of custom evaluators in PySpark ML is very common.

Why is this use case valuable to support for your project(s) or organization?

Because we use a custom evaluator and MLflow is not able to recognize it, so its metrics currently require manual logging.

Why is it currently difficult to achieve this use case?

At the current stage it requires manual logging. Since autologging handles everything else so well, we should not need to do this manually (see the sketch below for the manual step involved).
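
For illustration, the manual step looks roughly like this today. This is a sketch: `BusinessCostEvaluator` is the hypothetical evaluator from the summary above, and `pipeline`, `train_df`, and `test_df` are assumed to be an already-built PySpark pipeline and DataFrames; `mlflow.pyspark.ml.autolog`, `mlflow.start_run`, and `mlflow.log_metric` are existing MLflow APIs.

```python
# Today: metrics from a custom evaluator must be logged by hand, even though
# mlflow.pyspark.ml.autolog() captures the rest of the run automatically.
import mlflow

mlflow.pyspark.ml.autolog()

with mlflow.start_run():
    model = pipeline.fit(train_df)        # params and model logged automatically
    predictions = model.transform(test_df)
    cost = BusinessCostEvaluator().evaluate(predictions)
    mlflow.log_metric("business_cost", cost)  # manual step this issue asks to remove
```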

Details

No response

What component(s) does this bug affect?

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [X] area/build: Build and test infrastructure for MLflow
  • [ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [ ] area/server-infra: MLflow Tracking server backend
  • [ ] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

What language(s) does this bug affect?

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

mayuri-kulkarni avatar Apr 14 '24 11:04 mayuri-kulkarni

@mayuri-kulkarni Could you explain which custom evaluator is not supported? We automatically patch the .evaluate function for evaluators: https://github.com/mlflow/mlflow/blob/7e383afe9609bc43b599b17233e24b96be9d7859/mlflow/pyspark/ml/__init__.py#L1235-L1241
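
A minimal sketch of the call pattern that patching covers, shown here with a built-in evaluator (`lr`, `train_df`, and `test_df` are assumed to be an existing estimator and DataFrames; whether the same patch fires for a user-defined `Evaluator` subclass is the question at hand):

```python
# With mlflow.pyspark.ml.autolog() enabled, calling .evaluate() on an
# evaluator inside a run is patched so the resulting metric is recorded as a
# post-training metric without an explicit mlflow.log_metric call.
import mlflow
from pyspark.ml.evaluation import BinaryClassificationEvaluator

mlflow.pyspark.ml.autolog()

with mlflow.start_run():
    model = lr.fit(train_df)              # lr and train_df assumed defined
    predictions = model.transform(test_df)
    auc = BinaryClassificationEvaluator().evaluate(predictions)
    # areaUnderROC is logged automatically by the patched .evaluate call.
```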

serena-ruan avatar Apr 16 '24 09:04 serena-ruan

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

github-actions[bot] avatar Apr 22 '24 00:04 github-actions[bot]