
[BUG] evaluator_config={"average":None} not working in mlflow.evaluate() for multiclass classification

Open RRRen94 opened this issue 1 year ago • 7 comments

Issues Policy acknowledgement

  • [X] I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

No. I cannot contribute a bug fix at this time.

MLflow version

2.9.2

System information

databricks

Describe the problem

For a multiclass classification problem, if I use evaluator_config = {"average": None} to set the averaging method used when computing classification metrics, I get a list of floats such as [0.7, 0.8, 0.6] as the metric value, one entry per class. But this metric list cannot pass the validation check, which expects each metric to be a single numerical value, not a list. https://github.com/mlflow/mlflow/blob/c43823750bffa5b6abcc086683b15a068513b67b/mlflow/utils/validation.py#L137
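
For context, a minimal sketch (not the MLflow internals) of where the mismatch comes from: with average=None, scikit-learn returns one value per class, while MLflow expects each logged metric to be a single double. The example values below are illustrative.

from sklearn.metrics import recall_score

y_true = [0, 1, 2, 2]
y_pred = [0, 1, 1, 2]

# average=None yields a per-class array, e.g. array([1. , 1. , 0.5]) -- not loggable as a single metric
per_class = recall_score(y_true, y_pred, average=None)

# an averaging method such as "weighted" yields a single float that passes metric validation
weighted = recall_score(y_true, y_pred, average="weighted")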

Tracking information

No response

Code to reproduce issue

import mlflow

with mlflow.start_run() as run:
    # Evaluate the static dataset without providing a model
    result = mlflow.evaluate(
        data=eval_data,
        targets="label",
        predictions="predictions",
        model_type="classifier",
        evaluator_config={"average": None},
    )

Stack trace

Traceback (most recent call last):
...
  File "c:\TEMP\...\evaluate.py", line 80, in task
    mlflow.evaluate(
  File "C:\TEMP\...\.venv\lib\site-packages\mlflow\models\evaluation\base.py", line 1878, in evaluate
    evaluate_result = _evaluate(
  File "C:\TEMP\...\.venv\lib\site-packages\mlflow\models\evaluation\base.py", line 1120, in _evaluate
    eval_result = evaluator.evaluate(
  File "C:\TEMP\...\.venv\lib\site-packages\mlflow\models\evaluation\default_evaluator.py", line 1826, in evaluate
    evaluation_result = self._evaluate(model, is_baseline_model=False)
  File "C:\TEMP\...\.venv\lib\site-packages\mlflow\models\evaluation\default_evaluator.py", line 1744, in _evaluate
    self._log_metrics()
  File "C:\TEMP\...\.venv\lib\site-packages\mlflow\models\evaluation\default_evaluator.py", line 712, in _log_metrics
    self.client.log_batch(
  File "C:\TEMP\...\.venv\lib\site-packages\mlflow\tracking\client.py", line 1086, in log_batch
    return self._tracking_client.log_batch(
  File "C:\TEMP\...\.venv\lib\site-packages\mlflow\tracking\_tracking_service\client.py", line 459, in log_batch
    self.store.log_batch(run_id=run_id, metrics=metrics_batch, params=[], tags=[])
  File "C:\TEMP\...\.venv\lib\site-packages\mlflow\store\tracking\file_store.py", line 1040, in log_batch
    _validate_batch_log_data(metrics, params, tags)
  File "C:\TEMP\...\.venv\lib\site-packages\mlflow\utils\validation.py", line 317, in _validate_batch_log_data
    _validate_metric(metric.key, metric.value, metric.timestamp, metric.step)
  File "C:\TEMP\...\.venv\lib\site-packages\mlflow\utils\validation.py", line 146, in _validate_metric
    raise MlflowException(
mlflow.exceptions.MlflowException: Got invalid value [0.78723404 0.13793103 0.08333333 0.        ] for metric 'recall_score' (timestamp=1706264841975). Please specify value as a valid double (64-bit floating point)

Other info / logs

No response

What component(s) does this bug affect?

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [X] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [ ] area/server-infra: MLflow Tracking server backend
  • [ ] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [X] area/windows: Windows support

What language(s) does this bug affect?

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

RRRen94 avatar Jan 25 '24 14:01 RRRen94

@RRRen94 Could you paste the full stacktrace where it triggers _validate_metric?

serena-ruan avatar Jan 26 '24 02:01 serena-ruan

@serena-ruan Hey, the full stack trace has now been added above.

RRRen94 avatar Jan 26 '24 10:01 RRRen94

@RRRen94 I think the original design is to use "weighted" if the "average" method is not set (or is None), so to fix the bug I would set "average" to "weighted" when it's None. But for your use case, do you want to log the metric value of each class specifically? Otherwise the workaround is just to remove evaluator_config when calling .evaluate.
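
A minimal sketch of that defaulting behavior (names are illustrative, not the actual evaluator internals):

average = evaluator_config.get("average") or "weighted"  # fall back to "weighted" when unset or None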

serena-ruan avatar Jan 29 '24 03:01 serena-ruan

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

github-actions[bot] avatar Feb 02 '24 00:02 github-actions[bot]

@RRRen94 I think the original design is to use "weighted" if the "average" method is not set (or is None), so to fix the bug I would set "average" to "weighted" when it's None. But for your use case, do you want to log the metric value of each class specifically? Otherwise the workaround is just to remove evaluator_config when calling .evaluate.

Yes, I want to see the metric values for each class directly, not weighted or averaged. These per-class values can currently be found in the per_class_metrics.csv artifact. But it would be nice to also have the option to log the non-averaged values as metrics, for example by using evaluator_config={"average": None}.

RRRen94 avatar Feb 02 '24 08:02 RRRen94

Then I think the best approach would be to log each class's metric value separately, since MLflow metrics don't currently support lists.
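
A minimal sketch of that approach from the user side today (eval_data and the column/metric names follow the reproduce snippet above and are illustrative):

import mlflow
from sklearn.metrics import recall_score

# One recall value per class, in sorted label order
per_class_recall = recall_score(eval_data["label"], eval_data["predictions"], average=None)

with mlflow.start_run():
    for class_index, value in enumerate(per_class_recall):
        mlflow.log_metric(f"recall_score_class_{class_index}", float(value))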

serena-ruan avatar Feb 06 '24 07:02 serena-ruan

cc @prithvikannan WDYT?

serena-ruan avatar Feb 06 '24 07:02 serena-ruan