[BUG] evaluator_config={"average":None} not working in mlflow.evaluate() for multiclass classification
Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the issues policy
Where did you encounter this bug?
Local machine
Willingness to contribute
No. I cannot contribute a bug fix at this time.
MLflow version
2.9.2
System information
databricks
Describe the problem
For a multiclass classification problem, if I use evaluator_config = {"average": None} to set the averaging method used when computing classification metrics, each metric comes back as a list of per-class floats, e.g. [0.7, 0.8, 0.6]. This list fails the validation check, which expects a single numerical value per metric, not a list:
https://github.com/mlflow/mlflow/blob/c43823750bffa5b6abcc086683b15a068513b67b/mlflow/utils/validation.py#L137
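For context, a minimal sketch (outside of mlflow.evaluate) of why the per-class array gets rejected; it assumes scikit-learn's recall_score, which the metric name in the error below suggests is computing the value under the hood:

```python
from sklearn.metrics import recall_score

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 1, 2, 1, 0]

# With an explicit averaging method the result is a single scalar:
recall_score(y_true, y_pred, average="weighted")  # 0.8333...

# With average=None the result is a numpy array with one value per class:
recall_score(y_true, y_pred, average=None)        # array([1. , 1. , 0.5])

# MLflow metric values must be single 64-bit floats, so logging the array
# fails validation with the MlflowException shown in the stack trace below.
```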
Tracking information
No response
Code to reproduce issue
```python
import mlflow

with mlflow.start_run() as run:
    # Evaluate the static dataset without providing a model
    result = mlflow.evaluate(
        data=eval_data,
        targets="label",
        predictions="predictions",
        model_type="classifier",
        evaluator_config={"average": None},
    )
```
Stack trace
Traceback (most recent call last):
...
File "c:\TEMP\...\evaluate.py", line 80, in task
mlflow.evaluate(
File "C:\TEMP\...\.venv\lib\site-packages\mlflow\models\evaluation\base.py", line 1878, in evaluate
evaluate_result = _evaluate(
File "C:\TEMP\...\.venv\lib\site-packages\mlflow\models\evaluation\base.py", line 1120, in _evaluate
eval_result = evaluator.evaluate(
File "C:\TEMP\...\.venv\lib\site-packages\mlflow\models\evaluation\default_evaluator.py", line 1826, in evaluate
evaluation_result = self._evaluate(model, is_baseline_model=False)
File "C:\TEMP\...\.venv\lib\site-packages\mlflow\models\evaluation\default_evaluator.py", line 1744, in _evaluate
self._log_metrics()
File "C:\TEMP\...\.venv\lib\site-packages\mlflow\models\evaluation\default_evaluator.py", line 712, in _log_metrics
self.client.log_batch(
File "C:\TEMP\...\.venv\lib\site-packages\mlflow\tracking\client.py", line 1086, in log_batch
return self._tracking_client.log_batch(
File "C:\TEMP\...\.venv\lib\site-packages\mlflow\tracking\_tracking_service\client.py", line 459, in log_batch
self.store.log_batch(run_id=run_id, metrics=metrics_batch, params=[], tags=[])
File "C:\TEMP\...\.venv\lib\site-packages\mlflow\store\tracking\file_store.py", line 1040, in log_batch
_validate_batch_log_data(metrics, params, tags)
File "C:\TEMP\...\.venv\lib\site-packages\mlflow\utils\validation.py", line 317, in _validate_batch_log_data
_validate_metric(metric.key, metric.value, metric.timestamp, metric.step)
File "C:\TEMP\...\.venv\lib\site-packages\mlflow\utils\validation.py", line 146, in _validate_metric
raise MlflowException(
mlflow.exceptions.MlflowException: Got invalid value [0.78723404 0.13793103 0.08333333 0. ] for metric 'recall_score' (timestamp=1706264841975). Please specify value as a valid double (64-bit floating point)
Other info / logs
No response
What component(s) does this bug affect?
- [ ] area/artifacts: Artifact stores and artifact logging
- [ ] area/build: Build and test infrastructure for MLflow
- [ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
- [ ] area/docs: MLflow documentation pages
- [ ] area/examples: Example code
- [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- [X] area/models: MLmodel format, model serialization/deserialization, flavors
- [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- [ ] area/projects: MLproject format, project running backends
- [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- [ ] area/server-infra: MLflow Tracking server backend
- [ ] area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- [X] area/windows: Windows support
What language(s) does this bug affect?
- [ ] language/r: R APIs and clients
- [ ] language/java: Java APIs and clients
- [ ] language/new: Proposals for new client languages
What integration(s) does this bug affect?
- [ ] integrations/azure: Azure and Azure ML integrations
- [ ] integrations/sagemaker: SageMaker integrations
- [ ] integrations/databricks: Databricks integrations
@RRRen94 Could you paste the full stacktrace where it triggers _validate_metric?
@serena-ruan Hey, full stacktrace is now updated.
@RRRen94 I think the original design is to use "weighted" if the "average" method is not set (or None), so to fix the bug I would set "average" to "weighted" if it's None. But for your use case, do you want to log the metric value of each class specifically? Otherwise the workaround is just to remove evaluator_config when calling .evaluate.
@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
> @RRRen94 I think the original design is to use "weighted" if the "average" method is not set (or None), so to fix the bug I would set "average" to "weighted" if it's None. But for your use case, do you want to log the metric value of each class specifically? Otherwise the workaround is just to remove evaluator_config when calling .evaluate.
Yes, I want to directly see the metric values for each class specifically, not weighted or averaged. These per-class values can currently be found in per_class_metrics.csv as an artifact, but it would be nice to also have the option to log the non-averaged values as metrics, for example by using evaluator_config={"average": None}.
Then I think the best effort would be to log each class's metric value separately, since MLflow metrics don't currently support lists.
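For example, something along these lines from user code today (a rough sketch; the metric names and the reuse of eval_data from the repro above are illustrative):

```python
import mlflow
from sklearn.metrics import precision_score, recall_score

y_true = eval_data["label"]
y_pred = eval_data["predictions"]
classes = sorted(set(y_true))

with mlflow.start_run():
    per_class_recall = recall_score(y_true, y_pred, average=None, labels=classes)
    per_class_precision = precision_score(y_true, y_pred, average=None, labels=classes)
    for cls, rec, prec in zip(classes, per_class_recall, per_class_precision):
        # One scalar metric per class, e.g. "recall_score_class_0"
        mlflow.log_metric(f"recall_score_class_{cls}", rec)
        mlflow.log_metric(f"precision_score_class_{cls}", prec)
```

That is essentially the same information as per_class_metrics.csv, just surfaced as scalar metrics so each class's value is tracked per run.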
cc @prithvikannan WDYT?