
get_metric_fn can't handle classification metrics using probabilities

Open pseudotensor opened this issue 3 years ago • 7 comments

import numpy as np
import openml
import sklearn.metrics
import sklearn.pipeline
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier

openml.config.apikey = "KEY"

# build a scikit-learn classifier
clf = sklearn.pipeline.make_pipeline(SimpleImputer(),
                                     DecisionTreeClassifier())

task_id = 189871

task = openml.tasks.get_task(task_id)  # download the OpenML task
run = openml.runs.run_model_on_task(clf, task)  # run the classifier on the task

# works: accuracy_score compares hard class labels
#score = run.get_metric_fn(sklearn.metrics.accuracy_score)
# fails, even when the class labels are passed explicitly:
score = run.get_metric_fn(sklearn.metrics.log_loss, kwargs=dict(labels=task.class_labels))

print('Data set: %s; Log loss: %0.2f' % (task.get_dataset().name, float(np.mean(score))))

fails with:

ValueError: The number of classes in labels is different from that in y_pred. Classes found in labels: ['0' '1' '2' '3' '4']

The problem is old; it dates back to 2017, when the function was first added: https://github.com/openml/openml-python/commit/1c285a803b58dca963e4c51930251ac334d94d19

This block:

https://github.com/openml/openml-python/blob/develop/openml/runs/run.py#L489-L492

extracts the index of the predicted class, which is not the correct thing to pass to log_loss.
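
To make the failure mode concrete, here is a toy reproduction (made-up values, not the library's code). sklearn's log_loss reshapes a 1-D y_pred into two columns as if the problem were binary, so argmax indices can never match a multi-class labels list:

import numpy as np
from sklearn.metrics import log_loss

y_true = ['0', '2', '1']
proba = np.array([[0.7, 0.2, 0.1],   # per-class probabilities, shape (3, 3)
                  [0.1, 0.1, 0.8],
                  [0.2, 0.6, 0.2]])

pred_idx = np.argmax(proba, axis=1)  # [0, 2, 1] -- indices, as the helper passes them

try:
    # 1-D input is treated as binary probabilities (two columns), which can
    # never match the three classes given in `labels`.
    log_loss(y_true, pred_idx, labels=['0', '1', '2'])
except ValueError as e:
    print(e)  # "The number of classes in labels is different from that in
              # y_pred..." (exact message may vary by sklearn version)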

There are multiple problems:

  1. A classification metric like log_loss should receive predicted probabilities, never hard labels, but get_metric_fn gives no way to do this (see the sketch after this list).
  2. What is predicted and passed to any metric should be the original class labels as defined by the task, not their integer indices.
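
For comparison, a sketch of what a correct invocation has to look like (hypothetical variable names; assumes a fitted clf and held-out X_test/y_test):

proba = clf.predict_proba(X_test)   # full probability matrix, shape (n_samples, n_classes)
score = sklearn.metrics.log_loss(
    y_test,                         # original class labels, not indices
    proba,
    labels=task.class_labels,       # complete label set from the task
)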

I think the only solution is to avoid the openml-python helpers and manage the splits etc. directly.
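
A minimal sketch of that approach, reusing the clf and task from the snippet above and assuming get_X_and_y returns numpy arrays:

X, y = task.get_X_and_y()
n_repeats, n_folds, _ = task.get_split_dimensions()

scores = []
for repeat in range(n_repeats):
    for fold in range(n_folds):
        train_idx, test_idx = task.get_train_test_split_indices(repeat=repeat, fold=fold)
        clf.fit(X[train_idx], y[train_idx])
        proba = clf.predict_proba(X[test_idx])  # probabilities, not argmax indices
        # pass the fitted label set so folds whose test split misses a class still work
        scores.append(sklearn.metrics.log_loss(y[test_idx], proba, labels=clf.classes_))

print('Mean log loss over all folds: %0.4f' % np.mean(scores))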

pseudotensor avatar Jun 01 '22 03:06 pseudotensor

FYI: https://github.com/openml/openml-python/pull/1140 Just to show how it can be made to work. Not necessarily elegant.

pseudotensor avatar Jun 01 '22 04:06 pseudotensor

In the end I couldn't make this helper work properly. The scores are just wrong, e.g. an AUC of 1. I had to fall back to the "plain" approach of asking the task for its splits and evaluating them myself, as many other papers have done.

E.g., even a random forest was giving AUC=1 for KDDCup09_appetency (task_id 75105).

pseudotensor avatar Jun 01 '22 15:06 pseudotensor

Does the code you provide above trigger this issue together with #1140?

mfeurer avatar Jun 07 '22 16:06 mfeurer

> Does the code you provide above trigger this issue together with #1140?

The PR is an attempted workaround, but it's incomplete and doesn't really work in general.

As for the original problem, yes, the code snippet above reproduces it.

pseudotensor avatar Jun 07 '22 17:06 pseudotensor

  • add deprecation message for 0.13.x
  • fix with proposal in #1140

PGijsbers avatar Feb 20 '23 16:02 PGijsbers