Take cost metrics into account when running scikit-learn benchmarks.
Hey @mfeurer, can you give me some details about this issue so that I can work on it?
Hey @v-parmar, I just had a look at what I could suggest, and I think the following would be a good approach to get started:
- Tackle issue #1245 to allow going through all tasks on OpenML without having to download them.
- Find a task with an associated cost matrix, for example via:

  ```python
  import openml

  # Loop over all supervised classification task ids without downloading the
  # datasets or qualities; skip tasks that fail to load.
  for task_id in openml.tasks.list_tasks(task_type=openml.tasks.TaskType.SUPERVISED_CLASSIFICATION):
      try:
          task = openml.tasks.get_task(task_id, download_data=False, download_qualities=False)
      except Exception:
          continue
      if task.cost_matrix is not None:
          print(task_id, task.cost_matrix)
  ```

- Integrate the cost matrix in the calculation of the metric, where such weights are often called `sample_weight`, as in this metric (see the sketch after this list).
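To make that last point a bit more concrete, here is a rough sketch of what feeding a cost matrix into a scikit-learn metric through `sample_weight` could look like. The cost matrix values, the class labels, and the reduction to a single per-class cost are all made up for illustration; since there is no specified format yet, none of this reflects how OpenML actually stores the costs:

```python
import numpy as np
from sklearn.metrics import zero_one_loss

# Hypothetical cost matrix: entry [i, j] is the cost of predicting class j
# when the true class is i; the diagonal (correct predictions) costs nothing.
cost_matrix = np.array([
    [0.0, 1.0],
    [5.0, 0.0],  # missing class 1 is five times as costly as missing class 0
])

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 0, 1, 0, 1])

# One simple reduction: weight every sample by the cost of misclassifying its
# true class (here, the worst off-diagonal cost in that row), then pass the
# weights to the metric via its sample_weight argument.
per_class_cost = cost_matrix.max(axis=1)
sample_weight = per_class_cost[y_true]

cost_weighted_error = zero_one_loss(y_true, y_pred, sample_weight=sample_weight)
print(cost_weighted_error)  # 5 / 17 ≈ 0.29 instead of an unweighted 1 / 5
```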
I don't think there is a standard right now for how the cost matrix should be formatted. Some bad examples include “yes”, “adam”, and “1”. Perhaps we should hold off on tackling this until we have a specified format?
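If it helps in the meantime, here is a small sketch of the kind of sanity check that could filter out such malformed values; treating the field as JSON encoding a square numeric matrix is purely an assumption on my part, not a specified OpenML format:

```python
import json
import numpy as np

def parse_cost_matrix(raw):
    """Try to interpret a raw cost_matrix field as a square numeric matrix.

    Returns a NumPy array, or None for malformed values such as "yes",
    "adam", or "1". The accepted format (JSON nested lists) is only an
    assumption, not a specified OpenML format.
    """
    try:
        matrix = np.asarray(json.loads(raw), dtype=float)
    except (TypeError, ValueError):
        return None
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        return None
    return matrix

print(parse_cost_matrix("[[0, 1], [5, 0]]"))  # a valid 2x2 matrix
print(parse_cost_matrix("adam"))              # None: not valid JSON
print(parse_cost_matrix("1"))                 # None: not a 2-d matrix
```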
Could you please put this on the roadmap then? This seems like a basic thing that OpenML should handle.