openml-python

Take cost metrics into account

Open mfeurer opened this issue 7 years ago • 4 comments

when running scikit-learn benchmarks.

mfeurer avatar Sep 19 '18 09:09 mfeurer

Hey @mfeurer, can you give me some details about this issue so that I can work on it?

v-parmar avatar Mar 27 '23 18:03 v-parmar

Hey @v-parmar, I just had a look at what I could suggest, and I think the following would be a good way to get started:

  1. Tackle issue #1245 to allow going through all tasks on OpenML without having to download them.
  2. Find a task with an associated cost matrix, for example via
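    import openml

    # Print every supervised classification task that defines a cost matrix.
    # Iterating assumes list_tasks returns a mapping keyed by task ID.
    for task_id in openml.tasks.list_tasks(task_type=openml.tasks.TaskType.SUPERVISED_CLASSIFICATION):
        try:
            # Fetch only the task description, not the dataset or its qualities.
            task = openml.tasks.get_task(task_id, download_data=False, download_qualities=False)
        except Exception:
            continue  # skip tasks that cannot be retrieved
        if task.cost_matrix is not None:
            print(task_id, task.cost_matrix)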
    
  3. Integrate the cost matrix into the calculation of the metric. In scikit-learn, such weights are usually passed as sample_weight, as in this metric.
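
As a rough illustration of step 3, the sketch below looks up a cost for each prediction and averages it. It relies on assumptions that OpenML does not currently guarantee: the cost matrix is a square nested list, `cost_matrix[i][j]` is the cost of predicting class `j` when the true class is `i`, and labels are integer class indices. The function names are hypothetical and are not part of openml-python or scikit-learn.

```python
from typing import List, Sequence


def per_sample_costs(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    cost_matrix: Sequence[Sequence[float]],
) -> List[float]:
    """Return the cost of each prediction.

    Assumption: cost_matrix[i][j] is the cost of predicting class j
    when the true class is i, and labels are integer class indices.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [cost_matrix[t][p] for t, p in zip(y_true, y_pred)]


def mean_misclassification_cost(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    cost_matrix: Sequence[Sequence[float]],
) -> float:
    """Average the per-sample costs over all predictions."""
    costs = per_sample_costs(y_true, y_pred, cost_matrix)
    if not costs:
        raise ValueError("at least one prediction is required")
    return sum(costs) / len(costs)


# Illustrative values only: missing a positive (true 1, predicted 0)
# costs five times as much as a false alarm (true 0, predicted 1).
example_cost_matrix = [[0.0, 1.0], [5.0, 0.0]]
example_y_true = [0, 0, 1, 1]
example_y_pred = [0, 1, 0, 1]

example_costs = per_sample_costs(example_y_true, example_y_pred, example_cost_matrix)
example_mean_cost = mean_misclassification_cost(
    example_y_true, example_y_pred, example_cost_matrix
)
print(example_costs, example_mean_cost)
```

With these example values the two errors cost 1 and 5, so the mean cost over four predictions is 1.5.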

mfeurer avatar Apr 17 '23 08:04 mfeurer

I don't think there is a standard right now for how the cost matrix should be formatted. Some bad examples include “yes”, “adam”, or “1”. Perhaps we should hold off on tackling this until we have a specified format?
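
To make the formatting concern concrete, here is a minimal sketch of the kind of check a specified format would allow. The format itself is purely an assumption for illustration (a square nested list of non-negative numbers, one row and one column per class), and `is_valid_cost_matrix` is a hypothetical helper, not an existing openml-python function.

```python
from numbers import Real
from typing import Any


def is_valid_cost_matrix(value: Any, n_classes: int) -> bool:
    """Check a value against an assumed cost-matrix format.

    Assumed format (not an OpenML specification): a list or tuple of
    n_classes rows, each a list or tuple of n_classes non-negative numbers.
    """
    if not isinstance(value, (list, tuple)) or len(value) != n_classes:
        return False
    for row in value:
        if not isinstance(row, (list, tuple)) or len(row) != n_classes:
            return False
        for cell in row:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(cell, bool) or not isinstance(cell, Real):
                return False
            if cell < 0:
                return False
    return True


# The free-text values mentioned above fail the check; a 2x2 matrix passes.
for candidate in ["yes", "adam", "1", [[0, 1], [5, 0]]]:
    print(repr(candidate), is_valid_cost_matrix(candidate, n_classes=2))
```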

PGijsbers avatar Apr 24 '23 11:04 PGijsbers

Could you please put this on the roadmap then? This seems like a basic thing that OpenML should handle.

mfeurer avatar Apr 25 '23 14:04 mfeurer