Take cost metrics into account when running scikit-learn benchmarks.
Hey @mfeurer, can you give me some details about this issue so that I can work on it?
Hey @v-parmar, I just had a look at what I could suggest, and I think the following would be a good approach to get started:
- Tackle issue #1245 to allow going through all tasks on OpenML without having to download them.
- Find a task with an associated cost matrix, for example via:

  ```python
  import openml

  # Loop over all supervised classification task ids without downloading the
  # datasets or qualities; skip tasks that fail to load.
  for task_id in openml.tasks.list_tasks(task_type=openml.tasks.TaskType.SUPERVISED_CLASSIFICATION):
      try:
          task = openml.tasks.get_task(task_id, download_data=False, download_qualities=False)
      except Exception:
          continue
      if task.cost_matrix is not None:
          print(task_id, task.cost_matrix)
  ```

- Integrate the cost matrix in the calculation of the metric, where such weights are often called `sample_weight`, as in this metric (see the sketch after this list).
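To make that last point a bit more concrete, here is a rough sketch of what feeding a cost matrix into a scikit-learn metric through `sample_weight` could look like. The cost matrix values, the class labels, and the reduction to a single per-class cost are all made up for illustration; since there is no specified format yet, none of this reflects how OpenML actually stores the costs:

```python
import numpy as np
from sklearn.metrics import zero_one_loss

# Hypothetical cost matrix: entry [i, j] is the cost of predicting class j
# when the true class is i; the diagonal (correct predictions) costs nothing.
cost_matrix = np.array([
    [0.0, 1.0],
    [5.0, 0.0],  # missing class 1 is five times as costly as missing class 0
])

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 0, 1, 0, 1])

# One simple reduction: weight every sample by the cost of misclassifying its
# true class (here, the worst off-diagonal cost in that row), then pass the
# weights to the metric via its sample_weight argument.
per_class_cost = cost_matrix.max(axis=1)
sample_weight = per_class_cost[y_true]

cost_weighted_error = zero_one_loss(y_true, y_pred, sample_weight=sample_weight)
print(cost_weighted_error)  # 5 / 17 ≈ 0.29 instead of an unweighted 1 / 5
```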
I don't think there is a standard right now for how the cost matrix should be formatted. Some bad examples include “yes”, “adam”, and “1”. Perhaps we should hold off on tackling this until we have a specified format?
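If it helps in the meantime, here is a small sketch of the kind of sanity check that could filter out such malformed values; treating the field as JSON encoding a square numeric matrix is purely an assumption on my part, not a specified OpenML format:

```python
import json
import numpy as np

def parse_cost_matrix(raw):
    """Try to interpret a raw cost_matrix field as a square numeric matrix.

    Returns a NumPy array, or None for malformed values such as "yes",
    "adam", or "1". The accepted format (JSON nested lists) is only an
    assumption, not a specified OpenML format.
    """
    try:
        matrix = np.asarray(json.loads(raw), dtype=float)
    except (TypeError, ValueError):
        return None
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        return None
    return matrix

print(parse_cost_matrix("[[0, 1], [5, 0]]"))  # a valid 2x2 matrix
print(parse_cost_matrix("adam"))              # None: not valid JSON
print(parse_cost_matrix("1"))                 # None: not a 2-d matrix
```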
Could you please put this on the roadmap then? This seems like a basic thing that OpenML should handle.