[ENH] set classification_report_imbalanced output_dict keys to target_names

Open Abdelgha-4 opened this issue 2 years ago • 1 comment

Is your feature request related to a problem? Please describe

Currently, when output_dict is set to True, classification_report_imbalanced ignores the given target_names: the returned dict is keyed by the integer class labels instead.

Example:

from pprint import pprint

from imblearn.metrics import classification_report_imbalanced

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
# target_names is passed, but the returned dict is still keyed by the integer labels
pprint(classification_report_imbalanced(y_true, y_pred, target_names=target_names, output_dict=True))

Current output:

{0: {'f1': 0.6666666666666666,
     'geo': 0.8660254037844386,
     'iba': 0.7687499999999998,
     'pre': 0.5,
     'rec': 1.0,
     'spe': 0.75,
     'sup': 1},
 1: {'f1': 0.0,
     'geo': 0.0,
     'iba': 0.0,
     'pre': 0.0,
     'rec': 0.0,
     'spe': 0.75,
     'sup': 1},
 2: {'f1': 0.8,
     'geo': 0.816496580927726,
     'iba': 0.6444444444444444,
     'pre': 1.0,
     'rec': 0.6666666666666666,
     'spe': 1.0,
     'sup': 3},
 'avg_f1': 0.6133333333333334,
 'avg_geo': 0.6631030293135233,
 'avg_iba': 0.5404166666666665,
 'avg_pre': 0.7,
 'avg_rec': 0.6,
 'avg_spe': 0.9,
 'total_support': 5}

Describe the solution you'd like

When output_dict=True, set the keys of the returned dict to the corresponding target_names.

Expected output:

{'class 0': {'f1': 0.6666666666666666,
             'geo': 0.8660254037844386,
             'iba': 0.7687499999999998,
             'pre': 0.5,
             'rec': 1.0,
             'spe': 0.75,
             'sup': 1},
 'class 1': {'f1': 0.0,
             'geo': 0.0,
             'iba': 0.0,
             'pre': 0.0,
             'rec': 0.0,
             'spe': 0.75,
             'sup': 1},
 'class 2': {'f1': 0.8,
             'geo': 0.816496580927726,
             'iba': 0.6444444444444444,
             'pre': 1.0,
             'rec': 0.6666666666666666,
             'spe': 1.0,
             'sup': 3},
 'avg_f1': 0.6133333333333334,
 'avg_geo': 0.6631030293135233,
 'avg_iba': 0.5404166666666665,
 'avg_pre': 0.7,
 'avg_rec': 0.6,
 'avg_spe': 0.9,
 'total_support': 5}
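
Until something like this is supported, the keys can be remapped after the call. The following is only a minimal sketch of that workaround: `relabel_report` is a hypothetical helper (not part of imblearn), and the `report` dict is a trimmed copy of the current output above, so the snippet runs without imblearn installed.

```python
def relabel_report(report, labels, target_names):
    """Return a copy of `report` with per-class keys replaced by names.

    Hypothetical helper, not an imblearn function. `labels` are the class
    labels used as keys in the report, in the same order as `target_names`.
    """
    if len(labels) != len(target_names):
        raise ValueError("labels and target_names must have the same length")
    name_for = dict(zip(labels, target_names))
    # Keys that are not class labels (avg_*, total_support) pass through as-is.
    return {name_for.get(key, key): value for key, value in report.items()}


# Trimmed copy of the current output shown in this issue.
report = {
    0: {"pre": 0.5, "rec": 1.0, "sup": 1},
    1: {"pre": 0.0, "rec": 0.0, "sup": 1},
    2: {"pre": 1.0, "rec": 0.6666666666666666, "sup": 3},
    "avg_pre": 0.7,
    "total_support": 5,
}

named = relabel_report(report, [0, 1, 2], ["class 0", "class 1", "class 2"])
print(list(named))
# ['class 0', 'class 1', 'class 2', 'avg_pre', 'total_support']
```

This assumes the caller knows which labels appear in the report and in which order they correspond to target_names; it does not change what the library itself returns.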

Additional context

I think this is important because it gives a more readable representation of the output and makes it easier to use later. Silently ignoring target_names also doesn't make much sense.

Actually, I would find it even more useful if the trailing average entries were replaced by a single nested entry:

 'avg_tot': {'f1': 0.6133333333333334,
             'geo': 0.6631030293135233,
             'iba': 0.5404166666666665,
             'pre': 0.7,
             'rec': 0.6,
             'spe': 0.9,
             'sup': 5},

Instead of:

 'avg_f1': 0.6133333333333334,
 'avg_geo': 0.6631030293135233,
 'avg_iba': 0.5404166666666665,
 'avg_pre': 0.7,
 'avg_rec': 0.6,
 'avg_spe': 0.9,
 'total_support': 5}
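
For illustration, the nested form can be derived from the current flat keys with a small post-processing step. This is a rough sketch, not a proposal for the actual implementation: `nest_averages` and the `'avg_tot'` key name are assumptions taken from the suggestion above, and the input is a shortened copy of the current output.

```python
def nest_averages(report, key="avg_tot"):
    """Collect flat avg_* / total_support entries under one nested key.

    Hypothetical helper. An entry counts as a summary entry when its key is
    a string and its value is not a dict, so per-class dicts are left alone.
    """
    nested = {}
    averages = {}
    for k, v in report.items():
        is_summary = isinstance(k, str) and not isinstance(v, dict)
        if is_summary and k.startswith("avg_"):
            averages[k[len("avg_"):]] = v  # 'avg_f1' -> 'f1'
        elif is_summary and k == "total_support":
            averages["sup"] = v  # match the per-class 'sup' key
        else:
            nested[k] = v
    nested[key] = averages
    return nested


# Shortened copy of the current output shown in this issue.
report = {
    0: {"f1": 0.6666666666666666, "pre": 0.5, "sup": 1},
    "avg_f1": 0.6133333333333334,
    "avg_pre": 0.7,
    "total_support": 5,
}

print(nest_averages(report)["avg_tot"])
# {'f1': 0.6133333333333334, 'pre': 0.7, 'sup': 5}
```

With this shape every top-level value is a dict with the same keys, so the whole report could be loaded into a table in one step.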

Abdelgha-4, Sep 19 '21 12:09

Actually, I think this is a bug, because the docs say:

Dictionary returned if output_dict is True. Dictionary has the following structure:

{'label 1': {'pre':0.5,
             'rec':1.0,
             ...
            },
 'label 2': { ... },
  ...
}
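
The mismatch with the documented structure can be checked mechanically. Below is a small sketch of such a check; `per_class_keys` and `keys_match_names` are hypothetical helpers, and the two dicts are cut-down versions of the current and expected outputs from this issue rather than real return values.

```python
def per_class_keys(report):
    """Keys whose value is itself a dict, i.e. the per-class entries."""
    return [k for k, v in report.items() if isinstance(v, dict)]


def keys_match_names(report, target_names):
    """True if the per-class keys are exactly the given target names."""
    return set(per_class_keys(report)) == set(target_names)


target_names = ["class 0", "class 1", "class 2"]

# Cut-down version of the current output: integer keys.
current = {0: {"pre": 0.5}, 1: {"pre": 0.0}, 2: {"pre": 1.0}, "avg_pre": 0.7}
# Cut-down version of the documented / expected output: name keys.
expected = {
    "class 0": {"pre": 0.5},
    "class 1": {"pre": 0.0},
    "class 2": {"pre": 1.0},
    "avg_pre": 0.7,
}

print(keys_match_names(current, target_names))   # False
print(keys_match_names(expected, target_names))  # True
```

A check along these lines could serve as a regression test if the behaviour is changed to follow the docs.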

Versions for reference

System:
    python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
    executable: /opt/conda/bin/python
    machine: Linux-5.4.120+-x86_64-with-debian-buster-sid

Python dependencies:
    pip: 21.1.2
    setuptools: 49.6.0.post20210108
    sklearn: 0.24.2
    imblearn: 0.8.0
    numpy: 1.19.5
    scipy: 1.6.3
    Cython: 0.29.23
    pandas: 1.2.4
    matplotlib: 3.4.2
    joblib: 1.0.1
    threadpoolctl: 2.1.0

Abdelgha-4, Sep 19 '21 12:09