imbalanced-learn
[ENH] set classification_report_imbalanced output_dict keys to target_names
Is your feature request related to a problem? Please describe
Currently, when output_dict is set to True, classification_report_imbalanced ignores the given target_names.
Example:
from pprint import pprint
import numpy as np
from imblearn.metrics import classification_report_imbalanced
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
pprint(classification_report_imbalanced(y_true, y_pred, target_names=target_names, output_dict=True))
Current output:
{0: {'f1': 0.6666666666666666,
'geo': 0.8660254037844386,
'iba': 0.7687499999999998,
'pre': 0.5,
'rec': 1.0,
'spe': 0.75,
'sup': 1},
1: {'f1': 0.0,
'geo': 0.0,
'iba': 0.0,
'pre': 0.0,
'rec': 0.0,
'spe': 0.75,
'sup': 1},
2: {'f1': 0.8,
'geo': 0.816496580927726,
'iba': 0.6444444444444444,
'pre': 1.0,
'rec': 0.6666666666666666,
'spe': 1.0,
'sup': 3},
'avg_f1': 0.6133333333333334,
'avg_geo': 0.6631030293135233,
'avg_iba': 0.5404166666666665,
'avg_pre': 0.7,
'avg_rec': 0.6,
'avg_spe': 0.9,
'total_support': 5}
Describe the solution you'd like
When output_dict=True, set the keys of the returned dict to the corresponding target_names.
Expected output:
{'class 0': {'f1': 0.6666666666666666,
'geo': 0.8660254037844386,
'iba': 0.7687499999999998,
'pre': 0.5,
'rec': 1.0,
'spe': 0.75,
'sup': 1},
'class 1': {'f1': 0.0,
'geo': 0.0,
'iba': 0.0,
'pre': 0.0,
'rec': 0.0,
'spe': 0.75,
'sup': 1},
'class 2': {'f1': 0.8,
'geo': 0.816496580927726,
'iba': 0.6444444444444444,
'pre': 1.0,
'rec': 0.6666666666666666,
'spe': 1.0,
'sup': 3},
'avg_f1': 0.6133333333333334,
'avg_geo': 0.6631030293135233,
'avg_iba': 0.5404166666666665,
'avg_pre': 0.7,
'avg_rec': 0.6,
'avg_spe': 0.9,
'total_support': 5}
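Until the function does this itself, the expected output above can be approximated by post-processing the returned dict. The following is a minimal sketch: relabel_report is a hypothetical helper, not part of imbalanced-learn, and it assumes that the per-class keys are the class labels and that target_names is listed in the same sorted order as those labels. The example input is a trimmed copy of the current output shown earlier.

```python
def relabel_report(report, target_names, labels=None):
    """Return a copy of `report` whose per-class keys are replaced by target_names.

    Hypothetical helper (not an imbalanced-learn API). Assumes the per-class
    keys are the class labels and that target_names follows their sorted order.
    """
    if labels is None:
        # Assume every non-string key is a class label, in sorted order.
        labels = sorted(k for k in report if not isinstance(k, str))
    if len(labels) != len(target_names):
        raise ValueError("labels and target_names must have the same length")
    mapping = dict(zip(labels, target_names))
    # Keys that are not class labels (e.g. 'avg_pre', 'total_support') are kept as-is.
    return {mapping.get(key, key): value for key, value in report.items()}


# Trimmed copy of the current output from the example above.
current = {
    0: {"pre": 0.5, "rec": 1.0, "sup": 1},
    1: {"pre": 0.0, "rec": 0.0, "sup": 1},
    2: {"pre": 1.0, "rec": 0.6666666666666666, "sup": 3},
    "avg_pre": 0.7,
    "avg_rec": 0.6,
    "total_support": 5,
}
relabeled = relabel_report(current, ["class 0", "class 1", "class 2"])
print(list(relabeled))
# ['class 0', 'class 1', 'class 2', 'avg_pre', 'avg_rec', 'total_support']
```

Because the dict comprehension walks the original items in order, the relabeled report keeps the same key order as the original one.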
Additional context
I think this is important because it gives a clearer representation of the output that is easier to use later on. Also, silently ignoring target_names doesn't make much sense.
I would find it even more useful if the averaged entries at the end were replaced by a single nested entry:
'avg_tot': {'f1': 0.6133333333333334,
'geo': 0.6631030293135233,
'iba': 0.5404166666666665,
'pre': 0.7,
'rec': 0.6,
'spe': 0.9,
'sup': 5},
Instead of:
'avg_f1': 0.6133333333333334,
'avg_geo': 0.6631030293135233,
'avg_iba': 0.5404166666666665,
'avg_pre': 0.7,
'avg_rec': 0.6,
'avg_spe': 0.9,
'total_support': 5}
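The proposed 'avg_tot' layout can also be produced from the current flat output by post-processing. This is a sketch under stated assumptions: group_averages is a hypothetical helper, not part of imbalanced-learn; it assumes every averaged metric uses the 'avg_' prefix, and it renames 'total_support' to 'sup' to match the per-class entries, as in the proposal above.

```python
def group_averages(report, key="avg_tot"):
    """Move the flat 'avg_*' and 'total_support' entries into one nested dict.

    Hypothetical helper (not an imbalanced-learn API). The 'avg_tot' key name
    follows the proposal; 'total_support' is renamed to 'sup'.
    """
    grouped = {}
    nested = {}
    for name, value in report.items():
        is_str = isinstance(name, str)
        if is_str and name.startswith("avg_"):
            # 'avg_pre' -> 'pre', 'avg_f1' -> 'f1', ...
            nested[name[len("avg_"):]] = value
        elif is_str and name == "total_support":
            nested["sup"] = value
        else:
            # Per-class entries are copied unchanged.
            grouped[name] = value
    if nested:
        # The grouped averages are appended as the last entry.
        grouped[key] = nested
    return grouped


current = {
    0: {"pre": 0.5, "sup": 1},
    "avg_pre": 0.7,
    "avg_rec": 0.6,
    "total_support": 5,
}
print(group_averages(current))
# {0: {'pre': 0.5, 'sup': 1}, 'avg_tot': {'pre': 0.7, 'rec': 0.6, 'sup': 5}}
```

Giving the averages the same inner keys as the per-class entries means every value in the returned dict has one shape, which makes it straightforward to turn into a table.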
In fact, I think this is a bug, because the docs say:
Dictionary returned if output_dict is True. Dictionary has the following structure:
{'label 1': {'pre':0.5, 'rec':1.0, ... }, 'label 2': { ... }, ... }
Versions for reference
System:
python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
executable: /opt/conda/bin/python
machine: Linux-5.4.120+-x86_64-with-debian-buster-sid
Python dependencies:
pip: 21.1.2
setuptools: 49.6.0.post20210108
sklearn: 0.24.2
imblearn: 0.8.0
numpy: 1.19.5
scipy: 1.6.3
Cython: 0.29.23
pandas: 1.2.4
matplotlib: 3.4.2
joblib: 1.0.1
threadpoolctl: 2.1.0