scikit-learn
Build in an option to change the "positive" class in sklearn.metrics.precision_recall_curve and sklearn.metrics.roc_curve
Describe the workflow you want to enable
In some cases, it is useful to compare a machine learning classifier with experimental data using ROC or precision-recall curves. For (e.g.) a logistic regression model, the score returned by the model is a probability or a value from the corresponding decision function. Both are positively correlated with the "positive" class and can be used directly to compute fpr and tpr with sklearn.metrics.roc_curve, or precision and recall with sklearn.metrics.precision_recall_curve. However, experimental data can also be negatively correlated with the "positive" class. In that case, when computing the confusion matrices, one would need to assume that the "positive" events are not on the right side of each threshold, but on the left side.

The same problem occurs for imbalanced test data sets. There, it is useful to use precision-recall curves and to define the "positive" class as the minority class (see https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-imbalanced-classification/). However, in some cases the minority class is negatively correlated with the probability or decision function values of the classifier (i.e., the minority class corresponds to a probability near zero or to negative decision function values). Then it is not enough to just use the pos_label option, because again the "positive" values lie on the left and not on the right side of the thresholds used for calculating the confusion matrices / precision-recall curve (see above).
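To make the problem concrete, here is a minimal sketch with a toy dataset (the data values are made up for illustration) in which the positive class is anti-correlated with the score, and the current workaround of negating the score by hand:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy data: the positive class (1) is associated with LOW scores,
# e.g. an experimental readout that decreases for positives.
y_true = np.array([1, 1, 1, 0, 0, 0])
y_score = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

# Used as-is, the ranking looks worse than chance
# (here perfectly anti-correlated, so AUC = 0.0).
auc_raw = roc_auc_score(y_true, y_score)

# Current workaround: negate the score so that it becomes
# positively correlated with the positive class (AUC = 1.0).
auc_flipped = roc_auc_score(y_true, -y_score)

print(auc_raw, auc_flipped)  # 0.0 1.0
```

The workaround works, but the user has to know about it and remember to also negate the returned thresholds when interpreting them.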
Describe your proposed solution
One could add a parameter to sklearn.metrics.precision_recall_curve and sklearn.metrics.roc_curve that describes whether the y_score parameter is positively or negatively correlated with the "positive" class (something like pos_corr : str, default='positive'; the options would be 'positive' or 'negative'). The functions would then get an additional condition:
```python
y_score = np.asarray(y_score)
if pos_corr == 'negative':
    y_score = -y_score
```
This would transform a negative correlation to a positive one and sklearn.metrics.precision_recall_curve and sklearn.metrics.roc_curve could be used without further changes.
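The proposal above can be sketched as a thin wrapper around the existing function; note that the name precision_recall_curve_corr and the pos_corr parameter are hypothetical and not part of scikit-learn:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def precision_recall_curve_corr(y_true, y_score, pos_corr='positive', **kwargs):
    """Hypothetical wrapper illustrating the proposed pos_corr parameter."""
    y_score = np.asarray(y_score)
    if pos_corr == 'negative':
        # Flip the sign so the score becomes positively correlated
        # with the positive class; the existing code then applies unchanged.
        y_score = -y_score
    elif pos_corr != 'positive':
        raise ValueError("pos_corr must be 'positive' or 'negative'")
    return precision_recall_curve(y_true, y_score, **kwargs)


# Toy data: positives have the LOW scores (negative correlation).
y_true = [1, 1, 0, 0]
y_score = [0.2, 0.3, 0.8, 0.9]

precision, recall, thresholds = precision_recall_curve_corr(
    y_true, y_score, pos_corr='negative')
print(precision, recall)  # perfect separation once the sign is flipped
```

One design question such a parameter raises: the returned thresholds would then refer to the negated scores, so the documentation would need to state whether they are reported on the original or the flipped scale.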
Describe alternatives you've considered, if relevant
No response
Additional context
No response