FEA Confusion matrix derived metrics
Reference Issues/PRs
Continues #17265 Fixes #5516
What does this implement/fix? Explain your changes.
Add metrics.specificity_score and metrics.npv_score, based on metrics.fpr_tpr_fnr_tnr_scores from #17265
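A quick usage sketch of the proposed API (example data made up; the call pattern, e.g. the average parameter, is assumed to mirror recall_score):

```python
# Hypothetical usage of the metrics proposed in this PR; the exact signature
# (e.g. the `average` parameter) is assumed to mirror recall_score.
from sklearn.metrics import specificity_score, npv_score

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 1, 1, 0]

print(specificity_score(y_true, y_pred, average="macro"))  # true negative rate
print(npv_score(y_true, y_pred, average="macro"))          # negative predictive value
```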
Any other comments?
@jnothman - Hi, will you be in a position to review this PR?
Sorry, I'd forgotten to submit this partial review a few weeks ago.
No worries. Please have a look at the current version.
@jnothman I am wondering if we are not duplicating code here. I am thinking that we could have a single private function that computes all the necessary statistics, and then precision-recall and the other metrics would only call this function. We could even pass a dictionary parameter that explicitly requests some of the metrics to be returned as a bunch.
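Something along these lines, as a rough sketch only (all names are hypothetical and this is not an actual implementation):

```python
# Sketch: compute the per-class TP/FP/FN/TN counts once in a private helper,
# then let each public metric (and an optional "requested metrics" API)
# derive its value from those shared counts. Names are hypothetical.
from sklearn.metrics import multilabel_confusion_matrix


def _confusion_statistics(y_true, y_pred):
    """Per-class (tp, fp, fn, tn) from the one-vs-rest confusion matrices."""
    mcm = multilabel_confusion_matrix(y_true, y_pred)
    return mcm[:, 1, 1], mcm[:, 0, 1], mcm[:, 1, 0], mcm[:, 0, 0]


def _requested_metrics(y_true, y_pred, requested=("recall", "specificity")):
    """Return only the explicitly requested per-class metrics as a dict."""
    tp, fp, fn, tn = _confusion_statistics(y_true, y_pred)
    available = {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }
    return {name: available[name] for name in requested}
```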
Here, I am also questioning adding the tpr/fpr/fnr/tnr function, since it would duplicate statistics already computed by the recall score and, potentially later, the specificity. It seems that @amueller wanted to avoid this in the original issue: https://github.com/scikit-learn/scikit-learn/issues/5516#issuecomment-306464383
multilabel_confusion_matrix removed a lot of the duplication. The problem is that users want these metrics and, for whatever reason, find them hard to compute themselves. And given how much revision we've made to P/R/F, there are certainly tricky cases! The present PR could have less duplication, but the main duplication here is in the tests, which will require more work to refactor.
I had thought that, to avoid clutter, we could avoid putting these measures into metrics, but make them available when a user sought the set of scorers relevant to a particular task, vis-à-vis #12385.
> multilabel_confusion_matrix removed a lot of the duplication. The problem is that users want these metrics and, for whatever reason, find them hard to compute themselves. And given how much revision we've made to P/R/F, there are certainly tricky cases! The present PR could have less duplication, but the main duplication here is in the tests, which will require more work to refactor.
OK, so let's go ahead with this proposal. I will do a first round of review and look afterwards at whether we can refactor the tests.
@Pawel-Kranzberg Could you resolve the conflicts in the files? I assume the conflicts here are just related to running black on the file.
The current metrics.classification_report returns:
- Precision (Positive Predictive Value)
- Recall (True Positive Rate)
- F1
- Support
It is great, but far from complete. The following two are very important as well:
- Negative Predictive Value (NPV)
- Specificity (True Negative Rate, or TNR)
The metrics.specificity_score is a great addition because both TNR and NPV need it; it is really the key here. I really hope both TNR and NPV can be included in the classification report sometime in the future.
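For reference, a minimal run of the current report (illustrative data only), which shows exactly the four columns listed above and nothing else:

```python
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 1, 1, 0]

# Prints precision, recall, f1-score and support per class;
# no specificity (TNR) or NPV columns yet.
print(classification_report(y_true, y_pred))
```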
Once we can cover these four metrics for multiclass classification:
- Precision (Positive Predictive Value) --> requires (TP, FP)
- Recall (True Positive Rate) --> requires (TP, FN)
- Negative Predictive Value (NPV) --> requires (TN, FN)
- Specificity (True Negative Rate) --> requires (TN, FP)
we can pretty much derive the rest of the metrics (see the sketch after this list), such as:
- FNR (False Negative Rate = 1 - TPR)
- FPR (False Positive Rate = 1 - TNR)
- FDR (False Discovery Rate = 1 - PPV)
- FOR (False Omission Rate = 1 - NPV)
- ACC (Accuracy)
- MCC (Matthews Correlation Coefficient)
- Prevalence Threshold
- etc.
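To illustrate (a sketch using the existing multilabel_confusion_matrix, not this PR's implementation), all of the rates above fall out of the four per-class counts, with the false-rates being simple complements:

```python
# Illustration only: deriving the rates listed above from the existing
# per-class one-vs-rest confusion matrices.
from sklearn.metrics import multilabel_confusion_matrix

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 1, 1, 0]

mcm = multilabel_confusion_matrix(y_true, y_pred)  # each block is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = mcm[:, 0, 0], mcm[:, 0, 1], mcm[:, 1, 0], mcm[:, 1, 1]

ppv = tp / (tp + fp)  # precision
tpr = tp / (tp + fn)  # recall / sensitivity
tnr = tn / (tn + fp)  # specificity
npv = tn / (tn + fn)  # negative predictive value

fnr = 1 - tpr  # false negative rate
fpr = 1 - tnr  # false positive rate
fdr = 1 - ppv  # false discovery rate
false_omission_rate = 1 - npv  # FOR ("for" is a Python keyword)
```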
Excellent work! 👍 👍 👍
@Scoodood I would like to focus this PR only on the metrics addition, without touching the classification report. That could be done in a subsequent PR. Could you open an issue mentioning your feature request regarding the report? I assume that at some point it could be interesting to have a parameter to filter the desired metrics.
@Pawel-Kranzberg We could consider adding the NPV as well to close the loop.
> Could you open an issue mentioning your feature request regarding the report?

It's done: https://github.com/scikit-learn/scikit-learn/issues/21000
> @Pawel-Kranzberg Could you resolve the conflicts in the files? I assume the conflicts here are just related to running black on the file.
Done.
> @Pawel-Kranzberg We could consider adding the NPV as well to close the loop.
Done.
@glemaitre - Thank you for the comments. The PR is ready for another round of review.
@Scoodood - FYI, I've added the NPV score function (npv_score).
> @Scoodood - FYI, I've added the NPV score function (npv_score).

Hi @Pawel-Kranzberg, just wondering when these new metrics will be integrated into the official scikit-learn for public consumption?
> just wondering when these new metrics will be integrated into the official scikit-learn for public consumption?
I keep my fingers crossed...
Nice, the CI is passing. I will make a new pass on it.
We won't have time to finish the review on this one before the 1.2 release. Moving it to 1.3.
@lorentzenchr would you be able to maybe have a look?
Given the long discussion in #5516, it is better to ping @jnothman @amueller @glemaitre.
@Pawel-Kranzberg Could you please stop merging the main branch as long as there are no merge conflicts? It can hide discussions on GitHub and also generates notifications, and it is in your interest that maintainers don't unsubscribe from this PR.