fastLink
New feature request: Matthews correlation coefficient
Please add Matthews correlation coefficient (MCC) as an additional statistic for the confusion table:
                        TP * TN - FP * FN
MCC = -----------------------------------------------------
      [(TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)]^(1/2)
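For illustration, a minimal R sketch of the statistic computed from raw confusion-table counts (the function name `mcc` and its interface are my invention here, not fastLink's API):

```r
# Matthews correlation coefficient from raw confusion-table counts.
# The counts are coerced to double because the four-way product in
# the denominator overflows integer arithmetic for large tables.
mcc <- function(tp, tn, fp, fn) {
  tp <- as.numeric(tp); tn <- as.numeric(tn)
  fp <- as.numeric(fp); fn <- as.numeric(fn)
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) return(0)  # conventional value when a margin is empty
  (tp * tn - fp * fn) / denom
}
```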
The MCC is useful as an overall measure of linkage quality. It is better suited than accuracy and the F1-score to imbalanced data because it takes all four confusion-table categories (TP, TN, FP, and FN) and their relative sizes into account. In practice, I find that most linkage data are imbalanced, consisting mostly of TN.
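To make the imbalance point concrete, a small numeric example reusing the `mcc` sketch above (the counts are invented for illustration): with TN dominating, a linkage that misses 60% of the true matches still scores near-perfect accuracy, while F1 and MCC expose the problem:

```r
# Hypothetical linkage of one million candidate pairs, only 1,000
# of which are true matches; most pairs are correctly rejected.
tp <- 400; fn <- 600; fp <- 100; tn <- 998900

(tp + tn) / (tp + tn + fp + fn)  # accuracy ~ 0.9993
2 * tp / (2 * tp + fp + fn)      # F1       ~ 0.53
mcc(tp, tn, fp, fn)              # MCC      ~ 0.57
```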
Wikipedia: https://en.wikipedia.org/wiki/Matthews_correlation_coefficient
Matthews's article (1975): https://doi.org/10.1016/0005-2795(75)90109-9
Matthews, page 445:
"A correlation of:
C = 1 indicates perfect agreement,
C = 0 is expected for a prediction no better than random, and
C = -1 indicates total disagreement between prediction and observation".
Mentioned in Tharwat's article (2018): https://doi.org/10.1016/j.aci.2018.08.003
Recommended by Luque et al. (2019): https://doi.org/10.1016/j.patcog.2019.02.023
Anders
Recommended by Canbek et al. (2021): https://rdcu.be/cvT7d
From the paper's conclusion:
"In conclusion, this study proposes a new comprehensive benchmarking method to analyze the robustness of performance metrics and ranks 15 performance metrics in the literature. Researchers can use MCC as the most robust metric for general objective purposes to be on the safe side."
Full reference: Canbek, G., Taskaya Temizel, T. & Sagiroglu, S. BenchMetrics: a systematic benchmarking method for binary classification performance metrics. Neural Computing and Applications 33, 14623–14650 (2021). https://doi.org/10.1007/s00521-021-06103-6