glosario
[16/12/21]: Add definitions of classification metrics
Suggestion to add definitions of common metrics when using classification models. The terms should be basic data science knowledge but are not currently listed in the Glossary.
-
Confusion matrix An $N \times N$ matrix that describes the performance of a classification model, where $N$ is the number of classes. Each row of the matrix represents the instances of an actual class and each column represents a predicted class. For a binary classification model the four cells of the confusion matrix give the True Positives (TP), False Negatives (FN), False Positives (FP) and True Negatives (TN). The table can be used to calculate Accuracy, Sensitivity and Specificity, among other measures of the model.
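As a minimal sketch (using hypothetical label lists, plain Python, no libraries), the four counts of a binary confusion matrix can be tallied like this, with rows as actual classes and columns as predicted classes:

```python
# Hypothetical example labels: 1 = positive class, 0 = negative class
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Tally each cell of the 2x2 confusion matrix
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # True Positives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # False Negatives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # False Positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # True Negatives

# Rows = actual classes, columns = predicted classes
confusion = [[tp, fn],   # actual positive: predicted positive, predicted negative
             [fp, tn]]   # actual negative: predicted positive, predicted negative
print(confusion)  # [[3, 1], [1, 3]]
```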
-
Accuracy Statistical measure of a classification model that gives the proportion of correct predictions among the total number of cases. Calculated as Accuracy = (TP+TN)/(TP+TN+FP+FN).
-
Sensitivity Statistical measure of a classification model that gives the True Positive rate: for example, the proportion of people who have a disease who test positive. Calculated as Sensitivity = TP/(TP+FN).
-
Specificity Statistical measure of a classification model that gives the True Negative rate: for example, the proportion of people who do not have a disease who test negative. Calculated as Specificity = TN/(TN+FP).
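The three formulas above can be sketched directly from the confusion-matrix counts; the numbers below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Hypothetical counts from a binary confusion matrix
tp, fn, fp, tn = 3, 1, 1, 3

accuracy    = (tp + tn) / (tp + tn + fp + fn)  # proportion of correct predictions
sensitivity = tp / (tp + fn)                   # True Positive rate
specificity = tn / (tn + fp)                   # True Negative rate

print(accuracy, sensitivity, specificity)  # 0.75 0.75 0.75
```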