Docs return description of binary_confusion_matrix incorrect
Not sure if I'm being daft here but I think this line is incorrect:
https://github.com/pytorch/torcheval/blob/e0444c6f7cf89e20b5bd17b16dea4e3793810ba9/torcheval/metrics/functional/classification/confusion_matrix.py#L23
It says that the returned tensor contains the values:
( (true positive, false negative) , (false positive, true negative) )
But if you look at the examples, it shows (e.g.):
>>> input = torch.tensor([0, 1, 0.7, 0.6])
>>> target = torch.tensor([0, 1, 1, 0])
>>> binary_confusion_matrix(input, target)
tensor([[1, 1],
        [0, 2]])
Which, for those inputs, I count as:
tn = 1  # index 0
tp = 2  # indices 1, 2
fp = 1  # index 3
fn = 0
So I believe that the actual tensor being returned is either [[fp, tn], [fn, tp]] or [[tn, fp], [fn, tp]]. From my own experiments, I'm pretty sure it's the latter.
Been scratching my head all morning about why my results look wrong and I think this is why.
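For anyone who wants to double-check that counting, here is a minimal standalone sketch (assuming the default 0.5 decision threshold) that thresholds the scores by hand and tallies each outcome:

import torch

input = torch.tensor([0, 1, 0.7, 0.6])
target = torch.tensor([0, 1, 1, 0])

# Threshold the scores at 0.5 to get hard predictions: [0, 1, 1, 1]
pred = (input >= 0.5).long()

tp = ((pred == 1) & (target == 1)).sum().item()  # 2 (indices 1, 2)
tn = ((pred == 0) & (target == 0)).sum().item()  # 1 (index 0)
fp = ((pred == 1) & (target == 0)).sum().item()  # 1 (index 3)
fn = ((pred == 0) & (target == 1)).sum().item()  # 0

print(tp, tn, fp, fn)  # 2 1 1 0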
Hi @ChrisBog-MV, thanks for pointing this out!
@JKSenthil I went through and looked at this. The issue is that we are following sklearn's convention of using
TN | FP
FN | TP
which is not standard for binary confusion matrices.
I would suggest we flip the rows and columns here with a bit of slicing:
return _confusion_matrix_compute(matrix, normalize)[[1, 0]][:, [1, 0]]
Tests and docstring examples will need to be updated as well, but basically we will then return
TP | FN
FP | TN
which is standard.
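For illustration, here is what that double reindexing does when applied to a plain tensor holding the sklearn-style matrix from the docstring example above (TN=1, FP=1, FN=0, TP=2); this is just a sketch of the indexing, not the actual _confusion_matrix_compute call:

import torch

# sklearn-style layout: [[TN, FP], [FN, TP]]
m = torch.tensor([[1, 1],
                  [0, 2]])

# Reverse the row order, then the column order: [[TP, FN], [FP, TN]]
flipped = m[[1, 0]][:, [1, 0]]
print(flipped)
# tensor([[2, 0],
#         [1, 1]])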
Hi!
+1. Also, I made an example that illustrates the problem by using a distinct expected count for each category (TP, TN, FP, FN), so it's unambiguous which cell is which:
confusion_matrix_example.py
import torcheval.metrics
import numpy as np
import torch

binary_confusion_matrix_metric = torcheval.metrics.BinaryConfusionMatrix()

# Intentionally construct a dataset that should evaluate to these counts:
# TP: 4; TN: 3; FP: 2; FN: 1.
predictions_and_labels = np.array([
    # False negatives:
    (0, 1),
    # False positives:
    (1, 0),
    (1, 0),
    # True negatives:
    (0, 0),
    (0, 0),
    (0, 0),
    # True positives:
    (1, 1),
    (1, 1),
    (1, 1),
    (1, 1),
])

predictions = predictions_and_labels[:, 0]
labels = predictions_and_labels[:, 1]

binary_confusion_matrix_metric.update(torch.from_numpy(predictions), torch.from_numpy(labels))
binary_confusion_matrix = binary_confusion_matrix_metric.compute()
print(binary_confusion_matrix)
# The torcheval docs indicate the return value should be:
# ( (true positive, false negative) , (false positive, true negative) )
# So, we'd expect ( (4, 1), (2, 3) ).
# However, we don't observe what we expect!
# Instead I think the docs should say:
# ( (true negative, false positive) , (false negative, true positive) )
$ python confusion_matrix_example.py
tensor([[3., 2.],
        [1., 4.]])
PS: a torch.flip(binary_confusion_matrix, dims=[0, 1]) call would also do the trick and would perhaps be a bit more readable.
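For example, a quick sketch applying it to the matrix printed above:

import torch

# sklearn-style layout: [[TN, FP], [FN, TP]]
m = torch.tensor([[3., 2.],
                  [1., 4.]])

# Reversing both dimensions yields [[TP, FN], [FP, TN]]
print(torch.flip(m, dims=[0, 1]))
# tensor([[4., 1.],
#         [2., 3.]])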