Some metrics are handling absent values incorrectly
🐛 Bug
Some metrics, such as Accuracy, Precision, Recall and F1Score, are handling absent values incorrectly.
A value absent from both target and preds, and therefore correctly predicted, is counted as incorrect.
To Reproduce
Steps to reproduce the behavior...
- Initialize the metric with `num_classes=2`, `mdmc_average="samplewise"` and `average="none"`;
- Create a batch of 2 elements that are predicted correctly, but where one of the classes is absent from one of the two elements of the batch.
Code sample
import torch
from torchmetrics import Accuracy
target = torch.tensor(
[
[0,0,0,0],
[0,0,1,1],
]
)
preds = torch.tensor(
[
[0,0,0,0],
[0,0,1,1],
]
)
acc = Accuracy(num_classes=2, average="none", mdmc_average="samplewise")
actual_result = acc(preds, target)
expected_result = torch.tensor([1., 1.])
assert torch.equal(expected_result, actual_result)
Expected behavior
The result should be torch.tensor([1., 1.]) because the two classes are predicted correctly for both elements of the batch.
In fact, in the first element of the batch the absence of class 1 is expected by the target tensor.
Despite this, the result of the metric is torch.tensor([1., 0.5]), because in the first element of the batch the value of the metric for class 1 is 0.0.
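The 0.5 for class 1 follows directly from the samplewise averaging of the per-sample class scores; a minimal sketch of that arithmetic:

```python
import torch

# Per-sample score for class 1 under the buggy behaviour: 0.0 in the first
# sample (the class is absent and counted as wrong) and 1.0 in the second.
class1_per_sample = torch.tensor([0.0, 1.0])
class1_per_sample.mean()  # tensor(0.5000) -- the reported 0.5 for class 1
```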
Environment
- OS (e.g., Linux): Linux
- Python & PyTorch Version (e.g., 1.0): 3.8.12 & 1.11
- How you installed PyTorch (`conda`, `pip`, build command if you used source): `pip`
- Any other relevant information:
Additional context
The metric JaccardIndex provides the argument absent_score to handle such cases.
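As a rough illustration of how such a fallback behaves (a hypothetical helper written for this sketch, not the torchmetrics API), the per-class score for a sample can default to `absent_score` whenever the class appears in neither `preds` nor `target`:

```python
import torch

def per_class_samplewise_score(preds, target, num_classes, absent_score=1.0):
    # Hypothetical helper, not the torchmetrics API: score each class in each
    # sample, falling back to `absent_score` when the class appears in
    # neither preds nor target for that sample.
    scores = torch.empty(target.shape[0], num_classes)
    for i in range(target.shape[0]):
        for c in range(num_classes):
            in_target = target[i] == c
            in_preds = preds[i] == c
            if not in_target.any() and not in_preds.any():
                scores[i, c] = absent_score  # class absent on both sides
            else:
                # agreement over positions where either side uses class c
                either = in_target | in_preds
                scores[i, c] = (in_target & in_preds)[either].float().mean()
    return scores

target = torch.tensor([[0, 0, 0, 0], [0, 0, 1, 1]])
preds = torch.tensor([[0, 0, 0, 0], [0, 0, 1, 1]])
per_class_samplewise_score(preds, target, num_classes=2)
# tensor([[1., 1.],
#         [1., 1.]]) -- class 1, absent from the first sample, no longer
# drags the score down
```

With `absent_score=1.0` the example above yields a perfect score for both samples, matching the expected behavior described in this issue.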
Hi! Thanks for your contribution, great first issue!
Issue will be fixed by the classification refactor: see this issue https://github.com/Lightning-AI/metrics/issues/1001 and this PR https://github.com/Lightning-AI/metrics/pull/1195 for all changes.
Small recap: this issue describes that the accuracy metric does not compute the right value in the binary setting. The problem with the current implementation is that the metric is calculated as an average over the 0 and 1 classes, which is wrong.
After the refactor this has been fixed. Using the new binary_* version of the metric on the provided example:
from torchmetrics.functional import binary_accuracy
import torch
target = torch.tensor(
[
[0,0,0,0],
[0,0,1,1],
]
)
preds = torch.tensor(
[
[0,0,0,0],
[0,0,1,1],
]
)
binary_accuracy(preds, target, multidim_average="samplewise") # tensor([1., 1.])
which gives the correct result. The issue will be closed when https://github.com/Lightning-AI/metrics/pull/1195 is merged.