Possible bug in binary classification `calibration_error`
🐛 Bug
In `calibration_error()`, the accuracies in the binary classification setting do not seem to be computed correctly: the function just returns the targets. I am guessing this should instead return `target == preds.round().int()` or something similar? Am I missing something?
Code example
import torch
from torchmetrics.functional.classification import calibration_error
preds = torch.tensor([0.01, 0.001, 0.005]) # The raw sigmoid output
targets = torch.tensor([1, 1, 1])
calibration_error(preds, targets)
# This returns: tensor(0.9947)
The model confidently predicts the wrong class, but is rewarded with a near perfect calibration score.
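To make the numbers concrete, here is a plain-Python sketch of how an equal-width-bin expected calibration error (ECE) could arrive at such a score. This is an illustrative re-implementation, not the torchmetrics code: `binary_ece` and its binning details are assumptions, but the key point holds regardless of binning: confidence is taken for the *predicted* class, so a confidently wrong model has |confidence − accuracy| ≈ 1 in its bin.

```python
# Illustrative sketch of binary expected calibration error (ECE).
# NOT the torchmetrics implementation; function name and binning are assumed.

def binary_ece(probs, targets, n_bins=15):
    """Equal-width-bin ECE for binary problems.

    probs: sigmoid outputs P(class=1); targets: 0/1 labels.
    """
    # Confidence is the probability assigned to the *predicted* class,
    # and accuracy is whether that prediction matched the target.
    confs = [p if p >= 0.5 else 1.0 - p for p in probs]
    preds = [1 if p >= 0.5 else 0 for p in probs]
    accs = [int(p == t) for p, t in zip(preds, targets)]

    ece, n = 0.0, len(probs)
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are half-open (lo, hi]; the first bin also picks up conf == 0.
        idx = [i for i, c in enumerate(confs) if lo < c <= hi or (b == 0 and c == lo)]
        if not idx:
            continue
        bin_conf = sum(confs[i] for i in idx) / len(idx)
        bin_acc = sum(accs[i] for i in idx) / len(idx)
        # Weight each bin's |confidence - accuracy| gap by its share of samples.
        ece += len(idx) / n * abs(bin_conf - bin_acc)
    return ece

print(round(binary_ece([0.01, 0.001, 0.005], [1, 1, 1]), 4))  # → 0.9947
```

Here all three predictions land in the top-confidence bin with ~0.995 confidence but 0 accuracy, which is exactly the near-1 score observed above.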
Environment
- TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): 0.9.1, installed with mamba
- Python & PyTorch Version (e.g., 1.0): Python 3.9.13, PyTorch 1.11.0.post202
- Any other relevant information such as OS (e.g., Linux): Ubuntu Linux
Added a little example to better illustrate my point.
By the way, a plain 0-vector would have been a simpler example, but it turns out the preds can't be exactly 0 due to how the binning is done. It could make sense to clamp the predictions in the binning process to prevent this, e.g.:
torch.clip(confidences, 1e-6, 1.0)
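The suggested clamp can be sketched in plain Python (the helper name `clamp_confidences` and the `1e-6` floor are illustrative, not part of torchmetrics):

```python
# Hypothetical sketch of the clamp suggested above: keep confidences
# strictly above 0 so every prediction falls inside some bin.
def clamp_confidences(confs, eps=1e-6):
    return [min(max(c, eps), 1.0) for c in confs]

print(clamp_confidences([0.0, 0.3, 1.0]))  # → [1e-06, 0.3, 1.0]
```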
Hi, I checked this issue as part of a bigger refactor (see issue https://github.com/Lightning-AI/metrics/issues/1001 and PR https://github.com/Lightning-AI/metrics/pull/1195), and it seems that our calibration error is computing the right value.
First, in the example provided the metric gives a score of 0.9942. Since the metric is a calibration *error*, the optimum is 0, not 1, so it is correct that the metric gives a high score: the example is clearly not well calibrated.
Secondly, I ran the example through a third-party package, https://github.com/fabiankueppers/calibration-framework, which gives the same result as our implementation (we are actually using it for testing now).
Therefore, there does not seem to be an error in the implementation. Closing the issue.