Dice score cannot be calculated for each class separately
🐛 Bug
Dice score cannot be calculated without reduction; instead, a `ValueError` is raised.
To Reproduce
Minimal reproducing example:

```python
>>> import torch
>>> from torchmetrics import Dice
>>> Dice(average='none', num_classes=3)
```
raises the following error:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[redacted]/site-packages/torchmetrics/classification/dice.py", line 168, in __init__
    raise ValueError(f"The `reduce` {average} is not valid.")
ValueError: The `reduce` none is not valid.
```
The same error is encountered with `average=None`.
Expected behavior
The documentation states:

> average: Defines the reduction that is applied. Should be one of the following:
>
> - `'micro'` [default]: Calculate the metric globally, across all samples and classes.
> - `'macro'`: Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).
> - `'weighted'`: Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).
> - `'none'` or `None`: Calculate the metric for each class separately, and return the metric for every class.
> - `'samples'`: Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).
while neither `'none'` nor `None` works.
Environment
- TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): pip, versions 0.11.0 and 0.11.3.
- Python & PyTorch Version (e.g., 1.0): 3.10.6, 1.13.1.
- Any other relevant information such as OS (e.g., Linux): Mac.
Additional context
Looking at the code, something seems fishy. Comparing the following two code snippets of classification/dice.py:
https://github.com/Lightning-AI/metrics/blob/825d17f32ee0b9a2a8024c89d4a09863d7eb45c3/src/torchmetrics/classification/dice.py#L149-L151
and
https://github.com/Lightning-AI/metrics/blob/21b23b6d472ec542c764a789af63bd054fbb3512/src/torchmetrics/classification/dice.py#L167-L168
It seems the check at line 167 is wrong, since `average` is not modified between the two snippets.
@asbjrnmunk would you be interested in working on this case and adding no reduction?
Isn't the Dice score equivalent to the F1 score (link)? Mathematically it works out the same; I'm not sure if there are some implementation nuances. If both are the same, maybe just add an alias for people who prefer the name 'Dice', plus a short line in the docs noting their equivalence.
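For reference, the equivalence is a short algebraic identity: with per-class precision $P = TP/(TP+FP)$ and recall $R = TP/(TP+FN)$,

```math
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN} = \mathrm{Dice}
```

so per-class Dice and per-class F1 coincide whenever the counts are taken per class.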
I have the same issue. I used `MulticlassF1Score` with `average=None` and it worked for me (since Dice is equivalent to the F1 score: https://torchmetrics.readthedocs.io/en/stable/classification/f1_score.html#multiclassf1score).
```python
import torch
from torchmetrics.classification import MulticlassF1Score

pred = torch.tensor([0, 2, 5, 2, 2, 2, 1])
target = torch.tensor([0, 1, 5, 2, 1, 2, 1])
dice = MulticlassF1Score(num_classes=6, average=None)
dice_score = dice(pred, target)
```

OUTPUT:

```
tensor([1.0000, 0.5000, 0.6667, 0.0000, 0.0000, 1.0000])
```
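The per-class values above can be cross-checked without torchmetrics by computing Dice = 2·TP / (2·TP + FP + FN) directly from counts (a plain-Python sketch; `per_class_dice` is a name introduced here, not a library function):

```python
def per_class_dice(preds, target, num_classes):
    """Per-class Dice = 2*TP / (2*TP + FP + FN), from raw counts.
    Returns 0.0 for classes with no predictions and no ground truth."""
    scores = []
    for c in range(num_classes):
        tp = sum(p == c and t == c for p, t in zip(preds, target))
        fp = sum(p == c and t != c for p, t in zip(preds, target))
        fn = sum(p != c and t == c for p, t in zip(preds, target))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

preds = [0, 2, 5, 2, 2, 2, 1]
target = [0, 1, 5, 2, 1, 2, 1]
print([round(s, 4) for s in per_class_dice(preds, target, 6)])
# → [1.0, 0.5, 0.6667, 0.0, 0.0, 1.0]
```

This matches the `MulticlassF1Score` output, consistent with the equivalence claim.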