Dice score cannot be calculated for each class separately
🐛 Bug
Dice score cannot be calculated without reduction; instead, a `ValueError` is raised.
To Reproduce
Minimal reproducing example:

```python
>>> import torch
>>> from torchmetrics import Dice
>>> Dice(average='none', num_classes=3)
```
raises the following error:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[redacted]/site-packages/torchmetrics/classification/dice.py", line 168, in __init__
    raise ValueError(f"The `reduce` {average} is not valid.")
ValueError: The `reduce` none is not valid.
```
The same error is encountered with `average=None`.
Expected behavior
The documentation states:

> average: Defines the reduction that is applied. Should be one of the following:
>
> - `'micro'` [default]: Calculate the metric globally, across all samples and classes.
> - `'macro'`: Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).
> - `'weighted'`: Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).
> - `'none'` or `None`: Calculate the metric for each class separately, and return the metric for every class.
> - `'samples'`: Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).
while neither `'none'` nor `None` works.
Environment
- TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): pip, versions 0.11.0 and 0.11.3.
- Python & PyTorch Version (e.g., 1.0): 3.10.6, 1.13.1.
- Any other relevant information such as OS (e.g., Linux): Mac.
Additional context
Looking at the code, something seems fishy. Comparing the following two code snippets of classification/dice.py:
https://github.com/Lightning-AI/metrics/blob/825d17f32ee0b9a2a8024c89d4a09863d7eb45c3/src/torchmetrics/classification/dice.py#L149-L151
and
https://github.com/Lightning-AI/metrics/blob/21b23b6d472ec542c764a789af63bd054fbb3512/src/torchmetrics/classification/dice.py#L167-L168
It seems the check at line 167 is wrong, since `average` is not modified between the two snippets.
@asbjrnmunk would you be interested in working on this case and adding no reduction?
Isn't the Dice score equivalent to the F1 score (link)? Mathematically it works out the same; I'm not sure if there are some implementation nuances. If both are the same, maybe just add an alias for people who prefer the name 'Dice', plus a short line in the docs noting their equivalence.
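For reference, the equivalence is a short algebraic identity: with per-class precision $P = TP/(TP+FP)$ and recall $R = TP/(TP+FN)$,

```math
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN} = \mathrm{Dice}
```

so per-class Dice and per-class F1 coincide whenever the counts are taken per class.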
I have the same issue. I used `MulticlassF1Score` with `average=None` and it worked for me (since Dice is equivalent to the F1 score: https://torchmetrics.readthedocs.io/en/stable/classification/f1_score.html#multiclassf1score).
```python
import torch
from torchmetrics.classification import MulticlassF1Score

pred = torch.tensor([0, 2, 5, 2, 2, 2, 1])
target = torch.tensor([0, 1, 5, 2, 1, 2, 1])
dice = MulticlassF1Score(num_classes=6, average=None)
dice_score = dice(pred, target)
```

OUTPUT:

```
tensor([1.0000, 0.5000, 0.6667, 0.0000, 0.0000, 1.0000])
```
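The per-class values above can be cross-checked without torchmetrics by computing Dice = 2·TP / (2·TP + FP + FN) directly from counts (a plain-Python sketch; `per_class_dice` is a name introduced here, not a library function):

```python
def per_class_dice(preds, target, num_classes):
    """Per-class Dice = 2*TP / (2*TP + FP + FN), from raw counts.
    Returns 0.0 for classes with no predictions and no ground truth."""
    scores = []
    for c in range(num_classes):
        tp = sum(p == c and t == c for p, t in zip(preds, target))
        fp = sum(p == c and t != c for p, t in zip(preds, target))
        fn = sum(p != c and t == c for p, t in zip(preds, target))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

preds = [0, 2, 5, 2, 2, 2, 1]
target = [0, 1, 5, 2, 1, 2, 1]
print([round(s, 4) for s in per_class_dice(preds, target, 6)])
# → [1.0, 0.5, 0.6667, 0.0, 0.0, 1.0]
```

This matches the `MulticlassF1Score` output, consistent with the equivalence claim.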