torchmetrics
Recall and Specificity values are the same
🐛 Bug
I'm using the Recall and Specificity module metrics when training a binary classification model, and the graphs in MLFlow are exactly the same. Please see the code below for how I'm using the module metrics. I'm pretty sure I'm doing it correctly because I'm following the torchmetrics docs as closely as I can.
Code sample
from torchmetrics import AUROC, MetricCollection, Recall, Specificity

class ModelMetrics:
    def __init__(self, out_features: int):  # out_features comes from the model head
        train_metrics = [
            AUROC(
                num_classes=out_features,
                average='weighted' if out_features > 2 else 'macro',
            ),
        ]
        valid_metrics = [
            AUROC(
                num_classes=out_features,
                average='weighted' if out_features > 2 else 'macro',
            ),
            Specificity(
                num_classes=out_features,
                average='weighted' if out_features > 2 else 'macro',
            ),
            Recall(
                num_classes=out_features,
                average='weighted' if out_features > 2 else 'macro',
            ),
        ]
        self.train_metrics: MetricCollection = MetricCollection(train_metrics, prefix='train_')
        self.valid_metrics: MetricCollection = MetricCollection(valid_metrics, prefix='valid_')

# <phase> is either 'train' or 'valid' (the LightningModule step / epoch-end hooks)
def <phase>_step(...):
    if phase == 'train':
        head.train_metrics.update(preds, targets)
    elif phase == 'valid':
        head.valid_metrics.update(preds, targets)

def <phase>_epoch_end(...):
    for head in self.model.model_heads:
        # Use Metrics API
        if phase == 'train':
            for metric_name, metric in head.train_metrics.items():
                self.log(f'{head.name}_{metric_name}', metric)
        elif phase == 'valid':
            for metric_name, metric in head.valid_metrics.items():
                self.log(f'{head.name}_{metric_name}', metric)
    return
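For reference, here is a stripped-down sketch of the binary case on raw tensors (the tensors, the seed, and the out_features == 2 assumption are made up for illustration, not taken from the original run); on torchmetrics 0.9.x both entries of the collection report the same value:

import torch
from torchmetrics import MetricCollection, Recall, Specificity

torch.manual_seed(42)                       # fixed seed, purely illustrative
preds = torch.randn(32, 2).softmax(dim=-1)  # head with out_features == 2
target = torch.randint(0, 2, (32,))         # binary ground truth

metrics = MetricCollection(
    [
        Specificity(num_classes=2, average='macro'),
        Recall(num_classes=2, average='macro'),
    ],
    prefix='valid_',
)
metrics.update(preds, target)
print(metrics.compute())  # 'valid_Recall' and 'valid_Specificity' come out identical

This also doubles as the kind of fixed-seed, raw-tensor example requested further down in the thread.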
Expected behavior
Expecting Recall and Specificity to be different values.
Environment
- TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): pip, 0.9.2
- Python & PyTorch Version (e.g., 1.0): Python 3.8.12, torch 1.12.0+cu116, pytorch-lightning 1.6.4
- Any other relevant information such as OS (e.g., Linux): AWS EC2 p3.8xlarge on Deep Learning AMI
Additional context
If you guys want any other info please let me know. See below for MLFlow graphs.

Hi! Thanks for your contribution, great first issue!
Can you actually give a few sample data points (just raw tensor numbers) that you think should produce different outputs? I am asking because I checked both the formula and the implementation again and they seem to be correct.
Ideally you can just generate random numbers with a fixed seed. But as I said, for me this is not reproducible.
This issue will be fixed by the classification refactor: see this issue https://github.com/Lightning-AI/metrics/issues/1001 and this PR https://github.com/Lightning-AI/metrics/pull/1195 for all changes.
Small recap: this issue describes that the Recall and Specificity metrics come out the same in the binary setting, which is wrong. The problem with the current implementation is that the metrics are calculated as an average over the 0 and 1 classes, which essentially makes all the scores collapse into the same value.
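To make the recap concrete, here is a small hand-worked check with made-up confusion-matrix counts: averaged over class 0 and class 1, recall and specificity are both (sensitivity + specificity) / 2, so they can never differ.

tp, fp, tn, fn = 4, 1, 3, 2   # hypothetical binary confusion-matrix counts

recall_pos = tp / (tp + fn)   # recall of class 1 (sensitivity)
recall_neg = tn / (tn + fp)   # recall of class 0 (equals specificity of class 1)
macro_recall = (recall_pos + recall_neg) / 2

spec_pos = tn / (tn + fp)     # specificity of class 1
spec_neg = tp / (tp + fn)     # specificity of class 0 (equals recall of class 1)
macro_specificity = (spec_pos + spec_neg) / 2

assert macro_recall == macro_specificity   # both equal (sensitivity + specificity) / 2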
Using the new binary_* versions of all the metrics:
import torch
from torchmetrics.functional import binary_recall, binary_specificity

preds = torch.rand(10)
target = torch.randint(0, 2, (10,))
binary_recall(preds, target)       # tensor(0.5000)
binary_specificity(preds, target)  # tensor(0.6250)
which also corresponds to what sklearn gives. Sorry for the confusion this has given rise to. The issue will be closed when https://github.com/Lightning-AI/metrics/pull/1195 is merged.
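For completeness, a small cross-check sketch against scikit-learn (the example tensors below are made up for illustration, not taken from the snippet above):

import torch
from sklearn.metrics import confusion_matrix
from torchmetrics.functional import binary_recall, binary_specificity

preds = torch.tensor([0.9, 0.8, 0.3, 0.2, 0.7, 0.1, 0.6, 0.4])
target = torch.tensor([1, 1, 1, 0, 0, 0, 0, 0])

# binary_* metrics threshold probabilistic preds at 0.5 by default
hard_preds = (preds > 0.5).int()
tn, fp, fn, tp = confusion_matrix(target.numpy(), hard_preds.numpy(), labels=[0, 1]).ravel()

print(binary_recall(preds, target), tp / (tp + fn))       # recall      = TP / (TP + FN)
print(binary_specificity(preds, target), tn / (tn + fp))  # specificity = TN / (TN + FP)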