
R2Score does not match sklearn.metrics.r2_score

Open adamklie opened this issue 3 years ago • 2 comments

🐛 Bug

The torchmetrics.R2Score outputs do not consistently match what I get from sklearn.metrics.r2_score. When I test on the example tensors and arrays from both docs pages the results are equal, but on my own data I get a large negative number from torchmetrics and a reasonable one from sklearn.

Code sample

import torch
from torchmetrics import R2Score
import sklearn.metrics

target = torch.tensor([3, -0.5, 2, 7])
preds = torch.tensor([2.5, 0.0, 2, 8])
tm_r2score = R2Score()
print(tm_r2score(preds, target))

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(sklearn.metrics.r2_score(y_true, y_pred))

uno = torch.tensor(sdata_train["log(max_activity)_SCALED"].values) # 7946 values
dos = torch.tensor(sdata_train["log(max_activity)_SCALED_PREDICTIONS"].values) # 7946 values
print(tm_r2score(uno, dos))

uno_x, dos_x = uno.numpy(), dos.numpy()
print(sklearn.metrics.r2_score(uno_x, dos_x))
Output:
tensor(0.9486)         # dummy data, torchmetrics
0.9486081370449679     # dummy data, sklearn
tensor(-17.8863)       # my data, torchmetrics
0.16492233358169228    # my data, sklearn

Expected behavior

I would expect these to be equal, and based on a scatterplot of the true vs. predicted values, I would assume the sklearn result is correct. [scatterplot screenshot attached]

When I log the R2Score in a PL model over training, I also see these large negative values for the training set. [training-curve screenshot attached]

Environment

TorchMetrics: 0.9.3 (installed from pip)
Python: 3.7.12
PyTorch: 1.11.0
OS: Linux

Additional context

Am I misunderstanding what R2Score is doing?

adamklie avatar Jul 27 '22 19:07 adamklie

Hi! Thanks for your contribution, great first issue!

github-actions[bot] avatar Jul 27 '22 19:07 github-actions[bot]

Hi @adamklie, could you please provide us with some test data that reproduces this discrepancy between our implementation and sklearn's?

SkafteNicki avatar Jul 28 '22 07:07 SkafteNicki

After looking at the code again, it seems clear to me that the difference is simply due to the argument order. Sklearn expects the input as metric(target, preds), i.e. r2_score(y_true, y_pred), whereas in torchmetrics we expect metric(preds, target). In the example, uno holds the true values and dos the predictions, so the correct comparison would be:

uno = torch.tensor(sdata_train["log(max_activity)_SCALED"].values)
dos = torch.tensor(sdata_train["log(max_activity)_SCALED_PREDICTIONS"].values)
print(tm_r2score(dos, uno))  # this line is changed
uno_x, dos_x = uno.numpy(), dos.numpy()
print(sklearn.metrics.r2_score(uno_x, dos_x))
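To make the asymmetry concrete, here is a minimal pure-Python sketch of the textbook R² definition, evaluated on the dummy data from the issue. This is only an illustration of the formula, not either library's actual implementation:

```python
def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, where SS_tot is taken over y_true.

    Because SS_tot depends only on the first argument, swapping the
    inputs changes the score: R^2 is not a symmetric function.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

target = [3, -0.5, 2, 7]
preds = [2.5, 0.0, 2, 8]

print(r2(target, preds))  # ~0.9486, matches both libraries when called correctly
print(r2(preds, target))  # a different value, since the normalization changed
```

With this well-matched toy data the two orders differ only slightly, but when the variance of the predictions is much smaller than the variance of the targets, the swapped call divides by a small SS_tot and can go strongly negative, which is consistent with the -17.9 seen above.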

Closing issue.

SkafteNicki avatar Aug 30 '22 20:08 SkafteNicki