torchmetrics

Accuracy fails in dp (DataParallel) mode

Open · richarddwang opened this issue 3 years ago · 1 comment

🐛 Bug

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

To Reproduce

The code below works well in ddp mode, but in dp mode it raises the error above.

Code sample

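import pytorch_lightning as pl
from torchmetrics import Accuracy


class MyLightningModule(pl.LightningModule):

    def __init__(...):
        ....
        self.my_accuracy = Accuracy(num_classes=6)

    def training_step(...):
        ...
        # The metric is updated and logged inside the per-device training step
        self.log("the_accuracy", self.my_accuracy(logits, labels))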
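Note for readers hitting the same error: the TorchMetrics documentation has a section on DataParallel (DP) mode that recommends updating and logging metrics in `training_step_end` rather than `training_step` when using pytorch-lightning. The toy model below sketches why that might matter here. It is an assumption, not a diagnosis confirmed in this thread, that the metric state stays on one device while dp hands each replica inputs on a different one. `ToyTensor`, `ToyAccuracy`, `scatter` and `gather` are hypothetical stand-ins written in plain Python, not TorchMetrics or PyTorch APIs.

```python
# Toy model only: plain Python objects stand in for tensors and devices.

class ToyTensor:
    """A list of values tagged with the name of the device it lives on."""
    def __init__(self, value, device):
        self.value = value
        self.device = device


class ToyAccuracy:
    """Counts correct predictions; its state is assumed to stay on one device."""
    def __init__(self, device="cuda:0"):
        self.device = device
        self.correct = 0
        self.total = 0

    def update(self, preds, labels):
        # Mimic the device check that fails in the report above
        if preds.device != self.device or labels.device != self.device:
            raise RuntimeError(f"state on {self.device}, inputs on {preds.device}")
        self.correct += sum(p == t for p, t in zip(preds.value, labels.value))
        self.total += len(labels.value)

    def compute(self):
        return self.correct / self.total


def scatter(values, devices):
    """Split a batch into one chunk per device, as dp does with its inputs."""
    size = (len(values) + len(devices) - 1) // len(devices)
    return [ToyTensor(values[i * size:(i + 1) * size], d)
            for i, d in enumerate(devices)]


def gather(chunks, device):
    """Concatenate per-device chunks back onto a single device."""
    return ToyTensor([v for c in chunks for v in c.value], device)


devices = ["cuda:0", "cuda:1"]
preds = scatter([0, 1, 2, 3], devices)
labels = scatter([0, 1, 2, 0], devices)

# Updating inside the per-device step: the second chunk lives on cuda:1.
metric = ToyAccuracy()
failed = False
try:
    for p, t in zip(preds, labels):
        metric.update(p, t)
except RuntimeError:
    failed = True

# Updating once, after gathering everything onto the metric's device
# (the role training_step_end plays in the documented recommendation).
metric_end = ToyAccuracy()
metric_end.update(gather(preds, "cuda:0"), gather(labels, "cuda:0"))
accuracy = metric_end.compute()
```

In this sketch the per-device update fails on the second chunk, while the single update after gathering succeeds. Whether the same change resolves the error in the real module would need to be checked against a full reproduction.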

Expected behavior

Environment

  • TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): pip, 0.8.2
  • Python & PyTorch Version (e.g., 1.0): 1.11.0
  • Any other relevant information such as OS (e.g., Linux): linux (Ubuntu)

richarddwang · May 26 '22 01:05

Hi @richarddwang, to make debugging easier, could you provide a fully reproducible script?

SkafteNicki · May 27 '22 10:05