torchmetrics
Accuracy fails in dp (DataParallel) mode
🐛 Bug
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
To Reproduce
The code below works in ddp mode, but in dp mode it raises the error above.
Code sample
import pytorch_lightning as pl
from torchmetrics import Accuracy

class MyLightningModule(pl.LightningModule):
    def __init__(...):
        ....
        self.my_accuracy = Accuracy(num_classes=6)

    def training_step(...):
        ...
        self.log("the_accuracy", self.my_accuracy(logits, labels))
Expected behavior
Environment
- TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): pip, 0.8.2
- Python & PyTorch Version (e.g., 1.0): 1.11.0
- Any other relevant information such as OS (e.g., Linux): linux (Ubuntu)
Hi @richarddwang, to make debugging easier, could you provide a fully reproducible script?
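Until a full reproduction is available, one thing that may help narrow this down is comparing against a stateless accuracy computed only from the tensors passed in, so the result always lives on the same device as `logits` and `labels`. The sketch below is not part of torchmetrics and is not a confirmed fix. The helper name `batch_accuracy` is hypothetical, and it assumes `logits` has shape `(batch, num_classes)` and `labels` holds integer class indices. Unlike `Accuracy`, it keeps no running state across batches.

```python
import torch


def batch_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Per-batch accuracy with no stored state (hypothetical helper).

    Every intermediate tensor is derived from `logits` and `labels`, so the
    result stays on the device those inputs are on.
    """
    preds = logits.argmax(dim=-1)              # predicted class index per sample
    return (preds == labels).float().mean()    # fraction of correct predictions


# Small CPU example: predictions are [1, 0, 1], so 2 of 3 labels match.
logits = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = torch.tensor([1, 0, 0])
acc = batch_accuracy(logits, labels)
print(acc.item())
```

If logging this value inside `training_step` works in dp mode while `self.my_accuracy(logits, labels)` does not, that would suggest the device mismatch involves the metric object's internal state and not the inputs.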