
`self.log` raises an error when the number of dataloaders is not consistent

Open ding3820 opened this issue 2 years ago • 1 comment

Bug description

Hi all,

I posted a discussion on the Lightning.ai forum here, and @awaelchli suggested that I report it as an issue.

There might be a bug in the way self.log records dataloader_idx. Suppose we have two validation dataloaders, say A and B. We use A every epoch, but B only every second epoch. In that case, calling self.log in validation_step() raises this error:

You called self.log({name}, ...) twice in {fx} with different arguments. This is not allowed

(see here)

I implemented this by switching the available dataloaders in val_dataloader() and reloading the dataloaders every epoch:

def val_dataloader(self):
    if self.should_run_B():
        return [loader_A, loader_B]
    else:
        return [loader_A]

I noticed that on epochs that use only one validation dataloader, dataloader_idx is always None, whereas on epochs with two validation dataloaders, dataloader_idx comes through as 0 or 1 in validation_step. If I understand correctly, this mismatch is the main cause of the error.
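If that is indeed the cause, a possible workaround (just a sketch reusing the placeholders loader_A, loader_B, and should_run_B() from above; I have not verified it) would be to keep the dataloader count constant and skip loader B's batches instead:

def val_dataloader(self):
    # Always expose both loaders so dataloader_idx stays 0/1 on every epoch.
    return [loader_A, loader_B]

def validation_step(self, batch, batch_idx, dataloader_idx=0):
    # Skip loader B's batches on epochs where it should not run
    # (they are still iterated, which costs some idle time).
    if dataloader_idx == 1 and not self.should_run_B():
        return None
    ...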

Another interesting finding: if we set add_dataloader_idx=True for all self.log calls, the program runs without error, but the TensorBoard logging is wrong and shows both c_0 and c_0/dataloader_idx_0 (see snapshot below). These two were meant to stay in the same figure, but somehow it got split into two figures, probably because of the alternating-dataloader design.

[screenshot: TensorBoard showing separate plots for c_0 and c_0/dataloader_idx_0]
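My understanding (an assumption on my part, not confirmed from the source) is that with add_dataloader_idx=True, Lightning appends the suffix only on epochs where more than one val dataloader is active, so the same metric ends up under two different tags:

# With add_dataloader_idx=True (the default), the key is suffixed with
# "/dataloader_idx_{i}" only when several val dataloaders exist:
self.log("c_0", out["x"], add_dataloader_idx=True)
# -> logged as "c_0" on single-loader epochs,
#    as "c_0/dataloader_idx_0" on two-loader epochs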

How to reproduce the bug

import torch
from torch.utils.data import DataLoader
from pytorch_lightning.demos.boring_classes import BoringModel, RandomDataset
from pytorch_lightning import Trainer

class TestModel(BoringModel):
    def training_step(self, batch, batch_idx):
        out = super().training_step(batch, batch_idx)
        self.log("a", out["loss"])
        self.log("b", out["loss"], on_step=True, on_epoch=True)
        return out

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        out = super().validation_step(batch, batch_idx)
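        # add_dataloader_idx=False keeps the metric names exactly as given,
        # instead of letting Lightning append "/dataloader_idx_{i}".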
        if dataloader_idx == 0:
            self.log("c_0", out["x"], add_dataloader_idx=False)
            self.log("d_0", out["x"], on_step=True, on_epoch=True, add_dataloader_idx=False)
        elif dataloader_idx == 1:
            self.log("c_1", out["x"], add_dataloader_idx=False)
            self.log("d_1", out["x"], on_step=True, on_epoch=True, add_dataloader_idx=False)
        return out

    def validation_epoch_end(self, outputs):
        self.log("g", torch.tensor(2, device=self.device), on_epoch=True)

    def val_dataloader(self):
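        # Alternate between one and two validation dataloaders every epoch.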
        if self.current_epoch % 2:
            return [DataLoader(RandomDataset(32, 64)), DataLoader(RandomDataset(32, 64))]
        else:
            return [DataLoader(RandomDataset(32, 64))]

model = TestModel()

trainer = Trainer(
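    # Reload the dataloaders every epoch so val_dataloader() is called again.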
    reload_dataloaders_every_n_epochs=1,
    default_root_dir="test",
    max_epochs=10,
    log_every_n_steps=1,
    enable_model_summary=False,
)
trainer.fit(model)

Error messages and logs

You called `self.log(c_0, ...)` twice in `validation_step` with different arguments. This is not allowed

Environment

  • CUDA:
    • GPU:
      • NVIDIA TITAN RTX
    • available: True
    • version: 11.7
  • Lightning:
    • pytorch-lightning: 1.7.2
    • pytorch-quantization: 2.1.2
    • torch: 1.12.0a0+8a1a93a
    • torch-tensorrt: 1.1.0a0
    • torchmetrics: 0.9.3
    • torchtext: 0.13.0a0
    • torchvision: 0.13.0a0
  • System:
    • OS: Linux
    • architecture:
      • 64bit
      • ELF
    • processor: x86_64
    • python: 3.8.13
    • version: #63-Ubuntu SMP Thu Nov 24 13:43:17 UTC 2022

More info

No response

cc @carmocca @Blaizzy

ding3820 · Jan 19 '23

Have you solved this problem? I ran into the same issue.

jin1041 · Jan 29 '24