Support all iterator modes for fit/validate/test/predict

Open · carmocca opened this issue 2 years ago · 12 comments

Description & Motivation

trainer.fit only works with CombinedLoader(..., mode="max_size_cycle"|"min_size")

trainer.{validate,test,predict} only work with CombinedLoader(..., mode="sequential")

This constraint is checked in the top-level loops:

https://github.com/Lightning-AI/lightning/blob/0009cde1db1a9ab4e2f1e0a9f69a4affb59d5134/src/lightning/pytorch/loops/fit_loop.py#L351-L354
https://github.com/Lightning-AI/lightning/blob/0009cde1db1a9ab4e2f1e0a9f69a4affb59d5134/src/lightning/pytorch/loops/evaluation_loop.py#L182-L183
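
For illustration, a minimal sketch of the two situations (assuming Lightning 2.x, where CombinedLoader is importable from lightning.pytorch.utilities; the dataset sizes are made up):

from torch.utils.data import DataLoader
from lightning.pytorch.utilities import CombinedLoader

loaders = {
    "a": DataLoader(range(8), batch_size=4),
    "b": DataLoader(range(12), batch_size=4),
}

# accepted by trainer.fit: the shorter iterable is cycled ("max_size_cycle")
# or iteration stops with the shortest one ("min_size"); each batch is a
# dict of per-dataloader batches
train_loader = CombinedLoader(loaders, mode="max_size_cycle")

# accepted by trainer.{validate,test,predict}: the iterables are consumed
# one after the other, so each batch comes from a single dataloader
eval_loader = CombinedLoader(loaders, mode="sequential")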

Pitch

Have all trainer functions support all modes

TODO:

  • [ ] FitLoop
  • [x] EvaluationLoop (#17163)
  • [ ] PredictionLoop

Alternatives

Not do it

Additional context

This builds on top of https://github.com/Lightning-AI/lightning/pull/16726

cc @borda @justusschock @awaelchli

carmocca avatar Feb 21 '23 16:02 carmocca

I am migrating my code to PL 2, and it seems that getting a val-dataloader batch of the form {"key_a": batch_dataloader_a, "key_b": batch_dataloader_b} is not implemented in PL 2 yet. Here is my old code for reference:

# DataLoader comes from torch.utils.data; in PL 1.x CombinedLoader lived in
# pytorch_lightning.trainer.supporters (it is lightning.pytorch.utilities in 2.x)
from torch.utils.data import DataLoader
from pytorch_lightning.trainer.supporters import CombinedLoader

def val_dataloader(self):
    # one DataLoader per validation dataset, keyed by dataset name
    val_dataloaders = {
        key: DataLoader(
            dataset,
            batch_size=dataset.batch_size,
            shuffle=False,
            num_workers=dataset.num_workers,
            pin_memory=False,
        )
        for key, dataset in self.val_datasets.items()
    }
    # yields batches of the form {"key_a": batch_a, "key_b": batch_b}
    combined_val_loaders = CombinedLoader(val_dataloaders, "max_size_cycle")
    return combined_val_loaders
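
For context, this is the kind of validation_step the pattern above implies (a sketch; the "key_a"/"key_b" keys are illustrative), since "max_size_cycle" delivers one dict of batches per step:

def validation_step(self, batch, batch_idx):
    # one batch per dataset key, with shorter loaders cycled to match the longest
    batch_a = batch["key_a"]
    batch_b = batch["key_b"]
    ...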

mees avatar Mar 20 '23 19:03 mees

@mees I added support for that in #17163, if you want to give it a try. The PR only implements it for validation and testing.

carmocca avatar Mar 28 '23 23:03 carmocca

> @mees I added support for that in #17163, if you want to give it a try. The PR only implements it for validation and testing.

Really helpful! I hope this gets into "stable" soon, or even the next release!

bkmi avatar May 15 '23 15:05 bkmi

I really wish there were sequential support in the training loop. Right now, it's not clear how one should handle batches of potentially different sizes in training_step. We'd have to collate inside training_step and ensure the given batch size is divided by the number of dataloaders to keep gradient accumulation consistent, and so on. It gets pretty ugly. @carmocca Thank you for your work on this issue. Not to rush you, but is there any update on sequential support in the training loop? Thanks again!
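
For illustration, a rough sketch of the workaround described above (the "a"/"b" keys and the compute_loss helper are hypothetical), assuming training runs with mode="max_size_cycle" so that every batch is a dict:

def training_step(self, batch, batch_idx):
    # each key holds a full batch from one dataloader; the step must combine
    # them itself, e.g. by summing per-dataloader losses
    loss_a = self.compute_loss(batch["a"])
    loss_b = self.compute_loss(batch["b"])
    return loss_a + loss_b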

FarzanT avatar May 15 '23 17:05 FarzanT

Unfortunately, I don't have the bandwidth to work on this now. If somebody wants to try, I can help get the PR merged. You can follow the structure in the EvaluationLoop. The training hooks will need an optional dataloader_idx argument.
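
For anyone picking this up, a sketch of what that change might look like (not implemented; the signature mirrors how validation_step and test_step already accept the argument):

def training_step(self, batch, batch_idx, dataloader_idx=0):
    # with mode="sequential", batches would arrive from one dataloader at a
    # time, and dataloader_idx would identify the source loader
    ...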

carmocca avatar May 17 '23 13:05 carmocca

> @mees I added support for that in #17163, if you want to give it a try. The PR only implements it for validation and testing.

> Really helpful! I hope this gets into "stable" soon, or even the next release!

Me too! Is there any release timeline or a nightly version with this supported? I can't use Lightning without this and would really like to leverage its other features!

surya-narayanan avatar May 25 '23 21:05 surya-narayanan

Ditto! FYI for others: pulling the nightly will get you the feature: https://github.com/Lightning-AI/lightning/pull/17163

spfrommer avatar Aug 17 '23 23:08 spfrommer

Thanks! I also need this great feature.

chenhaomingbob avatar Oct 06 '23 07:10 chenhaomingbob

+1, please release this feature asap!

johnathanchiu avatar Oct 31 '23 03:10 johnathanchiu

Is this feature currently being worked on?

lukas-folle-snkeos avatar Jul 30 '24 07:07 lukas-folle-snkeos

As far as I know, nobody is currently working on it, Lukas.

carmocca avatar Jul 30 '24 13:07 carmocca