[bug] dgl.dataloading.DataLoader forwards args to torch.utils.data.DataLoader but many are not valid
🐛 Bug
Currently, DGL's DataLoader is a subclass of PyTorch's DataLoader and forwards `**kwargs` to it. However, DGL's dataloader does several things under the hood that break many of these options, and the user gets no indication of which options are valid and which are not. Furthermore, CI covers only a fraction of the options users may pass in.
Current problematic options:
- `sampler` and `batch_sampler`: Specifying either of these results in errors about `IterableDataset`. Our documentation explicitly references `sampler` inside the description of the `use_ddp` argument.
- `collate_fn`: This results in a duplicate keyword error (see the sketch after this list).
- `pin_memory`: This results in a duplicate keyword error, as the super is called with `pin_memory=self.pin_prefetcher`.
- `pin_memory_device` (new in 1.12): This will only have an effect on the input to the `_PrefetchingIterator`, which may not affect the data as returned to the user.
- `generator`: Has no effect.
- `timeout`: While this works relatively correctly with respect to the workers, when a prefetching thread is active it is possible to block on the dataloader for longer than the specified timeout, which would be unexpected from the user's perspective.
- `worker_init_fn`: This results in a duplicate keyword error; however, this appears to be just a bug, as we attempt to correctly wrap the passed-in function but fail to delete it from the `kwargs`.
- `batch_size=None`: While it would be odd for a user to do this, given that we don't provide users a way to manually batch data, our documentation does address it, and it is valid for PyTorch's DataLoader.
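For concreteness, here is a minimal reproduction sketch of the duplicate keyword error (assuming the DGL >= 0.8 `dgl.dataloading` API; the exact traceback may vary by version):

```python
import torch
import dgl
import dgl.dataloading as dgldl

g = dgl.rand_graph(100, 500)              # small random graph
sampler = dgldl.NeighborSampler([5, 5])   # 2-layer neighbor sampling
nids = torch.arange(g.num_nodes())

# collate_fn is a valid torch.utils.data.DataLoader argument, but DGL
# already supplies its own collate_fn when calling super().__init__(),
# so this raises "got multiple values for keyword argument 'collate_fn'".
loader = dgldl.DataLoader(
    g, nids, sampler,
    batch_size=32,
    collate_fn=lambda batch: batch,
)
```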
We also have the problem of maintaining compatibility as PyTorch's DataLoader is updated. For example, the `pin_memory_device` option was just added in 1.12.
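One possible way to handle this (an illustrative sketch, not DGL's actual code) is to gate version-dependent options on the installed PyTorch version:

```python
import torch

# Derive (major, minor) from torch.__version__, ignoring local suffixes
# such as "+cu116" or ".dev" builds.
_TORCH_VER = tuple(
    int(p) for p in torch.__version__.split("+")[0].split(".")[:2]
)

def loader_kwargs(pin_memory_device=None):
    """Forward pin_memory_device only when the installed PyTorch supports it."""
    kwargs = {}
    if pin_memory_device is not None:
        if _TORCH_VER < (1, 12):
            raise ValueError(
                "pin_memory_device requires PyTorch >= 1.12, got "
                + torch.__version__)
        kwargs["pin_memory_device"] = pin_memory_device
    return kwargs
```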
Proposed fix
I think the best way to fix this is to remove the `**kwargs` parameter from our DataLoader and instead capture every argument explicitly. This way we can limit, verify, and test the different argument combinations passed in. It will also make our documentation easier to read.
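A rough sketch of the shape this could take (the argument list below is illustrative, not a final design):

```python
import torch.utils.data

class DataLoader(torch.utils.data.DataLoader):
    def __init__(self, graph, indices, graph_sampler, device=None,
                 batch_size=1, shuffle=False, drop_last=False,
                 num_workers=0, persistent_workers=False,
                 pin_memory=False, timeout=0, worker_init_fn=None):
        # sampler, batch_sampler, collate_fn, and generator are
        # deliberately not accepted: DGL sets them internally, so users
        # can no longer pass values that would silently break.
        self.graph = graph
        self.graph_sampler = graph_sampler  # used by DGL's collate step
        super().__init__(
            indices,  # placeholder; DGL wraps indices in its own dataset
            batch_size=batch_size,
            shuffle=shuffle,
            drop_last=drop_last,
            num_workers=num_workers,
            persistent_workers=persistent_workers,
            pin_memory=pin_memory,
            timeout=timeout,
            worker_init_fn=worker_init_fn,  # wrapped internally by DGL
        )
```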
Alternative fix
We could try to keep documentation on what parameters are valid, but we may need separate documentation per PyTorch version.
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
Any updates?
I also have the problem of passing `worker_init_fn`.
I managed to remove the `**kwargs`, and the duplicate keyword error is gone. However, `worker_init_fn` still does not seem to take effect.
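For reference, the `worker_init_fn` bug described in the original post suggests a fix along these lines (a sketch of the idea only, not the actual patch):

```python
def _wrap_worker_init_fn(user_fn):
    """Wrap the user's worker_init_fn with DGL's own per-worker setup."""
    def _init(worker_id):
        # ... DGL-internal per-worker initialization would run here ...
        if user_fn is not None:
            user_fn(worker_id)
    return _init

def _forward_kwargs(kwargs):
    # The missing step: pop the user's function out of kwargs so the
    # wrapped version is the only worker_init_fn super().__init__() sees.
    user_fn = kwargs.pop("worker_init_fn", None)
    kwargs["worker_init_fn"] = _wrap_worker_init_fn(user_fn)
    return kwargs
```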
This is being addressed in #5420.
Thanks for your reply. The PR above indeed solves the duplicate keyword error.