data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
### 📚 The doc issue Can we add in the example/migration file related to a `torch.node` datawriter (if already possible with the current API). See: https://github.com/pytorch/pytorch/issues/140296#issuecomment-2563190801 ### Suggest a potential...
### 🚀 The feature We should add a node that will send batches to device (probably one at a time). We could either separate this, add it on to pre-fetcher...
### 🚀 The feature Stop requiring BaseNode implementations to call super().__init__() ### Motivation, pitch Currently we require users to call super().__init__() but this can't be done easily with dataclasses, except...
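The dataclass friction mentioned in this issue can be illustrated with plain Python. This is a minimal sketch, not the library's code: `Base` is a hypothetical stand-in for a base class whose `__init__` must run, and `Squares` and `Forgetful` are made-up names. It shows that a dataclass-generated `__init__` does not chain to the base initializer, and that `__post_init__` is one possible workaround.

```python
from dataclasses import dataclass


class Base:
    """Hypothetical stand-in for a base class whose __init__ must run."""

    def __init__(self):
        self._initialized = True


@dataclass
class Squares(Base):
    """Dataclass subclass that chains to Base.__init__ explicitly."""

    limit: int

    def __post_init__(self):
        # The generated __init__ only assigns fields, then calls
        # __post_init__, so the base initializer can be chained here.
        super().__init__()


@dataclass
class Forgetful(Base):
    """Same field, but without the __post_init__ workaround."""

    limit: int


ok = Squares(limit=3)
broken = Forgetful(limit=3)

# Only the class with __post_init__ picked up the base-class state.
has_base_state_ok = hasattr(ok, "_initialized")
has_base_state_broken = hasattr(broken, "_initialized")
```

The workaround runs, but every dataclass subclass has to remember it, which is the kind of boilerplate the request appears to want to remove.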
### 🐛 Describe the bug I tried replacing DataLoader with StatefulDataLoader in: https://github.com/facebookresearch/seamless_communication/blob/90e2b57ac4d82fa2bfaa25caeffe39ceb8b2ebec/src/seamless_communication/cli/m4t/finetune/dataloader.py#L127 and got the following error. Any help would be appreciated. ``` Traceback (most recent call last): File...
### 🐛 Describe the bug Currently ParallelMap does not guarantee reproducibility if in_order = False, we should add a .. warning to the docstring and the flag, and also think...
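Why unordered parallel mapping is hard to reproduce can be sketched with the standard library alone. The code below does not use ParallelMap; `slow_square`, `map_in_order`, and `map_as_completed` are hypothetical helpers built on `concurrent.futures`, and the sleep times are an assumption chosen only to make completion order differ from submission order.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(x):
    # Assumed workload: smaller inputs sleep longer, so tasks tend to
    # finish in a different order than they were submitted.
    time.sleep(0.01 * (5 - x))
    return x * x


def map_in_order(fn, items, workers=4):
    # Executor.map yields results in submission order, whatever the
    # completion order was.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


def map_as_completed(fn, items, workers=4):
    # as_completed yields futures as they finish, so the output order
    # depends on scheduling and timing and can change between runs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fn, item) for item in items]
        return [future.result() for future in as_completed(futures)]


items = [0, 1, 2, 3]
ordered = map_in_order(slow_square, items)
unordered = map_as_completed(slow_square, items)
```

Both calls produce the same set of values; only the ordered variant guarantees the same sequence on every run.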
### 🚀 The feature For drop-in replacement of DataLoader, we need to handle the common case of: apply batch of indices to get_item and then collate them. ### Motivation, pitch...
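The "batch of indices, then collate" pattern described here can be sketched in plain Python. This is an illustration under assumptions, not the proposed API: `fetch_batch` and `collate_columns` are hypothetical names, and the dataset is a simple list standing in for any object that supports `dataset[i]`.

```python
def collate_columns(samples):
    """Turn a list of (feature, label) tuples into a tuple of lists."""
    return tuple(list(column) for column in zip(*samples))


def fetch_batch(dataset, indices, collate_fn=collate_columns):
    """Look up each index with dataset[i], then collate the samples."""
    samples = [dataset[i] for i in indices]
    return collate_fn(samples)


# Toy map-style dataset: item i is (i, parity of i).
dataset = [(i, i % 2) for i in range(10)]

# A sampler would normally produce these indices.
batch = fetch_batch(dataset, [4, 7, 1])
```

With the toy dataset above, `batch` is `([4, 7, 1], [0, 1, 1])`: one list of features and one list of labels, in the order the indices were given.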
### 🚀 The feature Provide users utility functions to have drop-in replacement for the well-known torch.utils.data.DataLoader, as well as StatefulDataLoader. ### Motivation, pitch We should be able to re-construct exactly...
### 🚀 The feature Support for torch.utils.data.IterableDataset ### Motivation, pitch Currently IterableDataset (and possibly some map datasets) rely on torch.utils.data.get_worker_info. To ensure drop-in compatibility, we should make sure this works...
### 🚀 The feature Allow MapStyleWrapper to take parallel map kwargs ### Motivation, pitch MapStyleWrapper is a pretty handy utility but hamstrung because it doesn't allow parallelism. We could...