
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

302 issues in the data repository, sorted by most recently updated.

Updating CI version

CLA Signed

### 📚 The doc issue
Can we add an example of a `torch.node` data writer to the example/migration file (if this is already possible with the current API)? See: https://github.com/pytorch/pytorch/issues/140296#issuecomment-2563190801
### Suggest a potential...

### 🚀 The feature
We should add a node that sends batches to the device (probably one at a time). We could either make this a separate node, add it onto the prefetcher...

nodes
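The per-batch device transfer requested above can be sketched as a plain generator. This is not the `torchdata.nodes` API: the name `to_device_iter` and the `move` callable are hypothetical. With PyTorch tensors, `move` would typically be something like `lambda b: b.to(device, non_blocking=True)`; here a stand-in function is used so the sketch runs without a GPU.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def to_device_iter(batches: Iterable[T], move: Callable[[T], U]) -> Iterator[U]:
    """Hypothetical helper: transfer batches to a device one at a time.

    Because this is a generator, each batch is moved only when the
    consumer asks for it, never ahead of time.
    """
    for batch in batches:
        # `move` stands in for e.g. `lambda b: b.to("cuda", non_blocking=True)`.
        yield move(batch)


# Stand-in "device move": tag each batch with a device name.
moved = list(to_device_iter([[1, 2], [3, 4]], lambda b: ("cuda:0", b)))
print(moved)  # [('cuda:0', [1, 2]), ('cuda:0', [3, 4])]
```

Folding this into a prefetcher instead would let the transfer of batch N+1 overlap with compute on batch N, at the cost of holding more than one batch on the device.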

### 🚀 The feature
Stop requiring `BaseNode` implementations to call `super().__init__()`.
### Motivation, pitch
Currently we require users to call `super().__init__()`, but this can't be done easily with dataclasses, except...

nodes
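Why dataclasses get in the way: the `__init__` that `@dataclass` generates only assigns fields and never calls the base class initializer. The sketch below uses a hypothetical `Base` class as a stand-in for a node base class (it is not the real `BaseNode`) and shows one common workaround, calling `super().__init__()` from `__post_init__`. This is not necessarily the workaround the truncated issue text refers to.

```python
from dataclasses import dataclass


class Base:
    """Hypothetical stand-in for a base class whose __init__ sets up state."""

    def __init__(self) -> None:
        self._initialized = True


@dataclass
class Plain(Base):
    # The generated __init__ only assigns `batch_size`; it never calls
    # Base.__init__, so `_initialized` is never set.
    batch_size: int


@dataclass
class Patched(Base):
    batch_size: int

    def __post_init__(self) -> None:
        # Workaround: run the base initializer by hand after field assignment.
        super().__init__()


print(hasattr(Plain(8), "_initialized"))    # False
print(hasattr(Patched(8), "_initialized"))  # True
```

Dropping the requirement means the base class would have to set up its state lazily (or elsewhere) instead of relying on every subclass to remember this step.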

### 🐛 Describe the bug
I tried replacing `DataLoader` with `StatefulDataLoader` in: https://github.com/facebookresearch/seamless_communication/blob/90e2b57ac4d82fa2bfaa25caeffe39ceb8b2ebec/src/seamless_communication/cli/m4t/finetune/dataloader.py#L127 and got the following error. Any help would be appreciated.
```
Traceback (most recent call last): File...
```

stateful_dataloader

### 🐛 Describe the bug
Currently `ParallelMap` does not guarantee reproducibility if `in_order=False`. We should add a `.. warning::` to the docstring and the flag, and also think...

nodes
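The reproducibility problem comes from yielding results in completion order rather than submission order. The sketch below illustrates this with a thread pool; `parallel_map` and `slow_square` are hypothetical names, and this is not the `ParallelMap` implementation, only the same ordering trade-off.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence


def slow_square(x: int) -> int:
    # Later inputs finish sooner, so completion order differs from input order.
    time.sleep(0.05 * (3 - x))
    return x * x


def parallel_map(items: Sequence[int], fn: Callable[[int], int],
                 in_order: bool = True, max_workers: int = 4) -> List[int]:
    """Hypothetical illustration of an in_order flag."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, x) for x in items]
        if in_order:
            # Results in submission order: deterministic across runs.
            return [f.result() for f in futures]
        # Results as they finish: order depends on timing and scheduling.
        return [f.result() for f in as_completed(futures)]


print(parallel_map([0, 1, 2], slow_square, in_order=True))   # [0, 1, 4]
print(parallel_map([0, 1, 2], slow_square, in_order=False))  # e.g. [4, 1, 0], not guaranteed
```

Unordered output keeps workers busy when per-item cost varies, which is why the flag exists, but any downstream step that depends on order (shuffling seeds, checkpoint state) can then differ between runs.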

### 🚀 The feature
For a drop-in replacement of `DataLoader`, we need to handle the common case of applying a batch of indices to `get_item` and then collating the results.
### Motivation, pitch...

nodes
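The "batch of indices, then item lookup, then collate" pattern described above can be sketched in plain Python. The helpers `batched_indices`, `simple_collate`, and `load_batches` are hypothetical; with PyTorch tensors one would normally pass `torch.utils.data.default_collate` as `collate_fn` instead of the list-based collate used here.

```python
from typing import Any, Callable, Iterator, List, Sequence


def batched_indices(n: int, batch_size: int) -> Iterator[List[int]]:
    # Sequential sampler plus batching: [0..bs-1], [bs..2bs-1], ...
    for start in range(0, n, batch_size):
        yield list(range(start, min(start + batch_size, n)))


def simple_collate(samples: List[tuple]) -> tuple:
    # Hypothetical collate: transpose [(x0, y0), (x1, y1)] into ([x0, x1], [y0, y1]).
    return tuple(list(field) for field in zip(*samples))


def load_batches(dataset: Sequence[Any], batch_size: int,
                 collate_fn: Callable[[List[Any]], Any] = simple_collate) -> Iterator[Any]:
    for indices in batched_indices(len(dataset), batch_size):
        # Look up every index in the batch, then collate the samples together.
        samples = [dataset[i] for i in indices]
        yield collate_fn(samples)


data = [(i, i % 2) for i in range(5)]
print(list(load_batches(data, batch_size=2)))
# [([0, 1], [0, 1]), ([2, 3], [0, 1]), ([4], [0])]
```

Keeping the sampler, the lookup, and the collate step as separate pieces mirrors how `DataLoader` composes a batch sampler with `collate_fn`, which is what a drop-in replacement would need to reproduce.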

### 🚀 The feature
Provide users with utility functions that serve as drop-in replacements for the well-known `torch.utils.data.DataLoader`, as well as `StatefulDataLoader`.
### Motivation, pitch
We should be able to re-construct exactly...

nodes

### 🚀 The feature
Support for `torch.utils.data.IterableDataset`.
### Motivation, pitch
Currently `IterableDataset` (and possibly some map-style datasets) rely on `torch.utils.data.get_worker_info`. To ensure drop-in compatibility, we should make sure this works...

nodes
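A typical reason `IterableDataset` code calls `torch.utils.data.get_worker_info()` is to shard its stream so that each worker yields a disjoint slice. That function returns `None` in the main process and otherwise an object with `id` and `num_workers` attributes. The sketch below keeps the sharding logic independent of PyTorch by taking that object as an argument; `shard_for_worker` is a hypothetical helper, and `SimpleNamespace` stands in for the real worker-info object.

```python
from itertools import islice
from types import SimpleNamespace
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def shard_for_worker(stream: Iterable[T], worker_info: Optional[object]) -> Iterator[T]:
    """Round-robin split of a stream across loader workers.

    `worker_info` mirrors what get_worker_info() returns: None in the
    main process, otherwise an object with `id` and `num_workers`.
    """
    if worker_info is None:
        # Single-process loading: this process yields everything.
        return iter(stream)
    # Worker k takes elements k, k + n, k + 2n, ...
    return islice(stream, worker_info.id, None, worker_info.num_workers)


print(list(shard_for_worker(range(5), SimpleNamespace(id=0, num_workers=2))))  # [0, 2, 4]
print(list(shard_for_worker(range(5), SimpleNamespace(id=1, num_workers=2))))  # [1, 3]
print(list(shard_for_worker(range(5), None)))  # [0, 1, 2, 3, 4]
```

If a replacement loader runs such a dataset without providing equivalent worker info, every worker falls into the `None` branch and yields the full stream, so samples are duplicated `num_workers` times.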

### 🚀 The feature
Allow `MapStyleWrapper` to take parallel map kwargs.
### Motivation, pitch
`MapStyleWrapper` is a pretty handy utility but is hamstrung because it does not allow parallelism. We could...

nodes
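One way to picture a map-style wrapper that accepts parallelism settings is to forward a worker count to a thread pool that performs the item lookups. The function `parallel_map_style` and its `num_workers` and `transform` parameters are hypothetical and are not the `MapStyleWrapper` signature.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterator, Sequence


def parallel_map_style(dataset: Sequence[Any], num_workers: int = 4,
                       transform: Callable[[Any], Any] = lambda x: x) -> Iterator[Any]:
    """Hypothetical sketch: fetch map-style items with a thread pool."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map yields results in input order, so the output is deterministic.
        yield from pool.map(lambda i: transform(dataset[i]), range(len(dataset)))


print(list(parallel_map_style([1, 2, 3], num_workers=2, transform=lambda x: x * 10)))  # [10, 20, 30]
```

`Executor.map` submits every lookup eagerly, so for large datasets a real implementation would likely bound the number of in-flight requests, for example with a prefetch limit.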