
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Results 302 data issues

We often have datapipes that return tuples `(img, target)` where we just want to call transformations on the img, but not the target. Sometimes it's the opposite: I want to...
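The pattern described here, transforming only one element of each `(img, target)` tuple, can be sketched in plain Python. This is a minimal illustration and not TorchData's API: `map_column` and the sample data are hypothetical names introduced for the example.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def map_column(
    samples: Iterable[Tuple[Any, ...]],
    fn: Callable[[Any], Any],
    index: int,
) -> Iterator[Tuple[Any, ...]]:
    """Apply fn to one position of every tuple and pass the rest through.

    Hypothetical helper for illustration only; not part of TorchData.
    """
    for sample in samples:
        items = list(sample)
        items[index] = fn(items[index])
        yield tuple(items)


# Stand-in data: strings play the role of images, ints the role of targets.
pairs = [("img0", 0), ("img1", 1)]

# Transform only the image (position 0); targets are untouched.
img_only = list(map_column(pairs, str.upper, index=0))

# The opposite case: transform only the target (position 1).
target_only = list(map_column(pairs, lambda t: t + 10, index=1))
```

Selecting the element by position keeps the transformation function itself unaware of the tuple layout, so the same function can be reused on bare images.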

### 🐛 Describe the bug

After `AWSSDK` is integrated with TorchData, we now have two categories of `DataPipe`s to access and load data from AWS S3 Bucket:

1. [`DataPipe`](https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/load/fsspec.py) using...

Stack from [ghstack](https://github.com/ezyang/ghstack):

* #728
* __->__ #746

By creating a deep copy of the input `ReadingService` during the initialization of `DataLoader2`, we allow the instance of `ReadingService` to store...
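The deep-copy idea can be illustrated with a toy sketch. `ToyReadingService` and `ToyLoader` are hypothetical stand-ins, not the real `ReadingService` or `DataLoader2` classes, and because the description above is cut off before saying what the service stores, the `state` dict is only an assumed placeholder.

```python
import copy


class ToyReadingService:
    """Hypothetical stand-in for a reading service that keeps per-run state."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        self.state = {}  # assumed placeholder for whatever the service stores

    def initialize(self, datapipe):
        self.state["initialized"] = True
        return datapipe


class ToyLoader:
    """Hypothetical loader that deep-copies the service it is given."""

    def __init__(self, datapipe, reading_service: ToyReadingService) -> None:
        self.datapipe = datapipe
        # Each loader owns a private copy, so state never leaks between loaders.
        self.reading_service = copy.deepcopy(reading_service)

    def __iter__(self):
        return iter(self.reading_service.initialize(self.datapipe))


shared = ToyReadingService(num_workers=2)
loader_a = ToyLoader([1, 2, 3], shared)
loader_b = ToyLoader([4, 5], shared)

items_a = list(loader_a)  # mutates only loader_a's private copy
```

After iterating `loader_a`, both the caller's `shared` object and `loader_b`'s copy still have empty state, which is the isolation the deep copy buys.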

CLA Signed
topic: improvements

This PR adds support for generating archive-based datasets (tar archives, pickles, torch.save) with different binary data storage (BytesIO, tensor). It also enables training on such archive-based datasets on the torchvision...
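As a rough illustration of an archive-based dataset backed by in-memory binary storage, the sketch below packs samples into a tar archive held in a `BytesIO` buffer and reads them back. It uses only the Python standard library and is not the code from this PR; the helper names and sample file names are hypothetical.

```python
import io
import pickle
import tarfile
from typing import Dict


def write_tar(samples: Dict[str, bytes]) -> bytes:
    """Pack name -> payload pairs into an in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in samples.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # tar needs the size before the data
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_tar(blob: bytes) -> Dict[str, bytes]:
    """Read every member of an in-memory tar archive back into a dict."""
    restored = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        for member in archive.getmembers():
            restored[member.name] = archive.extractfile(member).read()
    return restored


# Hypothetical samples: raw bytes plus a pickled Python object.
samples = {
    "0.bin": b"\x00\x01\x02",
    "1.pkl": pickle.dumps({"label": 3}),
}
blob = write_tar(samples)
restored = read_tar(blob)
label = pickle.loads(restored["1.pkl"])["label"]
```

Keeping the archive in a `BytesIO` buffer avoids touching disk, which is convenient for generating small synthetic datasets in tests or benchmarks.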

CLA Signed

### 🚀 The feature

The state transitions proposed in [the PR](https://github.com/pytorch/pytorch/pull/83535) are listed here for reference:

- Initial state: `NotStarted`
- `iter` -> Call `reset` and change the state to `Iterating`
- ...

This PR runs Nicolas' benchmark suite nightly and stores the results in a github artifact so anyone on the team can inspect them. It's also possible to trigger the benchmark...

CLA Signed

### 📚 The doc issue

A PR, https://github.com/pytorch/pytorch/pull/82797, has landed in PyTorch core. It adds functionality to validate whether the examples in comments are runnable. However, in the...

Better Engineering

### 🐛 Describe the bug

# Memory increase at the start of iteration after the start

I have been trying to use DataLoader2 with multiprocessing (and distributed in some cases)....

### 🐛 Describe the bug

```
dp = ...
dp = dp.sharding_filter()
rs = MultiProcessingReadingService(num_workers=4)
dataloader = DataLoader2(dp, reading_service=rs)
for _ in dataloader:
    pass
dataloader.shutdown()
```

```
Exception: Can not...
```

As of July 2023, we have paused active development on TorchData and have paused new releases. We have learnt a lot from building it and hearing from users, but also...