# data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
We often have datapipes that return tuples `(img, target)` where we want to apply transformations to `img` but not to `target`. Sometimes it's the opposite: we want to...
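A minimal sketch of one way to do this, assuming TorchData's `map` supports the `input_col` argument (the `transform` function and the sample data below are illustrative, not from the issue):

```
from torchdata.datapipes.iter import IterableWrapper

def transform(img):
    # Placeholder transformation; stands in for any image transform.
    return img * 2

dp = IterableWrapper([(1, "cat"), (2, "dog")])
# `input_col=0` asks `map` to apply the function to the first tuple
# element only, leaving the target untouched.
dp = dp.map(transform, input_col=0)
assert list(dp) == [(2, "cat"), (4, "dog")]
```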
### 🐛 Describe the bug
After `AWSSDK` is integrated with TorchData, we now have two categories of `DataPipe`s to access and load data from an AWS S3 bucket:
1. [`DataPipe`](https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/load/fsspec.py) using...
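For orientation, a sketch of the two categories side by side; the bucket path `s3://my-bucket/data/` is hypothetical, the fsspec route needs the `s3fs` package, and the native route needs TorchData built with AWSSDK support:

```
from torchdata.datapipes.iter import IterableWrapper

# Category 1: fsspec-based DataPipes (require the `s3fs` package).
fsspec_dp = IterableWrapper(["s3://my-bucket/data/"]).list_files_by_fsspec()
fsspec_dp = fsspec_dp.open_files_by_fsspec(mode="rb")

# Category 2: native AWSSDK-based DataPipes (require TorchData built
# with AWSSDK support).
from torchdata.datapipes.iter import S3FileLister, S3FileLoader

s3_dp = S3FileLister(IterableWrapper(["s3://my-bucket/data/"]))
s3_dp = S3FileLoader(s3_dp)

# Both pipelines yield (path, byte-stream) pairs for the matched objects.
```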
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #728
* __->__ #746

By creating a deep copy of the input `ReadingService` during the initialization of `DataLoader2`, we allow the instance of `ReadingService` to store...
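A minimal sketch of the idea (an illustrative stand-in, not the actual `DataLoader2` source):

```
import copy

class DataLoader2Sketch:
    """Illustrative stand-in, not the actual DataLoader2 implementation."""

    def __init__(self, datapipe, reading_service):
        self.datapipe = datapipe
        # Deep-copying the reading service gives this loader its own
        # instance, so any state the service stores (e.g. worker handles)
        # cannot leak into other loaders constructed from the same object.
        self.reading_service = copy.deepcopy(reading_service)
```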
This PR adds support for generating archive-based datasets (tar archives, pickles, torch.save) with different binary data storage (BytesIO, tensor). It also enables training on such archive-based datasets on the torchvision...
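As a sketch of one of the combinations described above, the snippet below writes tensors serialized with `torch.save` into in-memory `BytesIO` buffers and stores them as members of a tar archive; file names and tensor shapes are illustrative:

```
import io
import tarfile

import torch

# Write four tensors, each serialized with torch.save into an in-memory
# BytesIO buffer, as members of a tar archive.
with tarfile.open("dataset.tar", "w") as archive:
    for i in range(4):
        buffer = io.BytesIO()
        torch.save(torch.randn(3, 224, 224), buffer)
        buffer.seek(0)
        member = tarfile.TarInfo(name=f"sample_{i}.pt")
        member.size = buffer.getbuffer().nbytes
        archive.addfile(member, buffer)
```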
### 🚀 The feature
For reference, the state-transition proposal in [the PR](https://github.com/pytorch/pytorch/pull/83535):
- Initial state: `NotStarted`
- `iter` -> Call `reset` and change the state to `Iterating`
- ...
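A minimal sketch of the first transition in that list, using a plain `Enum` (an assumption for illustration, not the proposal's actual code):

```
from enum import Enum, auto

class IterationState(Enum):
    NOT_STARTED = auto()
    ITERATING = auto()

class StatefulPipe:
    def __init__(self, values):
        self.values = values
        self.state = IterationState.NOT_STARTED  # initial state: NotStarted

    def reset(self):
        self.state = IterationState.ITERATING

    def __iter__(self):
        # `iter` -> call `reset` and change the state to `Iterating`.
        self.reset()
        yield from self.values
```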
This PR runs Nicolas' benchmark suite nightly and stores the results in a GitHub artifact so anyone on the team can inspect them. It's also possible to trigger the benchmark...
### 📚 The doc issue
A PR (https://github.com/pytorch/pytorch/pull/82797) has landed in PyTorch core which adds functionality to validate that the examples in comments are runnable. However, in the...
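For context, a doctest-style sketch of what a runnable in-comment example looks like; the function and the use of `doctest` here are illustrative assumptions, not necessarily the mechanism that PR uses:

```
def double(x: int) -> int:
    """Return twice the input.

    Example:
        >>> double(3)
        6
    """
    return 2 * x

if __name__ == "__main__":
    import doctest

    doctest.testmod()  # executes the >>> example and checks its output
```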
### 🐛 Describe the bug
# Memory increase at the start of iteration
I have been trying to use DataLoader2 with multiprocessing (and distributed in some cases)....
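A hypothetical repro sketch for observing this: sample resident memory while iterating a `DataLoader2` with worker processes. The pipeline, worker count, and the use of `psutil` (an extra dependency) are all placeholders, not from the issue:

```
import os

import psutil  # extra dependency, used here only to sample memory
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService
from torchdata.datapipes.iter import IterableWrapper

def main():
    dp = IterableWrapper(range(100_000)).sharding_filter()
    rs = MultiProcessingReadingService(num_workers=4)
    dl = DataLoader2(dp, reading_service=rs)

    proc = psutil.Process(os.getpid())
    for step, _ in enumerate(dl):
        if step % 20_000 == 0:
            print(f"step {step}: rss = {proc.memory_info().rss / 1e6:.1f} MB")
    dl.shutdown()

if __name__ == "__main__":
    main()
```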
### 🐛 Describe the bug
```
dp = ...
dp = dp.sharding_filter()
rs = MultiProcessingReadingService(num_workers=4)
dataloader = DataLoader2(dp, reading_service=rs)
for _ in dataloader:
    pass
dataloader.shutdown()
```
```
Exception: Can not...
```
As of July 2023, we have paused active development on TorchData and paused new releases. We have learnt a lot from building it and hearing from users, but also...