data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
### 🐛 Describe the bug When an S3 object name matches the relative path of a local file, the file's contents get cleared after loading the object data. ```python import...
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to creating your pull request. - Note that there is a section on requirements related to adding a new DataPipe. Fixes #656 ###...
Stack from [ghstack](https://github.com/ezyang/ghstack): * __->__ #734 This PR is primarily focused on adding more datasets for benchmarking. Notable changes that are in progress: - Using `PrototypeMultiprocessingReadingService` as that will become...
Say I have a bunch of archives containing samples. In my case each archive is a pickle file containing a list of samples, but it could be a tar or...
The `Concater` datapipe takes multiple DPs as input. Is there a class that would take a **single** datapipe of iterables instead? Something like this: ```py class ConcaterIterable(IterDataPipe): def __init__(self, source_datapipe):...
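The behaviour asked about in the `Concater` question above can be pictured in plain Python, without subclassing `IterDataPipe`. The sketch below is only an illustration: `concat_iterables` is a hypothetical helper name, not part of torchdata, and it assumes the source yields ordinary iterables such as lists.

```python
from itertools import chain
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def concat_iterables(source: Iterable[Iterable[T]]) -> Iterator[T]:
    """Yield the items of each inner iterable in order.

    Hypothetical helper for illustration; not a torchdata API.
    """
    # chain.from_iterable consumes the outer iterable lazily,
    # so the inner iterables are never all held in memory at once.
    return chain.from_iterable(source)


# Each inner list stands in for the samples loaded from one archive.
archives = [[1, 2], [3], [4, 5, 6]]
flattened = list(concat_iterables(archives))
```

Because the outer iterable is consumed lazily, the same idea would carry over to a datapipe whose elements are lists of samples loaded one archive at a time.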
Stack from [ghstack](https://github.com/ezyang/ghstack): * __->__ #724 This PR adds RandomSplitter without a buffer. The upside is that this uses less memory (good for memory-bound cases) but the downsides are 1)...
Stack from [ghstack](https://github.com/ezyang/ghstack): * __->__ #728 * #746 Adding `__len__` to `DataLoader2`. See inline comments. We should discuss the details and whether this makes sense. Fixes #549 Differential Revision: [D38999743](https://our.internmc.facebook.com/intern/diff/D38999743)
### 🚀 The feature - [x] Add ability to drop specific column / field For `list` and `tuple` ```python list(dp) # [ (0, 1, 2), (3, 4, 5), (6,7, 8)...
Add Examples of Common Preprocessing Steps with IterDataPipe (such as splitting a data set into two)
### 📚 The doc issue There are a few common steps that users often would like to do while preprocessing data, such as [splitting their data set](https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split) into train and...
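As a rough illustration of the kind of example this doc issue asks for, the sketch below splits an iterable into two parts in plain Python without buffering any items. It is not a torchdata API: `split_stream`, `train_fraction`, and `want_train` are hypothetical names, and the approach assumes the source can be iterated more than once in the same order.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def split_stream(
    source: Iterable[T], train_fraction: float, seed: int, want_train: bool
) -> Iterator[T]:
    """Yield only the items assigned to the requested split.

    Hypothetical helper for illustration; not a torchdata API.
    """
    # Re-seeding on every pass reproduces the same sequence of draws,
    # so each item lands in the same split on every pass.
    rng = random.Random(seed)
    for item in source:
        is_train = rng.random() < train_fraction
        if is_train == want_train:
            yield item


samples = list(range(10))
train = list(split_stream(samples, 0.8, seed=0, want_train=True))
valid = list(split_stream(samples, 0.8, seed=0, want_train=False))
```

The two passes use the same seed, so every sample ends up in exactly one of the two splits. The split sizes are only approximately proportional to `train_fraction`, which is the trade-off of assigning items one at a time instead of shuffling a fully materialized data set.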