
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Results 302 data issues

### 🐛 Describe the bug When an S3 object name matches the relative path of a local file, the file's contents get cleared after loading the object data. ```python import...

Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to creating your pull request. - Note that there is a section on requirements related to adding a new DataPipe. Fixes #656 ###...

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack): * __->__ #734 This PR is primarily focused on adding more datasets for benchmarking. Notable changes that are in progress: - Using `PrototypeMultiprocessingReadingService` as that will become...

CLA Signed

Say I have a bunch of archives containing samples. In my case each archive is a pickle file containing a list of samples, but it could be a tar or...

The `Concater` datapipe takes multiple DPs as input. Is there a class that would take a **single** datapipe of iterables instead? Something like this: ```py class ConcaterIterable(IterDataPipe): def __init__(self, source_datapipe):...

good first issue
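
For the `Concater` question above, here is a minimal sketch of what a single-input variant might look like. It is written in plain Python rather than as an `IterDataPipe` subclass so that it runs without torchdata installed. The class name `ConcaterIterableSketch` is hypothetical and is not an existing torchdata class, and the body is an assumption about the intended behavior, since the snippet in the issue is truncated.

```python
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


class ConcaterIterableSketch(Generic[T]):
    """Hypothetical helper: chains the iterables yielded by one source.

    Assumption: the desired behavior is "flatten one level", i.e. yield
    every element of the first inner iterable, then the second, and so on.
    """

    def __init__(self, source: Iterable[Iterable[T]]) -> None:
        self.source = source

    def __iter__(self) -> Iterator[T]:
        for inner in self.source:
            # Exhaust each inner iterable before moving to the next one.
            yield from inner


# Inner iterables may be of different types and lengths, including empty.
flat = list(ConcaterIterableSketch([range(2), [10, 11], ()]))
```

Inner iterables are consumed lazily, one at a time, so only the current one needs to be held in memory.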

Stack from [ghstack](https://github.com/ezyang/ghstack): * __->__ #724 This PR adds RandomSplitter without a buffer. The upside is that this uses less memory (good for memory-bound cases) but the downsides are 1)...

CLA Signed
topic: new feature
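
One way a random split can avoid a buffer is sketched below. This is not taken from PR #724; it only illustrates the general idea under an assumption: every split re-iterates the source with the same seed, draws one random number per item, and keeps only the items assigned to it. The function name `split_without_buffer` and its parameters are hypothetical.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def split_without_buffer(
    source: Iterable[T], train_fraction: float, seed: int, want_train: bool
) -> Iterator[T]:
    """Hypothetical sketch: yield the items that belong to one split.

    Assumption: the source yields items in the same order on every pass,
    so the same seed reproduces the same per-item assignment.
    """
    rng = random.Random(seed)
    for item in source:
        # One draw per item decides which split the item belongs to.
        in_train = rng.random() < train_fraction
        if in_train == want_train:
            yield item


data = list(range(100))
train = list(split_without_buffer(data, 0.8, seed=0, want_train=True))
valid = list(split_without_buffer(data, 0.8, seed=0, want_train=False))
```

In this sketch nothing is stored between items, but each split makes its own full pass over the source.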

Stack from [ghstack](https://github.com/ezyang/ghstack): * __->__ #728 * #746 Adding `__len__` to `DataLoader2`. See inline comments. We should discuss the details and if this makes sense. Fixes #549 Differential Revision: [D38999743](https://our.internmc.facebook.com/intern/diff/D38999743)

CLA Signed
topic: improvements

### 🚀 The feature - [x] Add ability to drop specific column / field For `list` and `tuple` ```python list(dp) # [ (0, 1, 2), (3, 4, 5), (6,7, 8)...

good first issue
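
The column-dropping request above can be illustrated for the `list` and `tuple` case with a short plain-Python sketch. The function name `drop_columns` and its signature are hypothetical and do not describe an existing torchdata API. The sketch assumes each row can be rebuilt by calling its own type on a list of the kept values, which holds for plain `list` and `tuple` but not for every sequence type.

```python
from typing import Iterable, Iterator, Sequence


def drop_columns(rows: Iterable[Sequence], indices: Iterable[int]) -> Iterator[Sequence]:
    """Hypothetical sketch: remove the given positional fields from each row."""
    drop = set(indices)
    for row in rows:
        kept = [value for i, value in enumerate(row) if i not in drop]
        # Rebuild the row with its original container type (list or tuple).
        yield type(row)(kept)


rows = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
without_middle = list(drop_columns(rows, [1]))
```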

### 📚 The doc issue There are a few common steps that users often would like to do while preprocessing data, such as [splitting their data set](https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split) into train and...

documentation