
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Results: 302 issues

### 🚀 The feature **Note that this is an RFC solely to discuss the design. There is currently no plan to implement this feature.** This issue serves as a...

### 🚀 The feature Currently, we rely on the inline docstring of each DataPipe to indicate its functional API. Considering `torchdata` should already have been built when the docs need to...

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #764 * __->__ #555


When a new iterator is created, `DataLoader2` currently resumes from where it left off rather than resetting and starting from the beginning (see the code snippet below). This is...
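
As a hedged illustration of the reported resume-instead-of-reset semantics (a toy class, not `DataLoader2`'s actual implementation), one underlying iterator kept alive across calls to `__iter__()` produces exactly this behavior:

```python
class ResumingLoader:
    """Toy loader that, like the reported behavior, keeps one
    underlying iterator alive across calls to __iter__()."""

    def __init__(self, data):
        self.data = data
        self._it = iter(data)  # created once, never reset

    def __iter__(self):
        # Resetting semantics would instead do: return iter(self.data)
        return self._it  # resumes where the previous iterator left off


loader = ResumingLoader([0, 1, 2, 3])
first = [x for _, x in zip(range(2), iter(loader))]  # consume two items
second = list(iter(loader))  # resumes rather than restarting
print(first, second)  # [0, 1] [2, 3]
```

Resetting on each `iter()` call would make `second` equal `[0, 1, 2, 3]` again.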

Fixes #362

### Changes
- Document `input_col` behavior on `FlatMapperIterDataPipe`
- Check the arguments of the mapping function against `input_col` in `FlatMapperIterDataPipe`
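
A minimal sketch of such an argument check, assuming one positional argument per selected column (the helper name and exact rules are hypothetical, not the actual torchdata implementation):

```python
import inspect


def check_fn_against_input_col(fn, input_col):
    """Hypothetical helper: verify that `fn` accepts one positional
    argument per column selected by `input_col`."""
    n_cols = len(input_col) if isinstance(input_col, (list, tuple)) else 1
    params = list(inspect.signature(fn).parameters.values())
    if any(p.kind == p.VAR_POSITIONAL for p in params):
        return  # *args accepts any number of columns
    positional = [p for p in params
                  if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)]
    if len(positional) != n_cols:
        raise ValueError(
            f"fn takes {len(positional)} positional argument(s), "
            f"but input_col selects {n_cols} column(s)")


check_fn_against_input_col(lambda a, b: a + b, [0, 1])  # OK: 2 args, 2 columns
```

Passing a single-column `input_col` to the two-argument lambda above would raise `ValueError` instead of failing later inside iteration.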


### 🚀 The feature We're proposing a folder to hold all benchmark scripts, which would be easily reproducible by anyone on the core PyTorch Data team, the PyTorch domain teams, and...

### Motivation, pitch As we grow the product, more and more adapter ideas and requests come up. This issue was created to aggregate such requests and keep track of...

In a distributed setting, `len(dataloader)` will return:
- `len(dataset) // (batch_size * num_GPUs)` if `dataset` is a map-style dataset
- `len(dataset) // batch_size` if `dataset` is a datapipe

This discrepancy...
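
The discrepancy can be reproduced arithmetically (the values below are hypothetical, chosen only to illustrate the two formulas):

```python
# Illustrative arithmetic only; these values are made up for the example.
dataset_len = 1000
batch_size = 32
num_gpus = 4

# Map-style dataset: the distributed sampler partitions the data first,
# so each rank reports len(dataset) // (batch_size * num_GPUs) batches.
map_style_len = dataset_len // (batch_size * num_gpus)  # 7

# DataPipe: sharding is not reflected in __len__, so every rank
# reports len(dataset) // batch_size batches.
datapipe_len = dataset_len // batch_size  # 31

print(map_style_len, datapipe_len)  # 7 31
```

With 4 GPUs, the datapipe length overstates the per-rank batch count by roughly a factor of `num_GPUs`.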

Currently, `in_batch_shuffle` doesn't use the RNG shared across processes. This can be problematic if it is used prior to sharding, resulting in certain samples not being used in training...
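
A minimal sketch of why this matters (toy data and a hypothetical helper, not the actual torchdata sharding code): if each worker shuffles with its own seed before round-robin sharding, the shards can overlap, so some samples are seen twice and others never; a shared seed makes the shards an exact partition.

```python
import random

samples = list(range(8))
num_workers = 2


def shard_after_shuffle(seed, worker_id):
    """Hypothetical pipeline: shuffle the full dataset, then take
    this worker's round-robin shard."""
    rng = random.Random(seed)  # per-worker RNG
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return shuffled[worker_id::num_workers]


# Different seeds per worker: shards may overlap or miss samples.
unshared = shard_after_shuffle(1, 0) + shard_after_shuffle(2, 1)

# Shared seed: every worker sees the same permutation, so the
# shards partition the dataset exactly.
shared = shard_after_shuffle(1, 0) + shard_after_shuffle(1, 1)

print(sorted(shared) == samples)  # True: full coverage is guaranteed
print(sorted(unshared) == samples)  # may be False: duplicates and gaps
```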

### 🚀 The feature **Note that this is a request for comment; currently, there is no plan to make TorchData a standalone library.** We would like to solicit feedback from...
