data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
### 🚀 The feature **Note that this is an RFC solely to discuss the design. There is currently no plan to implement this feature.** This issue serves as a...
### 🚀 The feature Currently, we rely on the inline doc for each DataPipe to indicate the functional API. Considering `torchdata` should have already been built when docs need to...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #764 * __->__ #555
When a new iterator is created, `DataLoader2` currently resumes from where it left off rather than resetting and starting from the beginning (see code snippet below). This is...
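The resume-vs-reset distinction above can be illustrated in plain Python, without any torchdata dependency. The two toy classes below are hypothetical stand-ins, not torchdata APIs: `ResumeLoader` mimics the current behavior (each `iter()` call continues the same underlying iterator), while `ResetLoader` mimics the reset-on-new-iterator behavior.

```python
class ResumeLoader:
    """Toy model of the current behavior: iter() resumes where it left off."""
    def __init__(self, data):
        self._it = iter(data)  # one shared iterator for the loader's lifetime

    def __iter__(self):
        return self._it


class ResetLoader:
    """Toy model of reset semantics: iter() restarts from the beginning."""
    def __init__(self, data):
        self._data = data

    def __iter__(self):
        return iter(self._data)  # fresh iterator on every call


data = [0, 1, 2, 3]

resume = ResumeLoader(data)
it = iter(resume)
partial = [next(it), next(it)]   # consume two items: [0, 1]
rest = list(resume)              # resumes: [2, 3]

reset = ResetLoader(data)
it = iter(reset)
partial_r = [next(it), next(it)]  # consume two items: [0, 1]
rest_r = list(reset)              # restarts: [0, 1, 2, 3]
```

The resume semantics can surprise users coming from `DataLoader`, where a fresh `iter()` call at the start of each epoch yields the full dataset again.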
Fixes #362 ### Changes - Document `input_col` behavior on `FlatMapperIterDataPipe` - Check the arguments of the mapping function against `input_col` in `FlatMapperIterDataPipe`
### 🚀 The feature We're proposing a folder to hold all benchmark scripts which would be easily reproducible by anyone from the core PyTorch Data team, PyTorch domain teams and...
### Motivation, pitch As the product grows, more and more adapter ideas and requests are coming up. This issue was created to aggregate such requests and keep track of...
In a distributed setting, `len(dataloader)` will return: - `len(dataset) // (batch_size * num_GPUs)` if `dataset` is a map-style dataset - `len(dataset) // batch_size` if `dataset` is a datapipe This discrepancy...
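The discrepancy above can be made concrete with a small numeric sketch. The dataset size, batch size, and GPU count below are made-up illustrative values, not numbers from the issue:

```python
# Hypothetical values for illustration only.
dataset_len = 100
batch_size = 4
num_gpus = 2

# Map-style dataset: samples are split across ranks before batching,
# so the reported length accounts for the world size.
len_map_style = dataset_len // (batch_size * num_gpus)  # 12

# DataPipe: the reported length does not account for sharding,
# so each rank appears to have the full set of batches.
len_datapipe = dataset_len // batch_size  # 25
```

With these numbers, `len(dataloader)` differs by roughly a factor of `num_gpus` between the two dataset types, which breaks code that sizes LR schedules or progress bars from the loader length.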
Currently, `in_batch_shuffle` doesn't use the RNG shared across processes. This can be problematic if it is used prior to sharding, resulting in certain samples not being used in training...
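Why an unshared RNG before sharding loses samples can be sketched with Python's `random` module (standing in for torchdata's shared RNG, which this sketch does not use). If every rank shuffles with the same seed, round-robin sharding still partitions the data exactly; if seeds diverge, samples can be duplicated on some ranks and dropped on others.

```python
import random

data = list(range(8))
world_size = 2


def shard_after_shuffle(rank: int, seed: int) -> list:
    """Shuffle a copy of the data, then take this rank's round-robin shard."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    return shuffled[rank::world_size]


# Shared seed: both ranks see the same shuffled order, so the shards
# together form an exact partition of the data.
shared = shard_after_shuffle(0, seed=0) + shard_after_shuffle(1, seed=0)
assert sorted(shared) == data

# Per-rank seeds: the orders diverge, so across ranks some samples may
# appear twice while others never appear at all.
diverged = shard_after_shuffle(0, seed=0) + shard_after_shuffle(1, seed=1)
# sorted(diverged) is, in general, NOT equal to data.
```

Sharing the shuffle RNG across processes (or shuffling only after sharding) avoids this mismatch.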
### 🚀 The feature **Note that this is a request for comment; currently, there is no plan to make TorchData a standalone library.** We would like to solicit feedback from...