
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Results: 302 issues

### 🚀 The feature **Note that this is an RFC solely to discuss the design. There is currently no plan to implement this feature.** This issue serves as a...

### 🚀 The feature Currently, we rely on the inline docstring of each DataPipe to indicate its functional API. Considering `torchdata` should already have been built when the docs need to...

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #764 * __->__ #555


When a new iterator is created, `DataLoader2` currently resumes from where it left off rather than resetting and starting from the beginning (see the code snippet below). This is...
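
As a hedged illustration of the reported resume-instead-of-reset semantics (a toy class, not `DataLoader2`'s actual implementation), one underlying iterator kept alive across calls to `__iter__()` produces exactly this behavior:

```python
class ResumingLoader:
    """Toy loader that, like the reported behavior, keeps one
    underlying iterator alive across calls to __iter__()."""

    def __init__(self, data):
        self.data = data
        self._it = iter(data)  # created once, never reset

    def __iter__(self):
        # Resetting semantics would instead do: return iter(self.data)
        return self._it  # resumes where the previous iterator left off


loader = ResumingLoader([0, 1, 2, 3])
first = [x for _, x in zip(range(2), iter(loader))]  # consume two items
second = list(iter(loader))  # resumes rather than restarting
print(first, second)  # [0, 1] [2, 3]
```

Resetting on each `iter()` call would make `second` equal `[0, 1, 2, 3]` again.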

Fixes #362

### Changes
- Document `input_col` behavior on `FlatMapperIterDataPipe`
- Check the arguments of the mapping function against `input_col` in `FlatMapperIterDataPipe`
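
A minimal sketch of such an argument check, assuming one positional argument per selected column (the helper name and exact rules are hypothetical, not the actual torchdata implementation):

```python
import inspect


def check_fn_against_input_col(fn, input_col):
    """Hypothetical helper: verify that `fn` accepts one positional
    argument per column selected by `input_col`."""
    n_cols = len(input_col) if isinstance(input_col, (list, tuple)) else 1
    params = list(inspect.signature(fn).parameters.values())
    if any(p.kind == p.VAR_POSITIONAL for p in params):
        return  # *args accepts any number of columns
    positional = [p for p in params
                  if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)]
    if len(positional) != n_cols:
        raise ValueError(
            f"fn takes {len(positional)} positional argument(s), "
            f"but input_col selects {n_cols} column(s)")


check_fn_against_input_col(lambda a, b: a + b, [0, 1])  # OK: 2 args, 2 columns
```

Passing a single-column `input_col` to the two-argument lambda above would raise `ValueError` instead of failing later inside iteration.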


### 🚀 The feature We're proposing a folder to hold all benchmark scripts, which would be easily reproducible by anyone on the core PyTorch Data team, the PyTorch domain teams, and...

### Motivation, pitch As we grow the product, more and more adapter ideas and requests come up. This issue was created to aggregate such requests and keep track of...

In a distributed setting, `len(dataloader)` will return:
- `len(dataset) // (batch_size * num_GPUs)` if `dataset` is a map-style dataset
- `len(dataset) // batch_size` if `dataset` is a datapipe

This discrepancy...
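
The discrepancy can be reproduced arithmetically (the values below are hypothetical, chosen only to illustrate the two formulas):

```python
# Illustrative arithmetic only; these values are made up for the example.
dataset_len = 1000
batch_size = 32
num_gpus = 4

# Map-style dataset: the distributed sampler partitions the data first,
# so each rank reports len(dataset) // (batch_size * num_GPUs) batches.
map_style_len = dataset_len // (batch_size * num_gpus)  # 7

# DataPipe: sharding is not reflected in __len__, so every rank
# reports len(dataset) // batch_size batches.
datapipe_len = dataset_len // batch_size  # 31

print(map_style_len, datapipe_len)  # 7 31
```

With 4 GPUs, the datapipe length overstates the per-rank batch count by roughly a factor of `num_GPUs`.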

Currently, `in_batch_shuffle` doesn't use the RNG shared across processes. This can be problematic if it is used prior to sharding, resulting in certain samples not being used in training...
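
A minimal sketch of why this matters (toy data and a hypothetical helper, not the actual torchdata sharding code): if each worker shuffles with its own seed before round-robin sharding, the shards can overlap, so some samples are seen twice and others never; a shared seed makes the shards an exact partition.

```python
import random

samples = list(range(8))
num_workers = 2


def shard_after_shuffle(seed, worker_id):
    """Hypothetical pipeline: shuffle the full dataset, then take
    this worker's round-robin shard."""
    rng = random.Random(seed)  # per-worker RNG
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return shuffled[worker_id::num_workers]


# Different seeds per worker: shards may overlap or miss samples.
unshared = shard_after_shuffle(1, 0) + shard_after_shuffle(2, 1)

# Shared seed: every worker sees the same permutation, so the
# shards partition the dataset exactly.
shared = shard_after_shuffle(1, 0) + shard_after_shuffle(1, 1)

print(sorted(shared) == samples)  # True: full coverage is guaranteed
print(sorted(unshared) == samples)  # may be False: duplicates and gaps
```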

### 🚀 The feature **Note that this is a request for comment; currently, there is no plan to make TorchData a standalone library.** We would like to solicit feedback from...
