
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Results 302 data issues

### 🐛 Describe the bug There are a couple of places where `DataLoader2` uses `IterDataPipe` as the type hint ([here](https://github.com/pytorch/data/blob/12cfaf8899b1337981cd4edf9deef127f925f1bd/torchdata/dataloader2/dataloader2.py#L22), [here](https://github.com/pytorch/data/blob/12cfaf8899b1337981cd4edf9deef127f925f1bd/torchdata/dataloader2/reading_service.py#L17), etc.). We should add a type called `DataPipe =...

Our current strategy is to wrap all file handles in a [`StreamWrapper`](https://github.com/pytorch/pytorch/blob/88fca3be5924dd089235c72e651f3709e18f76b8/torch/utils/data/datapipes/utils/common.py#L154). It dispatches all calls to the wrapped object and adds a `__del__` method:

```py
class StreamWrapper:
    def __init__(self, file_obj):...
```
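The wrapping strategy described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual `StreamWrapper` code: the class name `ClosingWrapper` is hypothetical, and it assumes the wrapped object exposes `close()` and, optionally, a `closed` attribute.

```python
import io


class ClosingWrapper:
    """Hypothetical stand-in for a stream wrapper; not the torch implementation."""

    def __init__(self, file_obj):
        self.file_obj = file_obj

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so any attribute the
        # wrapper does not define itself is forwarded to the wrapped object.
        file_obj = self.__dict__.get("file_obj")
        if file_obj is None:
            raise AttributeError(name)
        return getattr(file_obj, name)

    def __del__(self):
        # Close the wrapped handle when the wrapper is garbage collected,
        # unless it is missing or already closed.
        file_obj = self.__dict__.get("file_obj")
        if file_obj is not None and not getattr(file_obj, "closed", False):
            file_obj.close()


raw = io.StringIO("hello")
wrapped = ClosingWrapper(raw)
text = wrapped.read()  # forwarded to raw.read()
del wrapped            # in CPython this drops the last reference and runs __del__
```

Relying on `__del__` ties cleanup to garbage-collection timing, which is deterministic only under reference counting, so explicit `close()` calls or context managers remain the safer choice where they are practical.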

### 📚 The doc issue Hi! I'm the author of a python library which is called [SeqTools](https://github.com/nlgranger/SeqTools). It predates torchdata and provides essentially the same functionality as `MapDataPipes`. I just...

### 🚀 The feature Implement a `distributed_sharding_filter` that would behave similarly to `sharding_filter` (https://github.com/pytorch/pytorch/blob/3f140c5b32fa8685cc7a10bdb94f3f8b127e3a92/torch/utils/data/datapipes/iter/grouping.py), but would filter according to global rank and world size if `torch.distributed` was initialized. If torch.distributed...
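The filtering idea in this request can be illustrated without any distributed setup. The sketch below is an assumption about the intended behaviour, not the proposed datapipe: `shard_by_rank` is a hypothetical helper, and `rank` and `world_size` are passed in explicitly. In a real job they would presumably come from `torch.distributed.get_rank()` and `torch.distributed.get_world_size()` once `torch.distributed.is_initialized()` returns `True`.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard_by_rank(items: Iterable[T], rank: int, world_size: int) -> Iterator[T]:
    """Yield only the elements assigned to `rank` (hypothetical helper)."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world size {world_size}")
    for index, item in enumerate(items):
        # Round-robin assignment: element i goes to rank i % world_size.
        if index % world_size == rank:
            yield item


# Each of three hypothetical ranks sees a disjoint slice of the same stream.
shards = [list(shard_by_rank(range(10), rank, 3)) for rank in range(3)]
```

Round-robin assignment keeps the shards disjoint and within one element of each other in size, but it still requires every rank to iterate over the full upstream source.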

The `buffer_size` parameter is currently fairly inconsistent across datapipes:

| name | default `buffer_size` | infinite `buffer_size` | warn on infinite |
|--------------------|-------------------------|--------------------------|--------------------|
| Demultiplexer | 1e3 | -1 |...

Better Engineering

## 🚀 Feature We could support different DataPipes that provide the same functionality through a single functional API backed by a router datapipe. For example, `open` could support:
- URL
- IoPath
-...
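One way to picture the router idea is a registry keyed by URL scheme. This is only a sketch under assumptions: `register_opener`, `route_open`, and the two placeholder openers are hypothetical names, and the openers return descriptive strings rather than real DataPipes so that the example stays self-contained.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical registry mapping a URL scheme to the function that handles it.
_OPENERS: Dict[str, Callable[[str], str]] = {}


def register_opener(scheme: str):
    """Decorator that registers a handler for one scheme."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        _OPENERS[scheme] = fn
        return fn
    return decorator


@register_opener("")
@register_opener("file")
def _open_local(path: str) -> str:
    # Placeholder: a real router would return a local-file DataPipe here.
    return f"local:{path}"


@register_opener("http")
@register_opener("https")
def _open_url(path: str) -> str:
    # Placeholder: a real router would return an HTTP-reading DataPipe here.
    return f"url:{path}"


def route_open(path: str) -> str:
    """Pick a handler based on the scheme of `path`."""
    scheme = urlparse(path).scheme
    try:
        opener = _OPENERS[scheme]
    except KeyError:
        raise ValueError(f"no opener registered for scheme {scheme!r}") from None
    return opener(path)
```

With this shape, callers use one entry point, for example `route_open("https://example.com/a.csv")`, and new backends are added by registering another handler.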

research feature

This new data API looks great and has many similarities with the [DataBlock](https://docs.fast.ai/tutorial.datablock.html) API from fastai. We have a [discord](https://discord.gg/Yy82YcR4) channel for fastai dev and we would love to help/test/integrate...

### 🚀 The feature, motivation and pitch Occasionally one might find that their GPU is idle due to a bottleneck in the input data pre-processing pipeline (which might include data...

feature
module: dataloader
triaged
module: data

### 🚀 The feature Allow sourcing datasets from SQL queries. It should be possible to substitute different backends, e.g. Athena, Presto, Postgres. It should do smart batching to minimize the number of...
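As a rough illustration of batched reads from a SQL query, the sketch below uses the standard-library `sqlite3` module as a stand-in backend. The helper name `iter_query_batches`, the `samples` table, and the fixed batch size are assumptions made for the example. What the request means by "smart" batching is cut off above, so this shows only plain fixed-size fetching through the DB-API `fetchmany` call.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_query_batches(
    conn: sqlite3.Connection, query: str, batch_size: int = 2
) -> Iterator[List[Tuple]]:
    """Yield the rows of `query` in lists of at most `batch_size` rows."""
    cursor = conn.cursor()
    cursor.execute(query)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Build a small in-memory table so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO samples (id, label) VALUES (?, ?)",
    [(i, f"label-{i}") for i in range(5)],
)
conn.commit()

batches = list(iter_query_batches(conn, "SELECT id, label FROM samples ORDER BY id"))
```

Because the helper only relies on `cursor()`, `execute()`, and `fetchmany()`, the same loop should carry over to other DB-API 2.0 drivers, although that portability is an expectation rather than something tested here.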

### 🚀 The feature Instead of at runtime, we can detect CUDA context fork issues when the reading service is initialized and point users to the mistake in their code. ### Motivation, pitch Lots...