data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
### 🚀 The feature

Standard utility to chain function calls, similar to torchvision's Compose class

### Motivation, pitch

This is going to be really useful for folks that want to...
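A utility of the kind requested here could look roughly like the sketch below. It is plain Python with no torchdata dependency; the class name `Compose` and its interface are assumptions borrowed from the torchvision analogy, not an existing torchdata API.

```python
from typing import Any, Callable, Sequence


class Compose:
    """Hypothetical helper: applies callables one after another, left to right."""

    def __init__(self, fns: Sequence[Callable[[Any], Any]]) -> None:
        self.fns = list(fns)

    def __call__(self, x: Any) -> Any:
        # Feed the output of each function into the next one.
        for fn in self.fns:
            x = fn(x)
        return x


# Example: strip whitespace, lowercase, then split a comma-separated line.
pipeline = Compose([str.strip, str.lower, lambda s: s.split(",")])
result = pipeline("  A,B,C \n")
print(result)
```

Because the composed object is itself a single callable, it can be passed anywhere one function is expected, for example as a per-item map function.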
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to creating your pull request.

- If you are adding a new node, ensure you read that section in the contribution guide, as...
Since `_IterableDatasetFetcher` has no state attribute: https://github.com/pytorch/pytorch/blob/v2.6.0/torch/utils/data/_utils/fetch.py#L19, and the current `fetcher_state:dataset_iter_state` is None: https://github.com/meta-pytorch/data/blob/v0.11.0/torchdata/stateful_dataloader/worker.py#L277, could this cause prefetched data to be discarded during resume?
### 🚀 The feature

Add support for Creative Commons No-AI-Training license flags

### Motivation, pitch

Hello! Creative Commons is introducing "preference signal" licenses, an addition to CC licenses that indicates...
This is a specialized file opener + decoder that

- works for various types of sources (s3, gcs, local path)
- opens any text file and streams the content line...
Hi! Is there a timeline for the next stable release?
### 🚀 The feature

Being able to precisely recover the state of data loading is a popular feature. Would be great to have it in core to increase visibility of...
### 🐛 Describe the bug

## Description

When using `enumerate()` with a StatefulDataLoader that has been restored from a saved state, the enumeration index always starts from 0, even though...
### 🚀 The feature

A `torchdata.nodes.Stack` node that would use multiple `torch.utils.data.Sampler` and maybe `torchdata.nodes.Batcher` instances to generate independent batches from the same dataset and stack them. A naive implementation...
Adds a custom enumerate method to track the batch number in StatefulDataLoader (SDL).

Fixes #1503
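The indexing behavior reported in the `enumerate()` issue can be illustrated without torchdata. The sketch below is only an analogy: a plain list slice stands in for a restored loader, and `enumerate_from` and `batches_seen` are hypothetical names for a caller-side workaround, not the method this pull request adds.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def enumerate_from(batches: Iterable[T], batches_already_seen: int) -> Iterator[Tuple[int, T]]:
    """Hypothetical helper: number batches starting from a saved offset."""
    # The built-in enumerate accepts a start value, so the offset is passed through.
    return enumerate(batches, start=batches_already_seen)


all_batches = [[0, 1], [2, 3], [4, 5], [6, 7]]
batches_seen = 2  # assumed to be stored alongside the checkpoint by the caller

# Stand-in for a loader restored mid-epoch: only the remaining batches are yielded.
remaining = all_batches[batches_seen:]

plain = [i for i, _ in enumerate(remaining)]
resumed = [i for i, _ in enumerate_from(remaining, batches_seen)]
print(plain, resumed)
```

Plain `enumerate()` knows nothing about how many batches were consumed before the checkpoint, which is why the index restarts at 0 unless the offset is supplied from outside or tracked by the loader itself.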