data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
### 🚀 The feature We cut a [release branch](https://github.com/pytorch/data/tree/release/0.5) for the 0.5.0 release at the same time as PyTorch. ## Cherry-pick (until TBD) We will keep cherry-picking all PRs landed...
[DataLoader2] Skip wrapping if serialization wrapper attached & deepcopy DataPipe in load_state_dict
Per title: - We don't need to attach the serialization wrapper if the last DataPipe is already a `_DataPipeSerializationWrapper` - When we `load_state_dict`, we still need a copy of `self.datapipe` as it...
### 🐛 Describe the bug Running the code snippet below on macOS prints the correct values but does not terminate afterwards without manual interruption. I don't think it...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #824 * __->__ #823 fixes: #806 Differential Revision: [D40280665](https://our.internmc.facebook.com/intern/diff/D40280665)
According to the documentation for [MaxTokenBucketizer](https://pytorch.org/data/main/generated/torchdata.datapipes.iter.MaxTokenBucketizer.html#torchdata.datapipes.iter.MaxTokenBucketizer): buffer_size – This restricts how many **tokens** are taken from prior DataPipe to bucketize. However, in the code at [bucketbatcher.py#L277](https://github.com/pytorch/data/blob/84587ff57575fd47fcae61635a3f4ffc1e639941/torchdata/datapipes/iter/transform/bucketbatcher.py#L277), the unit of buffer_size is **sample**...
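The difference between the two units can be illustrated with a minimal pure-Python sketch. It does not use torchdata; `take_by_samples` and `take_by_tokens` are hypothetical helpers written for this illustration, and a sample's token count is assumed to be its `len`.

```python
from typing import List, Sequence


def take_by_samples(source: Sequence[str], buffer_size: int) -> List[str]:
    """Hypothetical helper: buffer_size limits the NUMBER OF SAMPLES taken."""
    buffer: List[str] = []
    for sample in source:
        if len(buffer) >= buffer_size:
            break
        buffer.append(sample)
    return buffer


def take_by_tokens(source: Sequence[str], buffer_size: int) -> List[str]:
    """Hypothetical helper: buffer_size limits the TOTAL TOKENS taken.

    Assumption: the token count of a sample is len(sample).
    """
    buffer: List[str] = []
    total_tokens = 0
    for sample in source:
        if total_tokens + len(sample) > buffer_size:
            break
        buffer.append(sample)
        total_tokens += len(sample)
    return buffer


# Four samples with 3, 5, 2 and 4 tokens respectively.
samples = ["aaa", "bbbbb", "cc", "dddd"]

by_samples = take_by_samples(samples, buffer_size=4)
by_tokens = take_by_tokens(samples, buffer_size=4)

print(len(by_samples), sum(len(s) for s in by_samples))
print(len(by_tokens), sum(len(s) for s in by_tokens))
```

With the same `buffer_size`, the sample-based reading buffers every sample, while the token-based reading stops once the token budget would be exceeded, so the two interpretations can hold very different amounts of data.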
This PR temporarily extends `PrototypingMultiProcessingReadingService` to fully control the determinism of the pipeline across the combinations of: - Single/Multi-processing - Distributed/Non-distributed When we have `SequentialReadingService` ready to combine `DistributedReadingService` and...
### 🚀 The feature Would TorchData provide GPU support for loading and preprocessing images? ### Motivation, pitch While learning PyTorch, I found that it currently does not support using...
Fixes: #815 Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #826 Fixes #825 Differential Revision: [D40308358](https://our.internmc.facebook.com/intern/diff/D40308358)
### 🐛 Describe the bug Vision team reported this behavior on Linux ```python dp = ... # something with prefetch dl = DataLoader2(reading_service=PrototypeMultiProcessingReadingService()) for _ in dl: # this works...