
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Results 302 data issues

### 🐛 Describe the bug `pipe.fork(n)` returns the original pipe when `n == 1` and not a list with 1 element. This introduces two issues: 1. It's unexpected to the...
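To illustrate why a uniform return type matters here, the following is a minimal pure-Python sketch, not torchdata's implementation. The helper name `fork_always_list` is hypothetical, and `itertools.tee` stands in for the real forking logic. It always returns a list, so callers can unpack the result the same way for `n == 1` and `n > 1`.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def fork_always_list(source: Iterable[T], n: int) -> List[Iterator[T]]:
    """Hypothetical helper: split `source` into `n` independent iterators.

    Unlike the behavior described in the issue, the result is always a
    list, including when n == 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # itertools.tee returns a tuple of n independent iterators.
    return list(itertools.tee(source, n))


# The same unpacking pattern works for one branch or several.
(only,) = fork_always_list(range(3), 1)
first, second = fork_always_list(range(3), 2)
```

With a consistent list return, generic code such as `for branch in fork_always_list(pipe, n)` needs no special case for a single branch.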

### 🚀 The feature For `IterDataPipe`, `.map` maps a function over the items of an iterable, where the function has the form ``` f: Any -> Any ``` Other...
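The item-wise mapping described in this snippet can be sketched in plain Python as follows. This only illustrates the `f: Any -> Any` form quoted above; `map_items` is a hypothetical name, not part of torchdata, and the sketch does not cover the requested feature, which is cut off in the snippet.

```python
from typing import Any, Callable, Iterable, Iterator


def map_items(source: Iterable[Any], fn: Callable[[Any], Any]) -> Iterator[Any]:
    """Hypothetical helper: lazily apply `fn` to each item of `source`."""
    for item in source:
        yield fn(item)


doubled = list(map_items([1, 2, 3], lambda x: x * 2))
```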

### 🚀 The feature Add support to `Decompressor` for `file_type="zlib"`, and add a corresponding `ZlibFileLoader`. ### Motivation, pitch zlib is a very popular format, but `Decompressor` doesn't seem to support...
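As a rough idea of what zlib support involves, here is a minimal sketch of streaming decompression using only Python's standard `zlib` module. It is not the `Decompressor` datapipe or the proposed `ZlibFileLoader`; the function name `iter_zlib_decompress` and the `chunk_size` parameter are assumptions made for illustration.

```python
import io
import zlib
from typing import BinaryIO, Iterator


def iter_zlib_decompress(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Hypothetical helper: yield decompressed bytes from a zlib stream."""
    decompressor = zlib.decompressobj()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = decompressor.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered inside the decompressor.
    tail = decompressor.flush()
    if tail:
        yield tail


# Round trip: compress a payload, then decompress it in small chunks.
payload = b"hello torchdata " * 1000
compressed = io.BytesIO(zlib.compress(payload))
restored = b"".join(iter_zlib_decompress(compressed, chunk_size=256))
```

Reading in fixed-size chunks keeps memory use bounded for large files, which is the usual reason to prefer `zlib.decompressobj()` over a single `zlib.decompress()` call.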

### 🐛 Describe the bug When running prefetch on multiple branches of a forked datapipe, it is possible to trigger a race condition. ```python import time import torchdata.datapipes as dp...

### 🚀 The feature A few improvements can be made to AsyncIODataPipe: - Make it work properly with `nested_async` - Fetch continuously rather than batch by batch `with closing(nested_async.prefetch_sequence(datapipe,...

### 🐛 Describe the bug There are two issues (both are reproducible using the script below): 1. `FSSpecFileOpenerIterDataPipe` gets stuck if one tries to iteratively create `DataLoader(num_workers=0, ...)` then `DataLoader(num_workers=greater_than_zero)`....

These builds are working as of the second-last commit of this PR: * Conda: https://github.com/pytorch/data/actions/runs/4758538815/jobs/8456724458?pr=1129


### 🐛 Describe the bug `DataLoader2` returns a `DataChunk`, which cannot be moved to a device. ``` def TFRLoader(path): record_pipe = FileLister(path) file_pipe = FileOpener(record_pipe, mode="b") return file_pipe.load_from_tfrecord().map(tfrecord_praser).batch(batch_size) rs = MultiProcessingReadingService(num_workers=cfg.num_workers)...

### 🐛 Describe the bug ``` TypeError: Descriptors cannot not be created directly. If this call came from a _pb2.py file, your generated code is out of date and must...

### 🐛 Describe the bug After https://github.com/pytorch/data/pull/827 landed, our nightly release broke for distributed tests on Windows: the Windows distributed tests time out. Needs to...