
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Results 302 data issues

See comments in https://github.com/pytorch/data/pull/780. @pmeier At least for the first error, the proper typing should be: ```py def load(path: pathlib.Path) -> IterDataPipe[Tuple[str, BinaryIO]]: if path.is_dir(): dp: IterDataPipe = FileLister(str(path), recursive=True)...

Better Engineering

### 🐛 Describe the bug I get the following errors. ```python from torchdata.datapipes.map import SequenceWrapper tuple(SequenceWrapper({'a': 100, 'b': 200, 'c': 300, 'd': 400})) # produces KeyError: 0 tuple(SequenceWrapper({0: "100", 1:...
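A possible explanation for the `KeyError: 0`, sketched without torchdata: if a wrapper defines `__getitem__` and `__len__` but no `__iter__`, Python falls back to calling `__getitem__(0)`, `__getitem__(1)`, and so on until an `IndexError` is raised. With a dict whose keys are strings, index `0` is not a key. `IndexOnlyWrapper` below is a hypothetical stand-in, not the library's class, and the second dict is an assumed example because the report is truncated.

```python
class IndexOnlyWrapper:
    """Hypothetical stand-in: supports indexing and len(), but not __iter__."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)


def try_tuple(obj):
    """Return tuple(obj), or the KeyError raised while iterating."""
    try:
        return tuple(obj)
    except KeyError as err:
        return err


# A list works: iteration stops cleanly on IndexError.
list_result = try_tuple(IndexOnlyWrapper([100, 200, 300]))

# String keys: the very first lookup, data[0], fails.
str_key_result = try_tuple(IndexOnlyWrapper({"a": 100, "b": 200}))

# Integer keys 0 and 1 (assumed example): lookups 0 and 1 succeed,
# then data[2] raises KeyError instead of IndexError.
int_key_result = try_tuple(IndexOnlyWrapper({0: "100", 1: "200"}))

print(list_result, repr(str_key_result), repr(int_key_result))
```

Under this assumption, iterating over the keys (or defining `__iter__` for dict inputs) would avoid the error.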

### 🐛 Describe the bug When using an IterDataPipe in their functional form as recommended in https://pytorch.org/data/0.4/torchdata.datapipes.iter.html#torchdata.datapipes.iter.IterDataPipe, users have to write something like `dp.batch(...)`. However, when looking at the docstring...

### 🐛 Describe the bug I've been playing around with `torchdata` as a replacement for the `webdataset` library. My main use-case is reading data from network-attached file systems (such as...

This PR adds a ShardExpander filter, a filter that will take shard specs of the form "prefix-{000..999}.tar" and expand them into the 1000 corresponding file names, like the shell brace...
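The expansion described here can be sketched in plain Python. This is not the PR's implementation; `expand_shards` is a hypothetical helper that handles only numeric `{start..end}` ranges and assumes the zero padding should match the wider of the two bounds.

```python
import re

# Matches a numeric range such as {000..999}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_shards(spec):
    """Expand every {start..end} range in spec into a list of names."""
    match = _RANGE.search(spec)
    if match is None:
        return [spec]  # nothing left to expand
    start, end = match.group(1), match.group(2)
    width = max(len(start), len(end))  # preserve zero padding
    head, tail = spec[: match.start()], spec[match.end():]
    names = []
    for i in range(int(start), int(end) + 1):
        # Recurse so that specs with several ranges are fully expanded.
        names.extend(expand_shards(f"{head}{i:0{width}d}{tail}"))
    return names


files = expand_shards("prefix-{000..999}.tar")
print(len(files), files[0], files[-1])
```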

CLA Signed

### 🚀 The feature Support a file mask on `list_files_by_s3` just like `list_files` does. ### Motivation, pitch ```python dp.list_files_by_s3(masks="*.csv") ``` is nicer than ```python dp.list_files_by_s3().filter(filter_fn=lambda filename: filename.endswith(".csv")) ``` ### Alternatives...
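What such a mask could do can be sketched with the standard library. `filter_by_masks` is a hypothetical helper, not part of torchdata, and the bucket URLs are made up; it applies shell-style patterns to the last path component of each URL.

```python
from fnmatch import fnmatchcase


def filter_by_masks(urls, masks):
    """Yield the URLs whose file name matches at least one mask."""
    if isinstance(masks, str):
        masks = [masks]  # accept a single pattern or a list of patterns
    for url in urls:
        name = url.rsplit("/", 1)[-1]  # file name after the last slash
        if any(fnmatchcase(name, mask) for mask in masks):
            yield url


# Made-up listing used only for illustration.
listing = ["s3://bucket/train/a.csv", "s3://bucket/train/b.txt", "s3://bucket/c.csv"]
csv_files = list(filter_by_masks(listing, "*.csv"))
print(csv_files)
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform.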

enhancement
good first issue

### 🐛 Describe the bug The `.on_disk_cache()` data pipe uses `.demux()` under the hood with a default `buffer_size` of 1000. Unfortunately this appears to break when the source datapipe has...

### 🚀 The feature From what I can tell torchdata actually has type stubs but mypy won't pick them up because it is missing the `py.typed` marker file (see: https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-library-stubs-or-py-typed-marker)...
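To check whether an installed package ships the marker, a small script along these lines may help. `has_py_typed` is a hypothetical helper; it only looks for a `py.typed` file in the package's directories and does not validate the type information itself.

```python
import importlib.util
from pathlib import Path


def has_py_typed(package_name):
    """Return True if the named top-level package contains a py.typed file."""
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        return False
    # Single-file modules and missing packages have no search locations.
    if spec is None or not spec.submodule_search_locations:
        return False
    return any(
        (Path(location) / "py.typed").is_file()
        for location in spec.submodule_search_locations
    )


print(has_py_typed("torchdata"))  # False if the package or the marker is missing
```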

We have a number of DataPipes that are being deprecated. Our general policy is that we first mark the DataPipe as deprecated with a warning, and wait at least one...
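The "deprecate with a warning first" step could look roughly like the sketch below. It uses only the standard library; the `deprecated` decorator, `OldPipe`, and `NewPipe` are hypothetical names, and the choice of `FutureWarning` is an assumption rather than the project's documented behavior.

```python
import functools
import warnings


def deprecated(replacement=None):
    """Class decorator that emits a warning whenever the class is instantiated."""

    def decorator(cls):
        original_init = cls.__init__

        @functools.wraps(original_init)
        def init_with_warning(self, *args, **kwargs):
            message = f"{cls.__name__} is deprecated and will be removed in a future release."
            if replacement:
                message += f" Use {replacement} instead."
            # stacklevel=2 points the warning at the caller's line.
            warnings.warn(message, FutureWarning, stacklevel=2)
            original_init(self, *args, **kwargs)

        cls.__init__ = init_with_warning
        return cls

    return decorator


@deprecated(replacement="NewPipe")
class OldPipe:
    """Hypothetical DataPipe-like class used only for illustration."""

    def __init__(self, source):
        self.source = source
```

Constructing `OldPipe([...])` still works during the deprecation window, but each instantiation emits the warning.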