data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
See comments in https://github.com/pytorch/data/pull/780 @pmeier At least for the first error, the proper typing should be: ```py def load(path: pathlib.Path) -> IterDataPipe[Tuple[str, BinaryIO]]: if path.is_dir(): dp: IterDataPipe = FileLister(str(path), recursive=True)...
### 🐛 Describe the bug I get the following errors. ```python from torchdata.datapipes.map import SequenceWrapper tuple(SequenceWrapper({'a': 100, 'b': 200, 'c': 300, 'd': 400})) # produces KeyError: 0 tuple(SequenceWrapper({0: "100", 1:...
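The failure can be reproduced without torchdata: `tuple()` falls back to the legacy `__getitem__` iteration protocol, probing integer indices 0, 1, 2, … until an `IndexError` is raised, while a dict raises `KeyError` instead. A minimal stand-in (hypothetical; it only mimics how such a wrapper indexes, not the real `SequenceWrapper`):

```python
class SequenceWrapperSketch:
    """Hypothetical stand-in for a __getitem__-based wrapper."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, idx):
        # For a dict, idx must be an existing key, not a position.
        return self.data[idx]

    def __len__(self):
        return len(self.data)


# tuple() probes __getitem__(0), so string keys fail immediately:
try:
    tuple(SequenceWrapperSketch({'a': 100, 'b': 200}))
except KeyError as e:
    print("KeyError:", e)  # KeyError: 0

# Contiguous integer keys also fail at the end, because iteration only
# stops on IndexError, and a dict raises KeyError past the last key:
try:
    tuple(SequenceWrapperSketch({0: "100", 1: "200"}))
except KeyError as e:
    print("KeyError:", e)  # KeyError: 2
```

This suggests the dict case needs an explicit `__iter__` (or an up-front conversion of the mapping), rather than relying on the implicit integer-index protocol.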
### 🐛 Describe the bug When using an IterDataPipe in their functional form as recommended in https://pytorch.org/data/0.4/torchdata.datapipes.iter.html#torchdata.datapipes.iter.IterDataPipe, users have to write something like `dp.batch(...)`. However, when looking at the docstring...
### 🐛 Describe the bug I've been playing around with `torchdata` as a replacement for the `webdataset` library. My main use-case is reading data from network-attached file systems (such as...
This PR adds ShardExpander, a filter that takes shard specs of the form "prefix-{000..999}.tar" and expands them into the 1000 corresponding file names, like the shell brace...
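The expansion described above can be sketched in plain Python (a hypothetical helper, not the actual ShardExpander implementation; it handles a single zero-padded `{lo..hi}` range):

```python
import re

def expand_shards(spec: str) -> list:
    """Expand "prefix-{000..999}.tar" into the corresponding file names,
    in the style of shell brace expansion (illustrative sketch)."""
    m = re.search(r"\{(\d+)\.\.(\d+)\}", spec)
    if m is None:
        return [spec]  # no range: pass the name through unchanged
    lo, hi = m.group(1), m.group(2)
    width = len(lo)  # preserve the zero padding of the lower bound
    return [
        spec[:m.start()] + str(i).zfill(width) + spec[m.end():]
        for i in range(int(lo), int(hi) + 1)
    ]

print(expand_shards("prefix-{000..002}.tar"))
# -> ['prefix-000.tar', 'prefix-001.tar', 'prefix-002.tar']
```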
### 🚀 The feature Support a file mask on `list_files_by_s3` just like `list_files` does. ### Motivation, pitch ```python dp.list_files_by_s3(masks="*.csv") ``` is nicer than ```python dp.list_files_by_s3().filter(filter_fn=lambda filename: filename.endswith(".csv")) ``` ### Alternatives...
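The filter-based workaround above can be written with `fnmatch`, so the mask syntax ("*.csv") behaves like a shell glob; a hedged sketch that runs on a plain list of names rather than a real S3 DataPipe (file names are illustrative):

```python
from fnmatch import fnmatch

def mask_filter(mask: str):
    """Return a filter_fn matching file names against a glob mask."""
    return lambda filename: fnmatch(filename, mask)

files = ["s3://bucket/a.csv", "s3://bucket/b.txt", "s3://bucket/c.csv"]
csv_files = [f for f in files if mask_filter("*.csv")(f)]
print(csv_files)
# -> ['s3://bucket/a.csv', 's3://bucket/c.csv']
```

A built-in `masks=` parameter would push this matching into the listing step itself, as `list_files` already does.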
### 🐛 Describe the bug The `.on_disk_cache()` data pipe uses `.demux()` under the hood with a default `buffer_size` of 1000. Unfortunately this appears to break when the source datapipe has...
### 🚀 The feature From what I can tell torchdata actually had type stubs but mypy won't pick it up because it is missing the `py.typed` marker file (see: https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-library-stubs-or-py-typed-marker)...
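Per PEP 561, a package advertises its inline annotations by shipping an empty `py.typed` file alongside its modules (and listing it in the package data so it is installed). A small runnable sketch of that layout, using a throwaway directory (paths are illustrative):

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    pkg = pathlib.Path(tmp) / "torchdata"
    pkg.mkdir()
    (pkg / "__init__.py").touch()
    # The empty marker file mypy looks for; it must also be included in
    # the wheel, e.g. via package_data={"torchdata": ["py.typed"]}.
    (pkg / "py.typed").touch()
    has_marker = (pkg / "py.typed").exists()
    print("py.typed present:", has_marker)
```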
We have a number of DataPipes that are being deprecated. Our general policy is that we first mark the DataPipe as deprecated with a warning, and wait at least one...
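The first step of the policy above might be implemented with Python's `warnings` module; a minimal sketch (the class name and message are hypothetical, and the warning category is an assumption):

```python
import warnings

class OldDataPipe:
    """Hypothetical deprecated DataPipe: warns on construction now,
    to be removed entirely in a later release."""

    def __init__(self):
        warnings.warn(
            "OldDataPipe is deprecated and will be removed in a future "
            "release; use its replacement instead.",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller's line
        )

# Constructing the pipe surfaces the deprecation warning:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    OldDataPipe()
print(caught[0].category.__name__)
# -> FutureWarning
```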