data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
Hello, thank you for your awesome implementation of `StatefulDataLoader`. I have a question about `snapshot_every_n_steps`: there does not seem to be much detailed explanation of this argument.

* Will frequent snapshots...
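For context, here is a minimal sketch of how `snapshot_every_n_steps` is typically used together with `state_dict()`/`load_state_dict()`; the toy dataset and the specific numbers are illustrative, not from the issue:

```py
import torch
from torch.utils.data import Dataset
from torchdata.stateful_dataloader import StatefulDataLoader

class RangeDataset(Dataset):  # toy dataset for illustration
    def __len__(self):
        return 100

    def __getitem__(self, idx):
        return torch.tensor(idx)

if __name__ == "__main__":
    loader = StatefulDataLoader(
        RangeDataset(),
        batch_size=4,
        num_workers=2,
        snapshot_every_n_steps=10,  # workers snapshot their state every 10 steps
    )
    it = iter(loader)
    for _ in range(10):
        next(it)
    state = loader.state_dict()  # reflects the most recent worker snapshots

    # A fresh loader with the same configuration can resume from that state.
    resumed = StatefulDataLoader(RangeDataset(), batch_size=4, num_workers=2)
    resumed.load_state_dict(state)
```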
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to creating your pull request.

- If you are adding a new node, ensure you read that section in the contribution guide, as...
### 🚀 The feature

Add `worker_init_fn` support for `ParallelMapper` (and maybe `persistent_workers`).

### Motivation, pitch

Right now, there is no way to specify a custom `worker_init_fn` for the parallel mapping...
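Until such a hook exists, one pattern that works today is to fold per-worker setup into a callable `map_fn` that initializes lazily on first use. A minimal sketch under that assumption (the class and resource names are illustrative):

```py
import os
from torchdata.nodes import IterableWrapper, Loader, ParallelMapper

class MapWithInit:
    """Callable map_fn that lazily runs per-worker setup on first use."""
    def __init__(self):
        self._resource = None  # e.g. a DB connection; a placeholder here

    def __call__(self, x):
        if self._resource is None:
            # Runs once per worker process on first call, emulating a
            # worker_init_fn. (With method="thread" workers share this
            # object, so use threading.local there instead.)
            self._resource = f"resource-pid-{os.getpid()}"
        return (x, self._resource)

if __name__ == "__main__":
    node = ParallelMapper(
        IterableWrapper(range(8)),
        map_fn=MapWithInit(),
        num_workers=2,
        method="process",  # each worker gets its own pickled copy
    )
    print(list(Loader(node)))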
### 🚀 The feature

Do you think it would be possible to have an off-the-shelf component like the one described at https://github.com/Lightning-AI/pytorch-lightning/discussions/1170?

### Motivation, pitch

Sometimes on unbalanced...
Adding a CSV reader. It uses the built-in `csv` package to read the CSV file line by line, with an option to skip the header and also yield items in a header...
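For reference, a minimal sketch of the same idea using only the standard library; the function name and options are illustrative, not the PR's actual API:

```py
import csv
from typing import Dict, Iterator, List, Union

def read_csv(path: str, skip_header: bool = False,
             as_dict: bool = False) -> Iterator[Union[List[str], Dict[str, str]]]:
    """Yield one CSV row at a time, as a list or a header-keyed dict."""
    with open(path, newline="") as f:
        if as_dict:
            # DictReader consumes the header row and keys each row by it.
            yield from csv.DictReader(f)
        else:
            reader = csv.reader(f)
            if skip_header:
                next(reader, None)  # drop the header row if present
            yield from reader
```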
### 🚀 The feature

At some point, there was an [`InMemoryCacheHolder`](https://docs.pytorch.org/data/0.9/generated/torchdata.datapipes.iter.InMemoryCacheHolder.html?highlight=cache#torchdata.datapipes.iter.InMemoryCacheHolder) datapipe. However, it has been removed in the new node design. This would be very useful for some expensive...
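A rough sketch of what this could look like in the nodes design, assuming a `BaseNode` subclass; the class name and state keys are inventions for illustration, not an existing API. The first pass fills an in-memory list, and after a reset the cached items are replayed instead of re-reading the source:

```py
from typing import List, Optional, TypeVar

from torchdata.nodes import BaseNode

T = TypeVar("T")

class InMemoryCache(BaseNode[T]):
    """Caches items from `source` in memory on the first pass; after a
    reset, cached items are replayed instead of re-reading the source."""

    SOURCE_KEY = "source"
    INDEX_KEY = "index"

    def __init__(self, source: BaseNode[T]):
        super().__init__()
        self.source = source
        self._cache: List[T] = []
        self._cache_complete = False  # persists across resets
        self._index = 0

    def reset(self, initial_state: Optional[dict] = None):
        super().reset(initial_state)
        if initial_state is not None:
            self.source.reset(initial_state[self.SOURCE_KEY])
            self._index = initial_state[self.INDEX_KEY]
        else:
            self.source.reset()
            self._index = 0

    def next(self) -> T:
        if self._index < len(self._cache):
            item = self._cache[self._index]  # replay from cache
        elif self._cache_complete:
            raise StopIteration()  # end of a fully cached epoch
        else:
            try:
                item = self.source.next()  # first pass: fill the cache
            except StopIteration:
                self._cache_complete = True
                raise
            self._cache.append(item)
        self._index += 1
        return item

    def get_state(self) -> dict:
        return {
            self.SOURCE_KEY: self.source.state_dict(),
            self.INDEX_KEY: self._index,
        }
```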
My team has been implementing quite a few utilities. Some are close to core features; others are more advanced utilities. For example, their class names and features are:...
### 🐛 Describe the bug

```py
import torchdata.datapipes.iter.transform.bucketbatcher as b
print(b.__file__)
```

```
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 import torchdata.datapipes.iter.transform.bucketbatcher as b
      2 print(b.__file__)
File...
```
Currently the StatefulDistributedSampler takes the dataset as an argument but only uses the length/size of the dataset. Adding an option to provide the size of the dataset instead, for more flexibility.
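In the meantime, since only `len(dataset)` is consulted, a size-only stand-in object can be passed today. A small sketch of that workaround; `SizeOnlyDataset` is an illustrative helper, not part of torchdata:

```py
from torchdata.stateful_dataloader.sampler import StatefulDistributedSampler

class SizeOnlyDataset:
    """Stands in for a dataset when only its length matters."""
    def __init__(self, size: int):
        self._size = size

    def __len__(self) -> int:
        return self._size

sampler = StatefulDistributedSampler(
    SizeOnlyDataset(10_000),  # just the dataset size, no data needed
    num_replicas=8,
    rank=0,
    shuffle=True,
)
print(len(list(sampler)))  # 1250 indices for this rank
```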
### 🚀 The feature

Feature request: expose a `pin_memory_map` parameter in the PinMemory node, which defaults to the current choice (`pin_memory` from torch):

```py
from torch.utils.data._utils.pin_memory import pin_memory

class PinMemory(BaseNode[T]):
    ...
```
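To illustrate the request (hypothetical; `pin_memory_map` is not an existing PinMemory argument), a custom map could, for example, pin only selected tensor-valued fields of a batch:

```py
import torch

def pin_selected_fields(batch: dict) -> dict:
    """Hypothetical custom pin map: pin only chosen tensor-valued fields."""
    keys = {"input_ids", "labels"}  # illustrative field names
    return {
        k: v.pin_memory() if k in keys and isinstance(v, torch.Tensor) else v
        for k, v in batch.items()
    }

# Proposed usage (does not work today; `pin_memory_map` is the requested knob):
# node = PinMemory(source, pin_memory_map=pin_selected_fields)
# Left unset, it would fall back to torch's default recursive `pin_memory`.
```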