data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
Hello, thank you for your awesome implementation of `StatefulDataLoader`. I have a question about `snapshot_every_n_steps`: there does not seem to be much detailed explanation of this argument.

* Will frequent snapshots...
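For context, here is a minimal sketch of how `snapshot_every_n_steps` is typically used together with `state_dict()`/`load_state_dict()`; the toy dataset and the specific numbers are illustrative, not from the issue:

```py
import torch
from torch.utils.data import Dataset
from torchdata.stateful_dataloader import StatefulDataLoader

class RangeDataset(Dataset):  # toy dataset for illustration
    def __len__(self):
        return 100

    def __getitem__(self, idx):
        return torch.tensor(idx)

if __name__ == "__main__":
    loader = StatefulDataLoader(
        RangeDataset(),
        batch_size=4,
        num_workers=2,
        snapshot_every_n_steps=10,  # workers snapshot their state every 10 steps
    )
    it = iter(loader)
    for _ in range(10):
        next(it)
    state = loader.state_dict()  # reflects the most recent worker snapshots

    # A fresh loader with the same configuration can resume from that state.
    resumed = StatefulDataLoader(RangeDataset(), batch_size=4, num_workers=2)
    resumed.load_state_dict(state)
```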
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to creating your pull request.

- If you are adding a new node, ensure you read that section in the contribution guide, as...
### 🚀 The feature

Add `worker_init_fn` support for `ParallelMapper` (and maybe `persistent_workers`).

### Motivation, pitch

Right now, there is no way to specify a custom `worker_init_fn` for the parallel mapping...
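Until such a hook exists, one pattern that works today is to fold per-worker setup into a callable `map_fn` that initializes lazily on first use. A minimal sketch under that assumption (the class and resource names are illustrative):

```py
import os
from torchdata.nodes import IterableWrapper, Loader, ParallelMapper

class MapWithInit:
    """Callable map_fn that lazily runs per-worker setup on first use."""
    def __init__(self):
        self._resource = None  # e.g. a DB connection; a placeholder here

    def __call__(self, x):
        if self._resource is None:
            # Runs once per worker process on first call, emulating a
            # worker_init_fn. (With method="thread" workers share this
            # object, so use threading.local there instead.)
            self._resource = f"resource-pid-{os.getpid()}"
        return (x, self._resource)

if __name__ == "__main__":
    node = ParallelMapper(
        IterableWrapper(range(8)),
        map_fn=MapWithInit(),
        num_workers=2,
        method="process",  # each worker gets its own pickled copy
    )
    print(list(Loader(node)))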
### 🚀 The feature

Do you think it would be possible to have an off-the-shelf component like the one described at https://github.com/Lightning-AI/pytorch-lightning/discussions/1170?

### Motivation, pitch

Sometimes on unbalanced...
Adding a CSV reader. It uses the built-in `csv` package to read the CSV file line by line, with an option to skip the header and also yield items in a header...
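For reference, a minimal sketch of the same idea using only the standard library; the function name and options are illustrative, not the PR's actual API:

```py
import csv
from typing import Dict, Iterator, List, Union

def read_csv(path: str, skip_header: bool = False,
             as_dict: bool = False) -> Iterator[Union[List[str], Dict[str, str]]]:
    """Yield one CSV row at a time, as a list or a header-keyed dict."""
    with open(path, newline="") as f:
        if as_dict:
            # DictReader consumes the header row and keys each row by it.
            yield from csv.DictReader(f)
        else:
            reader = csv.reader(f)
            if skip_header:
                next(reader, None)  # drop the header row if present
            yield from reader
```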
### 🚀 The feature

At some point, there was an [`InMemoryCacheHolder`](https://docs.pytorch.org/data/0.9/generated/torchdata.datapipes.iter.InMemoryCacheHolder.html?highlight=cache#torchdata.datapipes.iter.InMemoryCacheHolder) datapipe. However, it has been removed in the new node design. This would be very useful for some expensive...
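A rough sketch of what this could look like in the nodes design, assuming a `BaseNode` subclass; the class name and state keys are inventions for illustration, not an existing API. The first pass fills an in-memory list, and after a reset the cached items are replayed instead of re-reading the source:

```py
from typing import List, Optional, TypeVar

from torchdata.nodes import BaseNode

T = TypeVar("T")

class InMemoryCache(BaseNode[T]):
    """Caches items from `source` in memory on the first pass; after a
    reset, cached items are replayed instead of re-reading the source."""

    SOURCE_KEY = "source"
    INDEX_KEY = "index"

    def __init__(self, source: BaseNode[T]):
        super().__init__()
        self.source = source
        self._cache: List[T] = []
        self._cache_complete = False  # persists across resets
        self._index = 0

    def reset(self, initial_state: Optional[dict] = None):
        super().reset(initial_state)
        if initial_state is not None:
            self.source.reset(initial_state[self.SOURCE_KEY])
            self._index = initial_state[self.INDEX_KEY]
        else:
            self.source.reset()
            self._index = 0

    def next(self) -> T:
        if self._index < len(self._cache):
            item = self._cache[self._index]  # replay from cache
        elif self._cache_complete:
            raise StopIteration()  # end of a fully cached epoch
        else:
            try:
                item = self.source.next()  # first pass: fill the cache
            except StopIteration:
                self._cache_complete = True
                raise
            self._cache.append(item)
        self._index += 1
        return item

    def get_state(self) -> dict:
        return {
            self.SOURCE_KEY: self.source.state_dict(),
            self.INDEX_KEY: self._index,
        }
```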
My team has been implementing quite a few utilities. Some are close to core features; others are more advanced utilities. For example, their class names and features are:...
### 🐛 Describe the bug

```py
import torchdata.datapipes.iter.transform.bucketbatcher as b
print(b.__file__)
```

```
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 import torchdata.datapipes.iter.transform.bucketbatcher as b
      2 print(b.__file__)
File...
```
Currently the StatefulDistributedSampler takes the dataset as an argument but only uses the length/size of the dataset. Adding an option to provide the size of the dataset instead, for more flexibility.
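In the meantime, since only `len(dataset)` is consulted, a size-only stand-in object can be passed today. A small sketch of that workaround; `SizeOnlyDataset` is an illustrative helper, not part of torchdata:

```py
from torchdata.stateful_dataloader.sampler import StatefulDistributedSampler

class SizeOnlyDataset:
    """Stands in for a dataset when only its length matters."""
    def __init__(self, size: int):
        self._size = size

    def __len__(self) -> int:
        return self._size

sampler = StatefulDistributedSampler(
    SizeOnlyDataset(10_000),  # just the dataset size, no data needed
    num_replicas=8,
    rank=0,
    shuffle=True,
)
print(len(list(sampler)))  # 1250 indices for this rank
```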
### 🚀 The feature

Feature request: expose a `pin_memory_map` parameter in the PinMemory node, which defaults to the current choice (`pin_memory` from torch):

```py
from torch.utils.data._utils.pin_memory import pin_memory

class PinMemory(BaseNode[T]):
    ...
```
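To illustrate the request (hypothetical; `pin_memory_map` is not an existing PinMemory argument), a custom map could, for example, pin only selected tensor-valued fields of a batch:

```py
import torch

def pin_selected_fields(batch: dict) -> dict:
    """Hypothetical custom pin map: pin only chosen tensor-valued fields."""
    keys = {"input_ids", "labels"}  # illustrative field names
    return {
        k: v.pin_memory() if k in keys and isinstance(v, torch.Tensor) else v
        for k, v in batch.items()
    }

# Proposed usage (does not work today; `pin_memory_map` is the requested knob):
# node = PinMemory(source, pin_memory_map=pin_selected_fields)
# Left unset, it would fall back to torch's default recursive `pin_memory`.
```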