Gokul
Summary: Use the stateful_dataloader from torchdata (https://github.com/pytorch/data/tree/main/torchdata/stateful_dataloader) for storing the token buffer and iteration data order. It requires a dependency on the nightly build of torchdata >= 20240426. Also make...
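A minimal sketch of the checkpoint/resume flow this enables; the toy dataset, batch size, and break point are illustrative, not the actual token-buffer setup:

```
from torch.utils.data import Dataset
from torchdata.stateful_dataloader import StatefulDataLoader


class ToyDataset(Dataset):
    def __len__(self):
        return 100

    def __getitem__(self, idx):
        return idx


def make_loader():
    # Drop-in replacement for torch.utils.data.DataLoader that can
    # snapshot and restore its iteration state.
    return StatefulDataLoader(ToyDataset(), batch_size=4, shuffle=True)


loader = make_loader()
for i, batch in enumerate(loader):
    if i == 10:
        # Mid-epoch snapshot: sampler position, RNG state, in-flight buffers.
        checkpoint = loader.state_dict()
        break

# After a restart, rebuild an identical loader, restore the snapshot, and
# iteration continues from the next batch in the same shuffled order.
resumed = make_loader()
resumed.load_state_dict(checkpoint)
for batch in resumed:
    pass
```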
### 🚀 The feature

title

### Motivation, pitch

rstrip/lstrip can lead to unintended behaviors

### Alternatives

_No response_

### Additional context

_No response_
Adding a deprecation message about the S3 IterDataPipes and pointing users to https://github.com/awslabs/s3-connector-for-pytorch. Cc @jamesbornholt, @dnauti
JIRA: https://issues.apache.org/jira/browse/TEPHRA-265
Currently the user has to pip install the nightly build explicitly. Instead, once torchdata is released with StatefulDataLoader, add it to requirements.txt and pyproject.toml and remove the pip install from added...
Summary: Allows users to get the worker states out of the state dict. It can be used if users want to modify the state offline. Starts off with very preliminary...
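A hedged sketch of that offline workflow using the public state_dict/load_state_dict API; the state-dict layout is an internal detail, so it is only printed here rather than assumed:

```
import pprint

from torch.utils.data import Dataset
from torchdata.stateful_dataloader import StatefulDataLoader


class ToyDataset(Dataset):
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        return idx


def main():
    loader = StatefulDataLoader(ToyDataset(), batch_size=8, num_workers=2)
    for i, _ in enumerate(loader):
        if i == 2:
            sd = loader.state_dict()
            break

    # Inspect (and, if desired, edit) the per-worker entries offline.
    # The exact keys are implementation-defined.
    pprint.pprint(sd)

    # Load the possibly-modified state back into a fresh loader.
    restored = StatefulDataLoader(ToyDataset(), batch_size=8, num_workers=2)
    restored.load_state_dict(sd)


if __name__ == "__main__":
    main()
```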
### 🐛 Describe the bug

Consider the following code:

```
class DatasetStateIterable(torch.utils.data.IterableDataset, Stateful):
    def __init__(self, length):
        self.length = length

    def __iter__(self):
        return iter(list(range(self.length)))

    def state_dict(self):
        print("Calling state dict")
        return {"key":...
```
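For reference, a self-contained, runnable version of that pattern might look like the sketch below; the Stateful import path and the contents of the returned state are assumptions for illustration, since the original snippet is truncated:

```
import torch
from torchdata.stateful_dataloader import StatefulDataLoader
from torchdata.stateful_dataloader.stateful import Stateful  # assumed import path


class DatasetStateIterable(torch.utils.data.IterableDataset, Stateful):
    def __init__(self, length):
        self.length = length
        self.pos = 0  # hypothetical resume position, not in the original snippet

    def __iter__(self):
        for i in range(self.pos, self.length):
            self.pos = i + 1
            yield i

    def state_dict(self):
        print("Calling state dict")
        return {"pos": self.pos}  # illustrative payload; the original is truncated

    def load_state_dict(self, state_dict):
        self.pos = state_dict["pos"]


loader = StatefulDataLoader(DatasetStateIterable(10), batch_size=2)
for batch in loader:
    print(batch)
```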
### 🚀 The feature

Currently RandomSampler and BatchSampler are patched here https://github.com/pytorch/data/blob/main/torchdata/stateful_dataloader/sampler.py#L134-L135 to make them stateful and work out of the box with StatefulDataLoader. It would be useful to consider making...
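For context, a natively stateful sampler could look something like the sketch below: a Sampler exposing state_dict/load_state_dict so a checkpointing dataloader can save and resume it without patching. Class and key names are illustrative, not the torchdata implementation:

```
from typing import Any, Dict, Iterator

import torch
from torch.utils.data import Sampler


class ResumableRandomSampler(Sampler[int]):
    def __init__(self, data_source, seed: int = 0):
        self.data_source = data_source
        self.seed = seed
        self.yielded = 0  # indices handed out so far in the current epoch

    def __len__(self) -> int:
        return len(self.data_source)

    def __iter__(self) -> Iterator[int]:
        g = torch.Generator()
        g.manual_seed(self.seed)
        perm = torch.randperm(len(self.data_source), generator=g).tolist()
        # Resume mid-epoch by skipping indices yielded before the checkpoint.
        for idx in perm[self.yielded:]:
            self.yielded += 1
            yield idx
        # Epoch finished: reseed and start the next epoch from scratch.
        self.seed += 1
        self.yielded = 0

    def state_dict(self) -> Dict[str, Any]:
        return {"seed": self.seed, "yielded": self.yielded}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        self.seed = state_dict["seed"]
        self.yielded = state_dict["yielded"]
```

With a protocol like this supported upstream, a stateful dataloader could pick up sampler state directly instead of relying on the monkey-patching linked above.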