Gokul

Results 9 issues of Gokul

Summary: Use the stateful_dataloader from torchdata (https://github.com/pytorch/data/tree/main/torchdata/stateful_dataloader) for storing the token buffer and iteration data order. It requires a dependency on the nightly build of torchdata >= 20240426. Also make...
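To illustrate what "storing the token buffer and iteration data order" involves, here is a minimal pure-Python sketch. It is not the torchdata `StatefulDataLoader` API and not the code from this PR; `ResumableTokenStream` and its fields are hypothetical names, and the only thing borrowed is the `state_dict` / `load_state_dict` convention.

```python
from typing import Any, Dict, Iterator, List


class ResumableTokenStream:
    """Hypothetical helper: packs token lists into fixed-size chunks and can resume."""

    def __init__(self, samples: List[List[int]], chunk_size: int) -> None:
        self.samples = samples
        self.chunk_size = chunk_size
        self.next_sample = 0          # position in the data order
        self.buffer: List[int] = []   # tokens read but not yet emitted

    def __iter__(self) -> Iterator[List[int]]:
        while True:
            # Drain full chunks; the buffer is updated before each yield,
            # so state_dict() is consistent between chunks.
            while len(self.buffer) >= self.chunk_size:
                chunk = self.buffer[: self.chunk_size]
                self.buffer = self.buffer[self.chunk_size:]
                yield chunk
            if self.next_sample >= len(self.samples):
                return  # leftover tokens shorter than a chunk are dropped
            self.buffer.extend(self.samples[self.next_sample])
            self.next_sample += 1

    def state_dict(self) -> Dict[str, Any]:
        return {"next_sample": self.next_sample, "buffer": list(self.buffer)}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.next_sample = state["next_sample"]
        self.buffer = list(state["buffer"])
```

Saving both fields matters: restoring only the sample index would lose the tokens that were already read into the buffer but not yet emitted.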

CLA Signed

### 🚀 The feature title ### Motivation, pitch rstrip/lstrip can lead to unintended behaviors ### Alternatives _No response_ ### Additional context _No response_
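The snippet does not say which behavior is meant. Assuming it refers to Python's built-in `str.rstrip` / `str.lstrip`, one well-known pitfall is that their argument is treated as a set of characters, not as a suffix or prefix:

```python
name = "test.txt"
url = "https://host"

# rstrip/lstrip remove ANY run of the given characters, not the literal string.
stripped_suffix = name.rstrip(".txt")      # also eats the trailing "t" of "test"
stripped_prefix = url.lstrip("https://")   # also eats the leading "h" of "host"

# str.removesuffix / str.removeprefix (Python 3.9+) remove the exact string once.
exact_suffix = name.removesuffix(".txt")
exact_prefix = url.removeprefix("https://")

print(stripped_suffix, stripped_prefix)  # tes ost
print(exact_suffix, exact_prefix)        # test host
```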

Adding deprecation message about S3 IterDataPipes and pointing to https://github.com/awslabs/s3-connector-for-pytorch. Cc @jamesbornholt, @dnauti

CLA Signed

JIRA : https://issues.apache.org/jira/browse/TEPHRA-265

Currently the user has to pip install the nightly build explicitly. Instead, once torchdata is released with StatefulDataLoader, add it to requirements.txt and pyproject.toml and remove the pip install from added...

better_engineering

Summary: Allows users to get the worker states out of the state dict. It can be used if users want to modify the state offline. Starts off with very preliminary...

CLA Signed

### 🐛 Describe the bug
Consider the following code:
```
class DatasetStateIterable(torch.utils.data.IterableDataset, Stateful):
    def __init__(self, length):
        self.length = length

    def __iter__(self):
        return iter(list(range(self.length)))

    def state_dict(self):
        print("Calling state dict")
        return {"key":...
```

### 🚀 The feature Currently RandomSampler, BatchSampler are patched here https://github.com/pytorch/data/blob/main/torchdata/stateful_dataloader/sampler.py#L134-L135 to make them stateful and work out of the box with StatefulDataLoader. It would be useful to consider making...
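As a rough picture of what a "stateful" sampler needs to track, the sketch below uses only the standard library. It is not the patched `RandomSampler` from `sampler.py`; `ResumableRandomSampler`, its `seed` argument, and its state keys are hypothetical and chosen for illustration.

```python
import random
from typing import Any, Dict, Iterator


class ResumableRandomSampler:
    """Hypothetical sampler: a seeded permutation plus a count of yielded indices."""

    def __init__(self, num_samples: int, seed: int = 0) -> None:
        self.num_samples = num_samples
        self.seed = seed
        self.yielded = 0  # how many indices of the current epoch were handed out

    def __iter__(self) -> Iterator[int]:
        # Rebuild the same permutation from the seed, then skip what was consumed.
        order = list(range(self.num_samples))
        random.Random(self.seed).shuffle(order)
        for index in order[self.yielded:]:
            self.yielded += 1
            yield index
        self.yielded = 0  # epoch finished; the next epoch starts from the top

    def state_dict(self) -> Dict[str, Any]:
        return {"seed": self.seed, "yielded": self.yielded}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.seed = state["seed"]
        self.yielded = state["yielded"]
```

Storing the seed and a counter keeps the checkpoint small; the trade-off is that the permutation has to be regenerated on resume.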

enhancement

title Fixes #{issue number} ### Changes - -

CLA Signed