
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Results 302 data issues

Implements rescaling of checkpoints to different world sizes and numbers of workers. The user specifies the number of data partitions in advance, and when saving/loading checkpoints with different total workers, stateful...


After much discussion, it was decided that the best approach to implementing rescalability would be to implement rescaling in the base file reader, in order to maintain low overhead and...
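The two rescaling items above describe fixing the number of data partitions up front so that checkpoints can be reloaded under a different number of total workers. A minimal sketch of that idea, assuming partitions are numbered `0..N-1`, are assigned to global workers round-robin, and each partition checkpoints its own read offset. `PartitionState`, `assign_partitions`, and `rescale` are hypothetical names for illustration, not torchdata APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PartitionState:
    # Hypothetical per-partition checkpoint: partition id plus items consumed so far.
    partition_id: int
    offset: int = 0


def assign_partitions(
    num_partitions: int, world_size: int, num_workers: int, rank: int, worker_id: int
) -> list[int]:
    """Return the partition ids owned by one (rank, worker) pair.

    The partition count never changes; only the mapping from partitions to
    global workers is recomputed when world_size or num_workers changes.
    """
    if not (0 <= rank < world_size and 0 <= worker_id < num_workers):
        raise ValueError("rank/worker_id out of range")
    total_workers = world_size * num_workers
    if num_partitions < total_workers:
        raise ValueError("need at least one partition per global worker")
    global_worker = rank * num_workers + worker_id
    # Round-robin: global worker g owns partitions g, g + total, g + 2 * total, ...
    return list(range(global_worker, num_partitions, total_workers))


def rescale(
    states: list[PartitionState], world_size: int, num_workers: int, rank: int, worker_id: int
) -> list[PartitionState]:
    """Select this worker's partitions from the full saved state, keeping offsets."""
    # Assumes states holds exactly one entry per partition id 0..len(states)-1.
    by_id = {s.partition_id: s for s in states}
    owned = assign_partitions(len(states), world_size, num_workers, rank, worker_id)
    return [by_id[p] for p in owned]
```

For example, eight partitions saved from 2 ranks × 2 workers can be reloaded on 1 rank × 2 workers: worker 1 picks up partitions 1, 3, 5, 7 with their saved offsets intact. Keeping the bookkeeping per partition rather than per worker is what makes the reassignment cheap.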

What's the best way to handle multi-GPU and multi-node training with torchdata nodes? @keunwoochoi and I have the following nodes:
* `FileListNode`, which lists all files (shards)
* `StreamNode` which...
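One common pattern for the question above is to shard the file list first by distributed rank and then by dataloader worker, so that no two workers read the same shard. A minimal sketch, assuming `rank` and `world_size` would come from `torch.distributed.get_rank()` / `get_world_size()` in a real job; `shard_files` and `shard_for_worker` are hypothetical helpers, not part of torchdata, and how they would plug into `FileListNode` is left open.

```python
from __future__ import annotations


def shard_files(
    files: list[str], rank: int, world_size: int, drop_uneven: bool = True
) -> list[str]:
    """Return the shard files assigned to one rank."""
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world_size {world_size}")
    # Every rank must agree on the same order, so sort before splitting.
    ordered = sorted(files)
    if drop_uneven:
        # Give every rank the same number of files; uneven counts can leave
        # some ranks waiting on collectives that others never reach.
        usable = len(ordered) - len(ordered) % world_size
        ordered = ordered[:usable]
    return ordered[rank::world_size]


def shard_for_worker(rank_files: list[str], worker_id: int, num_workers: int) -> list[str]:
    """Further split one rank's files across its dataloader workers."""
    return rank_files[worker_id::num_workers]
```

With 10 files and 4 ranks, `drop_uneven=True` gives each rank 2 files and drops the last 2; setting it to `False` keeps every file but leaves ranks 0 and 1 with one extra.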

Hi, thanks for reviving torchdata. I was able to migrate lots of my existing datapipes to `0.10.1`, and it seems to work pretty nicely. Question: am I supposed...

### 🐛 Describe the bug

I am wrapping an `sqlite3` cursor inside an `IterableWrapper` for further processing.

```python
import sqlite3
from torchdata.nodes import IterableWrapper, Loader, ParallelMapper

db = sqlite3.connect(':memory:')
db.execute('create...
```
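The report above is truncated, so the actual bug is not shown. One pitfall worth ruling out when a cursor feeds a thread-based pipeline: Python's `sqlite3` connections default to `check_same_thread=True`, so iterating a cursor from a thread other than the one that created the connection raises `sqlite3.ProgrammingError`. Whether `Loader` or `ParallelMapper` pulls from the `IterableWrapper` on a background thread is an assumption here, not something the report states. This stdlib-only sketch reproduces the thread check and one workaround (materializing rows first); `consume_in_thread` is a hypothetical helper.

```python
import sqlite3
import threading


def consume_in_thread(rows_iter):
    """Iterate rows_iter on a separate thread; return (rows, error)."""
    outcome = {"rows": None, "error": None}

    def worker():
        try:
            outcome["rows"] = list(rows_iter)
        except sqlite3.ProgrammingError as exc:
            # Raised when a cursor is used outside its connection's creating thread.
            outcome["error"] = exc

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return outcome["rows"], outcome["error"]


db = sqlite3.connect(":memory:")  # default: check_same_thread=True
db.execute("create table items (x integer)")
db.executemany("insert into items values (?)", [(i,) for i in range(5)])

# 1) Handing the live cursor to another thread trips the thread check.
rows, error = consume_in_thread(db.execute("select x from items order by x"))

# 2) Fetching rows on the creating thread first avoids it.
materialized = db.execute("select x from items order by x").fetchall()
rows_ok, error_ok = consume_in_thread(iter(materialized))

db.close()
```

Materializing trades memory for safety; for large tables, opening a connection inside the consuming thread is the usual alternative.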

Some nodes, such as the pin-memory node, require a GPU for their unit tests, so those tests may be silently skipped in CI. This can let undetected bugs land.
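One way to keep skipped GPU tests from hiding bugs is to make the skip explicit and let a GPU-equipped CI job turn it into a failure. A minimal sketch: `torch.cuda.is_available()` is a real PyTorch call, while the `REQUIRE_GPU_TESTS` environment variable and the `PinMemoryTest` class are hypothetical placeholders.

```python
import os
import unittest


def gpu_test_action(gpu_available: bool, require_gpu: bool) -> str:
    """Decide what a GPU-only test should do: 'run', 'skip', or 'fail'."""
    if gpu_available:
        return "run"
    # On a job that promises a GPU, a missing device is an error, not a skip.
    return "fail" if require_gpu else "skip"


def cuda_available() -> bool:
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


class PinMemoryTest(unittest.TestCase):  # hypothetical test case
    def setUp(self):
        # Hypothetical flag, set to "1" only on CI runners that have a GPU.
        require = os.environ.get("REQUIRE_GPU_TESTS") == "1"
        action = gpu_test_action(cuda_available(), require)
        if action == "fail":
            self.fail("GPU tests were required but no CUDA device is available")
        if action == "skip":
            self.skipTest("no CUDA device available")

    def test_pin_memory(self):
        pass  # the real pin-memory assertions would go here
```

Keeping the decision in a plain function makes the policy itself testable without a GPU.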

### 🐛 Describe the bug

Repro:

```python
from torchdata.nodes import IterableWrapper, ParallelMapper, Loader, Batcher

node = IterableWrapper(range(10))
node = ParallelMapper(node, map_fn=lambda x: x**2, num_workers=3, method="thread")
node = Batcher(node, batch_size=0)
loader =...
```
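The repro above passes `batch_size=0` to `Batcher`; the expected and actual behavior are cut off. As a standalone illustration of why eager validation helps, here is a plain-Python batching generator (not torchdata's `Batcher`) that rejects non-positive sizes at call time. The `batched` name and its `drop_last` default are choices made for this sketch.

```python
from __future__ import annotations

from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int, drop_last: bool = True) -> Iterator[list[T]]:
    """Group items into lists of batch_size, validating the size eagerly."""
    # Checked here, outside the generator, so the error surfaces at call time
    # rather than on the first next().
    if batch_size <= 0:
        raise ValueError(f"batch_size must be a positive integer, got {batch_size}")
    return _batched(items, batch_size, drop_last)


def _batched(items: Iterable[T], batch_size: int, drop_last: bool) -> Iterator[list[T]]:
    batch: list[T] = []
    for item in items:
        batch.append(item)
        # With batch_size == 0 this condition would never hold, so all items
        # would pile into one trailing batch (silently dropped if drop_last).
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch and not drop_last:
        yield batch
```

Splitting validation from the generator body matters because Python generator functions do not run any code until first iterated.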

### 🚀 The feature

https://fburl.com/mf29fu1m

### Motivation, pitch

Enable more flexible parameters for tuning, while making sure this does not overload memory or the system.

### Alternatives

_No response_...

Hi, do you think these kinds of nodes would be in the scope of Torchdata? If so, I'm happy to open a PR to add them. with remaining and testing, for...

At `0.10.1`, `torchdata.nodes.Mapper` is a function, not a class. [code](https://github.com/pytorch/data/blob/main/torchdata/nodes/map.py#L41C1-L55C1)

```python
def Mapper(source: BaseNode[X], map_fn: Callable[[X], T]) -> "ParallelMapper[T]":
    """Returns a :class:`ParallelMapper` node with num_workers=0, which will execute map_fn in...
```
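Because `Mapper` is a factory function rather than a class, anything that treats it as a type, such as `isinstance(node, Mapper)`, fails with a `TypeError`; checks have to target the returned `ParallelMapper` instead. A minimal sketch using stand-in definitions shaped like the signature quoted above; these are not torchdata's actual classes.

```python
from __future__ import annotations

from typing import Callable


class ParallelMapper:
    """Stand-in for the real node class; stores only what this demo needs."""

    def __init__(self, source, map_fn: Callable, num_workers: int):
        self.source = source
        self.map_fn = map_fn
        self.num_workers = num_workers


def Mapper(source, map_fn: Callable) -> ParallelMapper:
    """Stand-in factory mirroring the quoted shape: a ParallelMapper with num_workers=0."""
    return ParallelMapper(source, map_fn, num_workers=0)


node = Mapper(range(3), lambda x: x + 1)

# Type checks must use the class the factory returns.
is_parallel = isinstance(node, ParallelMapper)

# Using the factory function as a type raises TypeError.
try:
    isinstance(node, Mapper)
    mapper_is_type = True
except TypeError:
    mapper_is_type = False
```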