
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Results 302 data issues

### 🚀 The feature When transmitting `np.ndarray` via `torch.multiprocessing` (e.g. used by MPRS), back them by shared memory (SM) to significantly speed up transmission. This could potentially be expanded to...

### 🐛 Describe the bug This took me extremely long to figure out and is a super sneaky bug: ``` import os import functools from torchdata.datapipes.iter import IterableWrapper, IterDataPipe from...

### 🐛 Describe the bug ``` dp = dp.sharding_round_robin_dispatch() ``` results in ``` def __init__(self, source_datapipe: IterDataPipe, sharding_group_filter: Optional[SHARDING_PRIORITIES] = None): self.source_datapipe = source_datapipe if sharding_group_filter != SHARDING_PRIORITIES.MULTIPROCESSING: > raise...

### 🐛 Describe the bug Hi, here we are training in webdataset format with torchdata. Everything else works fine without fullsync, just the training hangs at the last iteration. So...

## 🚀 Feature Currently we use `import` inside each DataPipe class to make lazy importing happen, like https://github.com/pytorch/data/blob/4802a350d1bf954d0785e0f22fd1fcde2deded76/torchdata/datapipes/iter/load/iopath.py#L21-L29 As more libraries are potentially used in TorchData to support different functionalities, we...
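The lazy-import pattern this issue describes could be centralized roughly as follows. This is only a sketch, not TorchData's actual code: `lazy_import` and `JsonLinesReader` are hypothetical names, and the standard-library `json` module stands in for an optional third-party dependency.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, feature: str) -> ModuleType:
    """Import a dependency on first use; raise a descriptive error if it is missing.

    Hypothetical helper for illustration, not part of TorchData.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ModuleNotFoundError(
            f"'{module_name}' is required for {feature}. "
            "Install it to use this functionality."
        ) from exc


class JsonLinesReader:
    """Hypothetical DataPipe-like class that defers its import to construction time."""

    def __init__(self, lines):
        # The dependency is resolved here, not when the defining module is imported.
        self._json = lazy_import("json", "JsonLinesReader")
        self.lines = lines

    def __iter__(self):
        for line in self.lines:
            yield self._json.loads(line)
```

With a shared helper like this, each class only names the module it needs, and the error message for a missing dependency stays consistent across classes.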

Better Engineering

### 🚀 The feature Mitigate false positive test failures related to Google Drive. Some possible solutions are: 1. Move the files to a location that is still on GDrive but...

Better Engineering

While working on #174, I also worked on the test suite. In there we have the ginormous tests that are hard to parse, because they do so many things at...

Better Engineering

### 🐛 Describe the bug My forked repo of torchdata has been running the cron job to validate nightly binaries. See workflow https://github.com/ejguan/data/actions/runs/4097726223 @atalman Is this expected? Can we disable...

Better Engineering

### 🐛 Describe the bug For all torchdata wheels, in order to provide the AWSSDK integration, we statically linked `OpenSSL` into `_torchdata.so`. We should prevent any symbol from OpenSSL...

Better Engineering

### 🚀 The feature The `filelock` module was added as a dependency of `PyTorch` when dynamo was introduced. https://github.com/pytorch/pytorch/blob/a0d1dbc4466121f7aa0fa3754c5fbfe679c70c8e/requirements.txt#LL13-L13C9 We can reuse the `filelock` module for all file locking rather than...

Better Engineering