
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

302 issues · sorted by recently updated

### 🚀 The feature

```python
class MultiProcessingReadingService(ReadingServiceInterface):
    num_workers: int = get_number_of_cpu_cores()
    pin_memory: bool = True
    timeout: float
    worker_init_fn: Optional[Callable[[int], None]]  # Remove this?
    prefetch_factor: int = profile_optimal_prefetch_factor(model: nn.Module)
    persistent_workers: ...
```

enhancement

### 🐛 Describe the bug

I create some parquet files with the following:

```python
def save_tensor(t):
    buf = io.BytesIO()
    th.save(t, buf)
    return buf.getvalue()
```

```python
for idx, batch in enumerate(tqdm(dl, ...
```
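The snippet above serializes each tensor to raw bytes via an in-memory buffer before writing it to parquet. A minimal sketch of that round trip, using `pickle` in place of `th.save`/`th.load` so it runs without PyTorch installed (the tensor is a stand-in list; the helper names mirror the issue's pattern):

```python
import io
import pickle


def save_tensor(t):
    # Serialize to an in-memory buffer and return the raw bytes,
    # mirroring the th.save(t, buf) pattern from the issue.
    buf = io.BytesIO()
    pickle.dump(t, buf)
    return buf.getvalue()


def load_tensor(raw):
    # Reverse the round trip: wrap the bytes in a buffer and deserialize.
    return pickle.load(io.BytesIO(raw))


data = [1.0, 2.0, 3.0]  # stand-in for a tensor
assert load_tensor(save_tensor(data)) == data
```

The same bytes-column approach works with torch tensors; only the serializer changes.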

Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to creating your pull request.

- Note that there is a section on requirements related to adding a new DataPipe.

Fixes #416 ...

CLA Signed

### 🐛 Describe the bug

Hi, when decoding from a file stream in `StreamReader`, torchdata automatically assumes the incoming bytes are UTF-8. However, in the case of alternate encodings this...
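Independent of torchdata's `StreamReader`, the standard-library way to decode a binary stream with an explicit, non-UTF-8 encoding is `io.TextIOWrapper`; a small sketch of why assuming UTF-8 breaks:

```python
import io

# Bytes encoded as Latin-1; b"caf\xe9" is not valid UTF-8.
raw = "café".encode("latin-1")

# Wrap the binary stream with the correct encoding instead of assuming UTF-8.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1").read()
assert text == "café"
```

Passing the actual encoding through to the decoding layer, as above, is the usual fix for this class of bug.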

### 📚 The doc issue

I am not sure how to implement distributed training.

### Suggest a potential alternative/fix

If there was a simple example that showed how to use...

documentation

This issue is generated from the TODO line https://github.com/pytorch/data/blob/2f29adba451e1b87f1c0c654557d9dd98673fdd8/./examples/criteo_torcharrow/torcharrow_wrapper.py#L1

Better Engineering

This issue is generated from the TODO line https://github.com/pytorch/data/blob/2f29adba451e1b87f1c0c654557d9dd98673fdd8/./torchdata/dataloader2/reading_service.py#L135 cc @VitalyFedyunin

good first issue

### 🐛 Describe the bug

Let `t` be an input dataset that associates strings (model input) to integers (model output):

```python
t = [("a", 567), ("b", 908), ("c", 887)]
```

...
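For context, a (string, int) dataset like `t` is typically consumed by mapping a transform over each sample. A minimal stdlib sketch of that pattern (the `encode` transform is hypothetical, chosen only to make the mapping concrete):

```python
t = [("a", 567), ("b", 908), ("c", 887)]


def encode(sample):
    # Hypothetical transform: replace the string input with a numeric
    # feature (its code point) and keep the integer label unchanged.
    x, y = sample
    return ord(x), y


mapped = [encode(s) for s in t]
assert mapped == [(97, 567), (98, 908), (99, 887)]
```

In torchdata terms this corresponds to wrapping the list in an iterable pipe and applying a map step, but the per-sample contract is the same.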

### 🚀 The feature

A dataloader which communicates with its workers via the torch.distributed.rpc API.

### Motivation, pitch

Presently, process-based workers for Dataloader mean the workers live on the same server/PC...

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #652 * #562 Prototype version to trigger tests.

CLA Signed