data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
### 🚀 The feature ```python class MultiProcessingReadingService(ReadingServiceInterface): num_workers: int = get_number_of_cpu_cores() pin_memory: bool = True timeout: float worker_init_fn: Optional[Callable[[int], None]] # Remove this? prefetch_factor: int = profile_optimal_prefetch_factor(model : nn.Module) persistent_workers:...
### 🐛 Describe the bug I create some parquet files with the following: ``` def save_tensor(t): buf = io.BytesIO() th.save(t, buf) return buf.getvalue() ``` ``` for idx, batch in enumerate(tqdm(dl,...
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to creating your pull request. - Note that there is a section on requirements related to adding a new DataPipe. Fixes #416 ###...
### 🐛 Describe the bug Hi, when decoding from a file stream in `StreamReader`, torchdata automatically assumes the incoming bytes are UTF-8. However, in the case of alternate encodings, this...
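As a rough illustration of the encoding concern raised above, the sketch below decodes a binary stream with a caller-chosen encoding instead of assuming UTF-8. It uses only the Python standard library; `read_text_chunks` is a hypothetical helper invented for this example and is not part of torchdata or its `StreamReader`.

```python
import io
from typing import BinaryIO, Iterator


def read_text_chunks(
    stream: BinaryIO, encoding: str = "utf-8", chunk_size: int = 4096
) -> Iterator[str]:
    """Hypothetical helper (not a torchdata API): yield decoded text chunks."""
    # TextIOWrapper decodes incrementally, so a multi-byte character that
    # straddles an internal buffer boundary is still decoded correctly.
    # newline="" disables newline translation so the text is returned as-is.
    reader = io.TextIOWrapper(stream, encoding=encoding, newline="")
    while True:
        chunk = reader.read(chunk_size)  # chunk_size counts characters
        if not chunk:
            break
        yield chunk


# Bytes written as Latin-1 are not valid UTF-8 once they contain
# non-ASCII characters, so the encoding has to be supplied by the caller.
raw = "naïve café".encode("latin-1")
text = "".join(read_text_chunks(io.BytesIO(raw), encoding="latin-1", chunk_size=4))
```

Passing the same bytes with `encoding="utf-8"` raises `UnicodeDecodeError`, which is the kind of failure a hard-coded UTF-8 assumption would produce for such input.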
### 📚 The doc issue I am not sure how to implement distributed training. ### Suggest a potential alternative/fix If there was a simple example that showed how to use...
This issue is generated from the TODO line https://github.com/pytorch/data/blob/2f29adba451e1b87f1c0c654557d9dd98673fdd8/./examples/criteo_torcharrow/torcharrow_wrapper.py#L1
This issue is generated from the TODO line https://github.com/pytorch/data/blob/2f29adba451e1b87f1c0c654557d9dd98673fdd8/./torchdata/dataloader2/reading_service.py#L135 cc @VitalyFedyunin
### 🐛 Describe the bug Let `t` be an input dataset that associates strings (model input) to integers (model output): ```python t = [("a", 567), ("b", 908), ("c", 887)] ```...
### 🚀 The feature A dataloader which communicates with its workers via torch.distributed.rpc API. ### Motivation, pitch Presently, process-based workers for Dataloader mean the workers live on the same server/PC...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #652 * #562 Prototype version to trigger tests.