# data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
### 📚 The doc issue

This doesn't seem to be mentioned in the docs, but if you have two datapipes that use `sharding_round_robin_dispatcher` and then `mux` them together:

1. Any...
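For context, a minimal sketch of the combination this issue describes; the datapipe contents and worker count are made up for illustration, and the issue reports that the resulting behavior is undocumented:

```python
from torch.utils.data.datapipes.iter.sharding import SHARDING_PRIORITIES
from torchdata.datapipes.iter import IterableWrapper
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService

if __name__ == "__main__":
    # Two branches, each dispatched round-robin to workers, merged with mux.
    dp1 = IterableWrapper(range(0, 10)).sharding_round_robin_dispatch(
        SHARDING_PRIORITIES.MULTIPROCESSING
    )
    dp2 = IterableWrapper(range(10, 20)).sharding_round_robin_dispatch(
        SHARDING_PRIORITIES.MULTIPROCESSING
    )
    pipe = dp1.mux(dp2)
    dl = DataLoader2(pipe, reading_service=MultiProcessingReadingService(num_workers=2))
    for item in dl:
        print(item)
```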
### 🐛 Describe the bug

If either a worker process or the feeder process of the MPRS gets killed, the main process will just hang indefinitely and not throw an...
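A rough repro sketch, under the assumption that killing a worker mid-iteration triggers the reported hang (POSIX-only, since it uses `SIGKILL`; the pipeline itself is made up):

```python
import os
import signal

from torchdata.datapipes.iter import IterableWrapper
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService

def die_at_50(x):
    # Simulate a worker process dying mid-iteration.
    if x == 50:
        os.kill(os.getpid(), signal.SIGKILL)
    return x

if __name__ == "__main__":
    pipe = IterableWrapper(range(100)).sharding_filter().map(die_at_50)
    dl = DataLoader2(pipe, reading_service=MultiProcessingReadingService(num_workers=2))
    for item in dl:  # per the report, this hangs instead of raising once the worker dies
        print(item)
```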
### 🚀 The feature

In MPI-based training, each process is independent of the others. Each training process might want to speed up dataloading using multiprocessing (MP). This requires data sharding...
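A sketch of what per-process dataloading could look like under that setup, assuming each MPI rank already holds its own slice of the data; `rank_local_samples` is a hypothetical stand-in for whatever the MPI side provides:

```python
from torchdata.datapipes.iter import IterableWrapper
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService

# Hypothetical stand-in: in real MPI training this would be derived from the
# communicator (rank / world size), e.g. a rank-local list of files or samples.
rank_local_samples = list(range(1000))

if __name__ == "__main__":
    # Within one MPI process: shard the rank-local data across MP workers.
    pipe = IterableWrapper(rank_local_samples).shuffle().sharding_filter()
    dl = DataLoader2(pipe, reading_service=MultiProcessingReadingService(num_workers=4))
    for item in dl:
        pass  # training step here
```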
### 🚀 The feature

1. Store usage statistics in `Prefetcher` - By tracking statistics within `Prefetcher`, we can reasonably determine whether upstream processes or downstream processes are faster. For example,...
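A rough sketch of the kind of bookkeeping this proposes; the class and counter names are hypothetical, not existing torchdata API:

```python
class PrefetchStats:
    """Hypothetical counters for a Prefetcher-like buffer."""

    def __init__(self):
        self.producer_full_waits = 0   # buffer was full: downstream is the bottleneck
        self.consumer_empty_waits = 0  # buffer was empty: upstream is the bottleneck

    def record_producer_wait(self):
        self.producer_full_waits += 1

    def record_consumer_wait(self):
        self.consumer_empty_waits += 1

    def bottleneck(self):
        # More empty-buffer waits means upstream can't keep up, and vice versa.
        if self.consumer_empty_waits > self.producer_full_waits:
            return "upstream"
        return "downstream"
```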
### 📚 The doc issue

The [ReadingService docs](https://pytorch.org/data/main/reading_service.html?highlight=replicable) describe the different sharding options and that one applies to replicable and one to non-replicable datapipes, but it's not really explained what...
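For illustration, the two sharding styles the docs contrast, as they appear in user code:

```python
from torch.utils.data.datapipes.iter.sharding import SHARDING_PRIORITIES
from torchdata.datapipes.iter import IterableWrapper

# Replicable: the pipeline is copied into every worker, and sharding_filter
# makes each replica keep only its own slice of the elements.
replicable = IterableWrapper(range(100)).shuffle().sharding_filter()

# Non-replicable: everything before sharding_round_robin_dispatch runs in a
# single dispatching process, which hands elements to workers round-robin.
non_replicable = IterableWrapper(range(100)).shuffle().sharding_round_robin_dispatch(
    SHARDING_PRIORITIES.MULTIPROCESSING
)
```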
### 🐛 Describe the bug

I'm trying to parse a single csv file that is zipped and stored in AWS S3, but I get the following error:

```
Exception when executing...
```
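A pipeline of roughly this shape is presumably what triggers the error; the bucket and key are hypothetical, and fsspec plus s3fs with valid AWS credentials are assumed:

```python
from torchdata.datapipes.iter import IterableWrapper

pipe = (
    IterableWrapper(["s3://my-bucket/data.zip"])
    .open_files_by_fsspec(mode="rb")  # open the remote file as a binary stream
    .load_from_zip()                  # yields (path, stream) per archive member
    .parse_csv()                      # parse each contained csv row by row
)
```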
### 🐛 Describe the bug

The following, in my opinion valid, snippet fails:

```python
import torchdata.datapipes as dp
from torch.utils.data.datapipes.iter.sharding import SHARDING_PRIORITIES
from torchdata.dataloader2 import MultiProcessingReadingService, DataLoader2

pipe = ...
```
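The body of `pipe = ...` is cut off above; one plausible completion, given the imports, is a round-robin-dispatched pipe driven by `DataLoader2` (an assumption for illustration, not the reporter's exact code):

```python
import torchdata.datapipes as dp
from torch.utils.data.datapipes.iter.sharding import SHARDING_PRIORITIES
from torchdata.dataloader2 import MultiProcessingReadingService, DataLoader2

if __name__ == "__main__":
    pipe = dp.iter.IterableWrapper(range(10)).sharding_round_robin_dispatch(
        SHARDING_PRIORITIES.MULTIPROCESSING
    )
    dl = DataLoader2(pipe, reading_service=MultiProcessingReadingService(num_workers=2))
    print(list(dl))
```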
### 🚀 The feature

https://github.com/pytorch/pytorch/blob/master/torch/utils/data/graph_settings.py#L51 currently explicitly checks for `_ShardingIterDataPipe`, which is 1. a private type, and 2. not in line with e.g. how `apply_shuffle_settings` works (checking for presence of...
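A sketch of the duck-typing alternative the issue suggests; the helper name is hypothetical:

```python
def supports_sharding(datapipe) -> bool:
    # Detect sharding support by method presence, mirroring how
    # apply_shuffle_settings detects shuffling, instead of isinstance-checking
    # the private _ShardingIterDataPipe type.
    return hasattr(datapipe, "apply_sharding") and callable(datapipe.apply_sharding)
```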
### 🐛 Describe the bug

When `cycle` is used with caching after it, it works the very first time, but afterwards it crashes in the `demux`, because it checks infinitely...
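The reporter's pipeline is cut off above; a hypothetical minimal reconstruction of the described combination might look like this:

```python
from torchdata.datapipes.iter import IterableWrapper

def odd_or_even(x):
    # Route elements to one of the two demux branches.
    return x % 2

# Cycle twice, cache the result, then split it with demux.
pipe = IterableWrapper(range(10)).cycle(2).in_memory_cache()
evens, odds = pipe.demux(num_instances=2, classifier_fn=odd_or_even)
print(list(evens))  # per the report: fine on the first pass, crashes on later ones
print(list(odds))
```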
### 🐛 Describe the bug

MPRS version:

```python
from torchdata.datapipes.iter import IterableWrapper
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService

# train_ds, batch_size, encode_processor, and num_workers come from the
# reporter's surrounding code, which is not shown here.
train_dp = IterableWrapper(train_ds).batch(batch_size=batch_size).collate(collate_fn=encode_processor)
rs = MultiProcessingReadingService(num_workers=num_workers)
train_dl = DataLoader2(train_dp, reading_service=rs)

for batch_idx, batch in enumerate(train_dl):
    print(batch_idx)
    if batch_idx > 500:
        break
```
...