Erjia Guan
So, I guess we need to figure out a way to let users indicate when they are done with `MapDataPipe`, then delete/deplete the iterator of the prior DataPipe (it would...
Are you running multiple DPP at the same time?
`replicable` means the `DataPipe` can be copied multiple times across multiprocessing workers. If it's not, it will either be kept in a dispatching process when `ShardingRoundRobinDispatcher` is used or kept...
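As a conceptual sketch of the difference (plain Python, not the actual torchdata dispatcher, which pushes items to workers over queues), a non-replicable source lives in a single dispatching process and hands elements out round-robin, while a replicable pipe is simply copied into every worker:

```python
def round_robin_dispatch(source, num_workers):
    """Split one non-replicable source into per-worker streams, round-robin.

    Conceptual sketch only: here we just materialize the per-worker shards
    instead of streaming them over inter-process queues.
    """
    shards = [[] for _ in range(num_workers)]
    for i, item in enumerate(source):
        shards[i % num_workers].append(item)
    return shards

def replicate(make_source, num_workers):
    """A replicable pipe, by contrast, is re-created in every worker."""
    return [list(make_source()) for _ in range(num_workers)]

shards = round_robin_dispatch(range(7), 3)
# shards == [[0, 3, 6], [1, 4], [2, 5]]
copies = replicate(lambda: range(3), 2)
# copies == [[0, 1, 2], [0, 1, 2]]
```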
The problem with `ShardingRoundRobinDispatcher` is that it currently only supports `SHARDING_PRIORITY.MULTIPROCESSING`.
Here are a few ideas that come to mind to help users easily find this problem: - First, add explicit documentation about it and add instructions to use `weakref` to wrap...
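To illustrate the `weakref` wrapping idea, here is a minimal stdlib sketch (the `Source`/`Derived` classes are hypothetical stand-ins, not the actual DataPipe wiring): holding only a weak reference to the prior pipe lets it be garbage-collected as soon as the user drops their strong reference.

```python
import weakref

class Source:
    def __init__(self, data):
        self.data = data

class Derived:
    def __init__(self, source):
        # Hold only a weak reference so `source` can be collected
        # once the user is done with it.
        self._source_ref = weakref.ref(source)

    def read(self):
        source = self._source_ref()  # None if the source was collected
        if source is None:
            raise RuntimeError("source pipe has been released")
        return source.data

src = Source([1, 2, 3])
pipe = Derived(src)
assert pipe.read() == [1, 2, 3]

del src  # the weak reference does not keep the source alive
assert pipe._source_ref() is None
```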
A little bit of context on the old DataLoader. It always tries to collate samples into Tensors via `collate_fn`. Therefore, it helps reduce the overhead of transmitting samples from the worker process to...
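The point is that one collated batch crosses the process boundary once, instead of N individual samples. A simplified stand-in for `collate_fn` (stacking into lists rather than `torch.Tensor`s, so the sketch stays stdlib-only) looks like:

```python
def simple_collate(batch):
    """Turn a list of (feature, label) samples into one batched pair.

    Stand-in for DataLoader's default collate_fn: instead of stacking
    samples into torch.Tensors, we stack them into parallel lists, so a
    single object is sent from the worker rather than one per sample.
    """
    features, labels = zip(*batch)
    return list(features), list(labels)

batch = [([1.0, 2.0], 0), ([3.0, 4.0], 1)]
features, labels = simple_collate(batch)
# features == [[1.0, 2.0], [3.0, 4.0]], labels == [0, 1]
```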
> This resulted in a degradation of the performance to the single-threaded case which lets me believe that my main performance overhead right now is actually the `collate`. I am...
Related to https://github.com/pytorch/pytorch/issues/96975 We should allow users to provide a custom sharding DataPipe. Will send a PR shortly.
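At its core, a sharding DataPipe boils down to each rank taking every `world_size`-th element; a stdlib sketch of that idea (the real torchdata interface, with `apply_sharding` and sharding priorities, is richer):

```python
from itertools import islice

def shard(iterable, rank, world_size):
    """Yield every `world_size`-th element starting at index `rank`.

    Conceptual sketch of what a sharding DataPipe does for one rank.
    """
    return islice(iterable, rank, None, world_size)

# Each rank sees a disjoint slice of the same stream:
rank0 = list(shard(range(10), 0, 3))  # [0, 3, 6, 9]
rank1 = list(shard(range(10), 1, 3))  # [1, 4, 7]
```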
> Hence, storing tensor data in a (potentially large file) to share it between processes and to improve reading time? Correct. This is inspired by `tensordict`, to help accelerate MP.
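A bare-bones sketch of the idea with stdlib `mmap` and `struct` (standing in for `tensordict`'s memory-mapped tensors): the writer serializes the data to a file once, and readers map the file instead of receiving a copy over IPC.

```python
import mmap
import os
import struct
import tempfile

# Writer: serialize a float array into a file once.
values = [0.5, 1.5, 2.5]
path = os.path.join(tempfile.mkdtemp(), "shared.bin")
with open(path, "wb") as f:
    f.write(struct.pack(f"{len(values)}d", *values))

# Reader (conceptually a different process): map the file read-only
# instead of copying the data through a pipe or queue.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        loaded = list(struct.unpack(f"{len(values)}d", mm[:]))
# loaded == [0.5, 1.5, 2.5]
```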
@NivekT Could you please change the colab link to the new one?