Erjia Guan

Results 170 comments of Erjia Guan

So, I guess we need to figure out a way to let users indicate when they are done with `MapDataPipe`, and then delete/deplete the iterator of the prior DataPipe (it would...

`replicable` means the `DataPipe` can be copied multiple times for multiprocessing workers. If it's not, it will either be kept in a dispatching process when `ShardingRoundRobinDispatcher` is used or kept...
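To illustrate the dispatching behavior described above, here is a minimal pure-Python sketch (not torchdata's actual implementation): when a DataPipe cannot be replicated per worker, a single dispatching process hands items out to workers in round-robin order instead. The function name `round_robin_dispatch` is hypothetical.

```python
def round_robin_dispatch(source, num_workers):
    """Yield (worker_id, item) pairs, mimicking how a single dispatching
    process hands items to workers in turn when the source pipe cannot
    be copied into each worker."""
    for i, item in enumerate(source):
        yield i % num_workers, item

assignments = list(round_robin_dispatch(range(6), num_workers=2))
# worker 0 receives items 0, 2, 4; worker 1 receives items 1, 3, 5
```

Each item is produced exactly once in the dispatching process, which is why the non-replicable pipe must live there rather than being duplicated across workers.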

The problem with `ShardingRoundRobinDispatcher` is that it currently only supports `SHARDING_PRIORITY.MULTIPROCESSING`.

Here are a few things I have in mind to help users find this problem more easily: - First, add explicit documentation about it, and add instructions to use `weakref` to wrap...
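As a quick sketch of the `weakref` suggestion above: wrapping a reference in `weakref.ref` means the wrapped object can still be garbage-collected once no other owner holds it, which avoids keeping a DataPipe graph node alive unintentionally. The class name `DataPipeGraphNode` below is a hypothetical stand-in.

```python
import weakref

class DataPipeGraphNode:
    """Hypothetical stand-in for a node in a DataPipe graph."""
    def __init__(self, name):
        self.name = name

node = DataPipeGraphNode("source")
ref = weakref.ref(node)        # weak reference: does not keep `node` alive
assert ref() is node           # resolves while a strong reference exists
del node                       # drop the last strong reference
assert ref() is None           # object collected; weakref no longer resolves
```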

A little bit of context on the old DataLoader: it always tries to collate samples into a Tensor via `collate_fn`. Therefore, it would help reduce the overhead of transmitting samples from a worker process to...
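A minimal pure-Python sketch of the shape of work a `collate_fn` does (the real `torch.utils.data.default_collate` additionally stacks the results into Tensors, which is what makes transfer between processes cheaper): it transposes a list of per-sample pairs into batched columns. `simple_collate` is a hypothetical name for illustration.

```python
def simple_collate(batch):
    # Transpose a list of (feature, label) samples into
    # (batch of features, batch of labels).
    features, labels = zip(*batch)
    return list(features), list(labels)

samples = [([1.0, 2.0], 0), ([3.0, 4.0], 1)]
features, labels = simple_collate(samples)
# features == [[1.0, 2.0], [3.0, 4.0]], labels == [0, 1]
```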

> This resulted in a degradation of the performance to the single-threaded case which lets me believe that my main performance overhead right now is actually the `collate`. I am...

Related to https://github.com/pytorch/pytorch/issues/96975. We should allow users to provide a custom sharding DataPipe. Will send a PR shortly.
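For context, the core of a sharding DataPipe can be sketched in a few lines of plain Python (this is an illustrative generator, not the interface a custom sharding DataPipe would actually implement): each shard keeps every `num_shards`-th item, offset by its own `shard_id`.

```python
def shard(source, num_shards, shard_id):
    """Minimal sharding sketch: keep every num_shards-th item,
    starting at shard_id, so shards partition the stream."""
    for i, item in enumerate(source):
        if i % num_shards == shard_id:
            yield item

# Three shards over ten items partition the stream without overlap.
parts = [list(shard(range(10), 3, s)) for s in range(3)]
# parts == [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```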

> Hence, storing tensor data in a (potentially large file) to share it between processes and to improve reading time? Correct. This is inspired by `tensordict` to help accelerate MP.
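A stdlib-only sketch of the idea, assuming plain float64 data in a temporary file (the real approach would memory-map tensor storage directly): write the data to a file once, then `mmap` it, so multiple processes can map the same file and read without pickling or copying the payload.

```python
import mmap
import os
import struct
import tempfile

# Write float64 values to a backing file once.
values = [0.5, 1.5, 2.5]
path = os.path.join(tempfile.mkdtemp(), "shared.bin")
with open(path, "wb") as f:
    f.write(struct.pack(f"{len(values)}d", *values))

# Any process can map the same file read-only; the OS page cache
# shares the underlying memory instead of each reader holding a copy.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        loaded = list(struct.unpack(f"{len(values)}d", mm[:]))
```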

@NivekT Could you please change the colab link to the new one?