Erjia Guan

Results 170 comments of Erjia Guan

> should we (my recommendation) keep that and add a new datapipe that filters each element in the slicing or indexing way described. It sounds like a reasonable design to...

Thanks for asking. We understand the need and are currently working on DataLoader2 to handle dynamic sharding using `sharding_filter` for both MP...

No need to close it. We will keep you updated once this feature lands.

A quick update: if you have `sharding_filter` in your pipeline, the `DataPipe` graph should be dynamically sharded when used with `DataLoader`. You can either use the nightly release of PyTorch Core...
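To make the sharding idea concrete, here is a minimal pure-Python sketch of round-robin sharding by element index, which is roughly what a sharding filter does for each worker. It is a conceptual illustration only and does not use or reproduce the torchdata API; the names `shard` and `simulate_workers` are hypothetical.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def shard(source: Iterable[T], num_workers: int, worker_id: int) -> Iterator[T]:
    """Yield only the elements assigned to `worker_id` (round-robin by index)."""
    for index, item in enumerate(source):
        if index % num_workers == worker_id:
            yield item


def simulate_workers(source: List[T], num_workers: int) -> List[List[T]]:
    """Collect what each hypothetical worker would see from the same source."""
    return [list(shard(source, num_workers, w)) for w in range(num_workers)]


# Three simulated workers reading the same ten-element source.
shards = simulate_workers(list(range(10)), num_workers=3)
for worker_id, items in enumerate(shards):
    print(worker_id, items)
```

Each element goes to exactly one worker, so no sample is duplicated across workers. Without such a filter, every worker would iterate over the full source.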

Thanks for spotting the issue. Investigating it.

I have used the same script with a public dataset on GCS, but I can't reproduce this issue. Could you please try to re-install `torchdata` using the nightly release? `pip3...

For the `token` argument, we added `kwargs` to `FSSpecFileLister`. With TorchData 0.4.0 or nightly release, you should be able to add your token there: https://github.com/pytorch/data/blob/f1a128ec789f078852943e8c58377a99b42a7b45/torchdata/datapipes/iter/load/fsspec.py#L57 Based on the discussion on...

Thanks for adding the issue. IIRC, RPC is used to communicate between distributed processes. We are working on `ReadingService` with `DataLoader2`, and it seems `RPC` would be a good option as...

As @NivekT pointed out, the profiler overhead is significant. This PR could potentially resolve the performance regression for `IterDataPipe`: https://github.com/pytorch/pytorch/pull/78674

Now that the PR has landed, would you like to test the nightly releases of PyTorch and TorchData? You can install them via `pip install --pre torch torchdata -f https://download.pytorch.org/whl/nightly/cpu`