Erjia Guan comments

Results: 170 comments of Erjia Guan

Ability to manipulate columns and fields

> should we (my recommendation) keep that and add a new datapipe that filters each element in the slicing or indexing way described. It sounds like a reasonable design to...

Add support for sharding filter in distributed settings

Thanks for asking. We understand this need and are working on it. We are currently working on DataLoader2 to handle dynamic sharding using the `sharding_filter` for both MP...

Add support for sharding filter in distributed settings

No need to close it. We will keep you updated when this feature lands.

Add support for sharding filter in distributed settings

Just wanted to post an update here. If you have `sharding_filter` in your pipeline, the `DataPipe` graph should be dynamically sharded when using `DataLoader`. You can either use the nightly release of PyTorch Core...
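
For readers unfamiliar with what sharding a pipeline means here, below is a minimal plain-Python sketch of the round-robin idea behind a sharding filter. It is not the torchdata implementation: the helper `shard_round_robin` and the way the global instance id is derived from rank and worker id are assumptions made for illustration only.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard_round_robin(
    source: Iterable[T], num_instances: int, instance_id: int
) -> Iterator[T]:
    """Yield only the elements assigned to one consumer (hypothetical helper)."""
    if not 0 <= instance_id < num_instances:
        raise ValueError("instance_id must be in [0, num_instances)")
    for index, item in enumerate(source):
        # Element `index` goes to consumer `index % num_instances`.
        if index % num_instances == instance_id:
            yield item


# Simulate 2 distributed ranks with 2 workers each, i.e. 4 consumers in total.
samples = list(range(10))
world_size, workers_per_rank = 2, 2
total_instances = world_size * workers_per_rank

shards = [
    # Assumed id layout: rank-major, then worker id within the rank.
    list(shard_round_robin(samples, total_instances, rank * workers_per_rank + worker))
    for rank in range(world_size)
    for worker in range(workers_per_rank)
]

for instance_id, shard in enumerate(shards):
    print(instance_id, shard)
```

Each sample ends up in exactly one shard, which is the property that matters when the same pipeline is replicated across processes and workers.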

running FSSpecFileLister in ikernel doesn't work

Thanks for spotting the issue. Investigating it.

running FSSpecFileLister in ikernel doesn't work

I used the same script with a public dataset on GCS, but I could not reproduce this issue. Could you please try reinstalling `torchdata` using the nightly release? `pip3...

running FSSpecFileLister in ikernel doesn't work

For the `token` argument, we added `kwargs` to `FSSpecFileLister`. With TorchData 0.4.0 or the nightly release, you should be able to add your token there: https://github.com/pytorch/data/blob/f1a128ec789f078852943e8c58377a99b42a7b45/torchdata/datapipes/iter/load/fsspec.py#L57 Based on the discussion on...

Suggestion: Dataloader with RPC-based workers

speed benchmark: `IterDataPipe` noticeably slower than `MapDataPipe`

speed benchmark: `IterDataPipe` noticeably slower than `MapDataPipe`