Erjia Guan

Results 170 comments of Erjia Guan

> I don't see a label for `torch text`. @ejguan what label do you think this should go under?

@bdhirsh We should transfer this issue to the TorchText repo.

@parmeet Even though I don't know the root cause of this error, do you think we could align the error between the two versions of TorchText? These OnlineReader could take...

> I am not exactly sure why [this error message](https://github.com/pytorch/text/blob/38f520cd0293308e86ef0b2f2adc2f180dc3bab9/torchtext/_download_hooks.py#L35-L36) is removed from the implementation in GDriveReader [here](https://github.com/pytorch/data/blob/c1d89fe9a1b06e610f32f823359771557b1ca12a/torchdata/datapipes/iter/load/online.py#L77-L79) when `confirm_token` is `None`? Can't find why via git blame as the...

> > > > do you think we could align the Error between two versions of TorchText? > > > > > > I think one way to achieve this...

It's doable using `MapDataPipe`, but that is a different concept. It is also currently a second-class citizen in TorchData, as we recommend using `IterDataPipe` in favor of streaming especially for the...

@ParsaAkbari You might need to install `torchdata` manually by downloading the source and running `python setup.py develop`, since `torchdata` does not yet provide binaries for M1.

@ParsaAkbari Binaries for arm64 are now available in the nightly release. You can get them via `conda install pytorch torchdata -c pytorch-nightly`. And, for the coming torchdata 0.4.1 minor releases, you...

It depends on how you want to split. For a simple case, you can use `demux` to split based on the indices generated by enumerating the prior DataPipe.
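
To make the index-based routing concrete, here is a minimal sketch in plain Python. It is not torchdata code and not how `demux` is implemented; it only mimics the idea of enumerating a stream and sending each item to a branch chosen from its index. The helper `split_by_index` and the rule `every_fifth_to_validation` are hypothetical names made up for this illustration. In torchdata, the rough counterpart would be enumerating the prior DataPipe and passing a classifier function over the index to `demux`.

```python
from itertools import tee


def split_by_index(items, num_splits, classify):
    """Return `num_splits` lazy iterators over `items`.

    `classify(index)` decides which branch receives the item at that
    position. Hypothetical helper for illustration only.
    """
    # One independent copy of the enumerated stream per branch; `tee`
    # buffers items that one branch has consumed and another has not.
    copies = tee(enumerate(items), num_splits)

    def branch(copy, k):
        # Keep only the items routed to branch `k`, dropping the index.
        return (item for index, item in copy if classify(index) == k)

    return [branch(copy, k) for k, copy in enumerate(copies)]


def every_fifth_to_validation(index):
    # Hypothetical rule: every fifth sample goes to branch 1, the rest to 0.
    return 1 if index % 5 == 0 else 0


train, valid = split_by_index(range(10), 2, every_fifth_to_validation)
train_items = list(train)
valid_items = list(valid)
```

Because the classifier only sees the index, the split is deterministic and does not depend on the sample contents. A content-based split would pass the item itself to the classifier instead.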

We do have a tracking issue here: https://github.com/pytorch/data/issues/457. But, based on my understanding, we might not have the bandwidth to implement it for now.

IMHO, this is a great example of where to use modular DataPipes. It could take one sequence of DataPipes to read metadata and connect it (with some routers) to sequences...