Erjia Guan

Results: 170 comments by Erjia Guan

Just for the record, once decoupled, `expecttest` can be removed from our test dependencies.

> The problem is more general in that it applies to any kind of wrapper that generates a datapipe, not just to a `Mapper`. For example the solution below still...

> Looks like `in_batch_shuffle` would be a good alternative, however I do need the ability to disable shuffling because the same code-path would be used for the training sets (shuffle...

This might be one OSS story integrating both TorchArrow and TorchData, where TorchArrow handles the DataFrame transformations and forwards them to a backend SQL engine. cc: @wenleix @VitalyFedyunin

There is another problem with option 1 (even though IMO the syntax seems the cleanest): the DL linter would be a runtime linter rather than a static linter. It...

> Both `MapDataPipe` and `IterDataPipe` will return a list of tensors. What should I do if I want to return a single tensor instead of a list of multiple small...

> The other option is to integrate the DALI data loader as a data pipe in torch.data

Thanks @msaroufim, I had the same feeling about making it as a separate...

This is fixed by https://github.com/pytorch/text/pull/1942 and https://github.com/pytorch/data/pull/810

Also, there is a use case that might affect the performance of S3FileLoader. If I do `tarfile.open(fileobj=s3_stream_returned_from_s3fileloader, mode=m, bufsize=20000000240)`, the speed with mode `r:` is way faster than the mode...
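For readers unfamiliar with the `tarfile` mode strings, here is a minimal sketch of opening the same archive from a file object in two ways. The comment above is cut off before it names the mode being compared against, so the choice of `r|` here is only an assumption for illustration. An in-memory `io.BytesIO` stands in for the S3 stream, and `build_tar_bytes` and `read_members` are hypothetical helpers, not part of TorchData. No timings are claimed.

```python
import io
import tarfile


def build_tar_bytes(files):
    """Build an uncompressed tar archive in memory from a {name: bytes} dict."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_members(raw, mode, bufsize=10240):
    """Read every regular file from the archive using the given tarfile mode.

    "r:" opens an uncompressed archive and expects a seekable file object.
    "r|" (assumed here for comparison) treats the file object as a
    forward-only stream of blocks; bufsize sets the block size in that case.
    """
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(raw), mode=mode, bufsize=bufsize) as tar:
        # Members are consumed in archive order, which stream mode requires.
        for member in tar:
            if member.isfile():
                contents[member.name] = tar.extractfile(member).read()
    return contents


raw = build_tar_bytes({"a.txt": b"hello", "b.txt": b"world"})
seekable = read_members(raw, "r:")
streamed = read_members(raw, "r|")
```

Both calls return the same contents. The difference lies in how the underlying file object is read (seeking versus sequential block reads), which is where a remote stream could behave differently.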