Vitaly Fedyunin

Results: 85 comments of Vitaly Fedyunin

@parmeet Does it specifically manifest with IMDB, or are there other DataSets involved? The first thing that comes to mind is to play with the timeout. I can quickly draft an Adapter...

Hi! Can you also share the code you are using for benchmarks?

Sorry, which colab you are talking about?

You are probably looking for something like a per-column collate: ``` dp = dp.collate({ 0: fn_1, 1: fn_2 }) ``` In this case `fn_1` will get `["a","b"]` as input...
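To illustrate the idea, here is a minimal pure-Python sketch of a per-column collate, assuming a batch arrives as a list of row tuples. The function name `per_column_collate` and the transpose-then-apply approach are hypothetical, not the actual torchdata `Collator` implementation:

```python
def per_column_collate(batch, column_fns):
    # Transpose the batch of rows into columns:
    # [("a", 1), ("b", 2)] becomes columns ("a", "b") and (1, 2).
    columns = list(zip(*batch))
    # Apply each column's function to its column as a list.
    return {idx: fn(list(columns[idx])) for idx, fn in column_fns.items()}

fn_1 = lambda col: "".join(col)  # e.g. join the string column
fn_2 = lambda col: sum(col)      # e.g. sum the numeric column

batch = [("a", 1), ("b", 2)]
result = per_column_collate(batch, {0: fn_1, 1: fn_2})
# fn_1 receives ["a", "b"], fn_2 receives [1, 2]
```

This matches the comment's description: each per-column function sees only its own column of the batch.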

We need to be very careful. For example, here https://github.com/pytorch/data/blob/12cfaf8899b1337981cd4edf9deef127f925f1bd/torchdata/dataloader2/reading_service.py#L17 it should be IterDataPipe, as any MapDataPipe is supposed to be wrapped into a sampler before being passed to DataLoader.
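A minimal sketch of why a map-style pipe needs a sampler wrapper before it can feed an iter-style consumer. The class and function names here (`MapStylePipe`, `wrap_with_sampler`) are hypothetical stand-ins, not the actual torchdata classes:

```python
from typing import Iterator, Sequence


class MapStylePipe:
    """Hypothetical map-style pipe: supports random access by index."""

    def __init__(self, data: Sequence):
        self._data = data

    def __getitem__(self, idx: int):
        return self._data[idx]

    def __len__(self) -> int:
        return len(self._data)


def wrap_with_sampler(pipe: MapStylePipe, order) -> Iterator:
    # A sampler turns the indexable pipe into a sequential stream,
    # which is what an iter-style reading service expects.
    for idx in order:
        yield pipe[idx]


mp = MapStylePipe(["x", "y", "z"])
stream = list(wrap_with_sampler(mp, [2, 0, 1]))
```

The reading service only ever sees the resulting iterator, which is why the type annotation in the linked line should be IterDataPipe.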

Sure, doing it for flatmap now; I will also attempt to convert several inner DataPipes to this approach.

UPD: Temporarily made a copy of `FlatMapper`. UPD: Rewrote JsonParser as an example.
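For readers unfamiliar with the template approach being discussed: here is a hedged sketch of what a flat-map template might look like, with the base class owning iteration and subclasses supplying only the per-item logic. The names `FlatMapTemplate` and `SplitWords` are invented for illustration and are not the actual torchdata `FlatMapper`:

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator


class FlatMapTemplate(ABC):
    """Hypothetical template: the base class drives iteration,
    subclasses implement only _flatmap for one input item."""

    def __init__(self, source: Iterable):
        self.source = source

    @abstractmethod
    def _flatmap(self, item) -> Iterable:
        ...

    def __iter__(self) -> Iterator:
        for item in self.source:
            # Flatten each item's expansion into the output stream.
            yield from self._flatmap(item)


class SplitWords(FlatMapTemplate):
    def _flatmap(self, item: str) -> Iterable[str]:
        return item.split()


dp = SplitWords(["hello world", "foo bar"])
words = list(dp)
```

The appeal of the template pattern here is that inner DataPipes like JsonParser can be converted by overriding a single method instead of reimplementing iteration.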

> I think it's better to move this class to PyTorch Core for `Mapper` and `Filter`. To be honest, I prefer to keep as much as possible inside this repo....

Direct use of the abstract class leads to: ``` torchdata/datapipes/iter/util/jsonparser.py:40:5: error: Signature of "_map" incompatible with supertype "MapTemplateIterDataPipe" [override] def _map(self, stream: IO): ```
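The error arises because a subclass may not narrow an overridden method's parameter type (parameters are contravariant). A common way out, sketched below with hypothetical names (`MapTemplate`, `Doubler`) rather than the real `MapTemplateIterDataPipe`, is to make the base class generic in its input type so each subclass can specialize `_map` legally:

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

T = TypeVar("T")


class MapTemplate(ABC, Generic[T]):
    """Hypothetical generic template. Parameterizing the input type
    lets a subclass fix _map's argument to a concrete type without
    mypy's incompatible-override error; narrowing the parameter of a
    non-generic base method would be flagged instead."""

    @abstractmethod
    def _map(self, item: T):
        ...


class Doubler(MapTemplate[int]):
    # _map's parameter is exactly the T the class binds, so the
    # override is compatible with the supertype.
    def _map(self, item: int) -> int:
        return item * 2
```

Under this assumption, a JsonParser-style subclass would bind the type variable to `IO` instead of narrowing a concrete signature.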

@VitalyFedyunin has imported this pull request. If you are a Meta employee, you can view this diff [on Phabricator](https://www.internalfb.com/diff/D40146676).