Erjia Guan

170 comments by Erjia Guan

It depends on how you `open` your file rather than on `StreamReader`. If you use `FileOpener` (functional API: `open_files`), you can set the mode to `b` to open the file in...
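
To illustrate the point that the open mode, not the reader, decides whether you get bytes or text, here is a minimal sketch using only Python's built-in `open` rather than torchdata itself. The helper name `read_both_ways` and the sample payload are made up for this example.

```python
import os
import tempfile


def read_both_ways(payload: bytes):
    """Write payload to a temp file, then read it back in binary and text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        # Binary mode: read() returns bytes, no decoding is applied.
        with open(path, "rb") as f:
            as_bytes = f.read()
        # Text mode: read() returns str, decoded with the given encoding.
        with open(path, "r", encoding="utf-8") as f:
            as_text = f.read()
    finally:
        os.remove(path)
    return as_bytes, as_text


as_bytes, as_text = read_both_ways(b"hello")
print(type(as_bytes).__name__, type(as_text).__name__)
```

The same file yields `bytes` or `str` depending only on the mode passed at open time, which is why the choice belongs to the opening step rather than to whatever consumes the stream afterwards.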

cc: @VitalyFedyunin @NivekT, since I am not sure who has permission to edit the Colab file. Edit: We might even want to remove the Colab file since we have...

The one in here: https://github.com/pytorch/data#colab

Let's not label this as a good first issue, since the fix might need some work in the internal codebase.

> nit: This PR is fine (with nit comments) but there is something unsatisfying about how `input_col` and `output_col` work in the case where I want to select an element...

> Yes, it may require some adjustment to how `input_col` and `MapTemplate` work to handle non-list/tuple inputs. I think it's doable after this PR. We can let `JsonParser` be a...

It's a reasonable feature for `MapDataPipe`. Would it be possible to extend this cache interface to support the in-memory cache for [`IterDataPipe`](https://github.com/pytorch/data/blob/a7745b9865ed590c252072c51026400ae64a656f/torchdata/datapipes/iter/util/cacheholder.py#L75-L95), which follows a FIFO policy? BTW, I think `__contains__` is...
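
For readers unfamiliar with the idea, a FIFO in-memory cache that supports `__contains__` could look roughly like the sketch below. This is not torchdata's implementation: `FIFOCache` and `max_items` are hypothetical names chosen for illustration, and the size limit counts entries rather than bytes.

```python
from collections import OrderedDict


class FIFOCache:
    """Hypothetical in-memory cache that evicts the oldest inserted key first."""

    def __init__(self, max_items: int):
        if max_items <= 0:
            raise ValueError("max_items must be positive")
        self.max_items = max_items
        self._store = OrderedDict()

    def __contains__(self, key) -> bool:
        return key in self._store

    def __getitem__(self, key):
        # Reads do not change the eviction order (FIFO, not LRU).
        return self._store[key]

    def __setitem__(self, key, value) -> None:
        # Only a brand-new key can trigger an eviction; updating an
        # existing key keeps its original position in the queue.
        if key not in self._store and len(self._store) >= self.max_items:
            self._store.popitem(last=False)
        self._store[key] = value

    def __len__(self) -> int:
        return len(self._store)


cache = FIFOCache(max_items=2)
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3  # evicts "a", the oldest entry
print("a" in cache, "b" in cache, "c" in cache)
```

Because eviction depends only on insertion order, a membership check is a plain dictionary lookup and has no side effects on the cache state.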

@Spartee Thank you for putting up a prototype! Here are my thoughts. Aside from `optimize retrieval`, we might be able to provide a multi-layer `cache` to reduce cache misses. For `redis`...
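
One way to picture a multi-layer cache is a lookup that tries a fast layer first and falls back to slower ones, copying a value forward when it is found further down. The sketch below is only an illustration under that assumption: `TieredCache` is a hypothetical name, and plain dictionaries stand in for real backends such as a local in-process store and a remote service.

```python
class TieredCache:
    """Hypothetical cache that searches dict-like layers from fastest to slowest."""

    def __init__(self, layers):
        # layers[0] is assumed to be the fastest, layers[-1] the slowest.
        self.layers = list(layers)

    def get(self, key, default=None):
        for depth, layer in enumerate(self.layers):
            if key in layer:
                value = layer[key]
                # Promote the value into every faster layer so the
                # next lookup for this key stops earlier.
                for faster in self.layers[:depth]:
                    faster[key] = value
                return value
        return default

    def put(self, key, value) -> None:
        # Write-through: store the value in every layer.
        for layer in self.layers:
            layer[key] = value


local, remote = {}, {"x": 42}  # plain dicts standing in for real backends
cache = TieredCache([local, remote])
print(cache.get("x"), "x" in local)
```

After the first lookup the value lives in the fast layer too, so a repeated request no longer has to reach the slower backend.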

> How would the processes be initialized? Does the user pass them in? Can you point me to any examples that use a single datapipe with multiple processes? Here are some references....