Kevin Tse comments

Results: 85 comments of Kevin Tse

[DataPipe] Add RandomSplitter (without buffer)

@nivekt has imported this pull request. If you are a Facebook employee, you can view this diff [on Phabricator](https://www.internalfb.com/diff/D38675266).
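
As a rough illustration of what splitting "without buffer" can mean, here is a minimal plain-Python sketch. It is not the code from this pull request: the function name `split_without_buffer` and its parameters are hypothetical. The assumption is that each split re-iterates the source with an identically seeded random generator, so every split sees the same assignment sequence and nothing has to be buffered for the other splits.

```python
import random
from typing import Callable, Dict, Iterable, Iterator, TypeVar

T = TypeVar("T")


def split_without_buffer(
    make_source: Callable[[], Iterable[T]],  # hypothetical: factory that restarts the source
    weights: Dict[str, float],               # split name -> relative weight
    seed: int,
    target: str,                             # which split this iterator should yield
) -> Iterator[T]:
    # Every split builds its own RNG from the same seed, so the sequence of
    # assignments is identical across splits.
    rng = random.Random(seed)
    names = list(weights)
    probs = [weights[name] for name in names]
    for item in make_source():
        # Draw the split for this item; keep it only if it belongs to `target`.
        if rng.choices(names, weights=probs)[0] == target:
            yield item
```

The trade-off in this sketch is that the source is traversed once per split, in exchange for holding no items in memory.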


Recommended way to shuffle intra and inter archives?

> What is a good canonical way to shuffle intra and inter archives? I think the best way is to use [`in_batch_shuffle`](https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/transform/bucketbatcher.py#L19-L47). Though we need to add further randomness control...
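
To make the inter/intra distinction concrete, here is a minimal plain-Python sketch. It does not use or reproduce torchdata's `in_batch_shuffle`; the function name `shuffle_inter_intra` is hypothetical, and archives are assumed to be in-memory sequences of samples. A single seeded generator provides the kind of randomness control mentioned above.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_inter_intra(
    archives: Sequence[Sequence[T]], seed: int
) -> Iterator[List[T]]:
    # One seeded RNG controls both levels, so a given seed reproduces the
    # same order on every run.
    rng = random.Random(seed)

    # Inter-archive shuffle: randomize the order in which archives are visited.
    order = list(range(len(archives)))
    rng.shuffle(order)

    for idx in order:
        # Intra-archive shuffle: randomize samples within one archive,
        # working on a copy so the input is left untouched.
        samples = list(archives[idx])
        rng.shuffle(samples)
        yield samples
```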

[Linter] Ability to disable some lints

I share the same concern as @ejguan regarding Option 1, since we need the linter to work at runtime. I personally prefer Option 2, especially the Global based rather than...

Chainer/Concater from single datapipe?

Edit: I think the new operation [`flatten` in this open PR](https://github.com/pytorch/data/blob/da5e6493c8cc2f040f6a54487f228052dabb131e/torchdata/datapipes/iter/transform/callable.py#L289) should be able to handle an IterDataPipe of iterables, depending on how we end up implementing that.
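
For reference, the behavior being asked for (chaining the inner iterables of a single stream) can be sketched in plain Python. This is only an illustration of the idea, not the `flatten` implementation from the linked PR, whose final semantics were still open at the time.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def flatten(source: Iterable[Iterable[T]]) -> Iterator[T]:
    # Lazily walk the outer iterable and yield each inner element in order,
    # so nothing is materialized beyond the current inner iterable.
    for inner in source:
        yield from inner
```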

[DataPipe] shard expander

> @VitalyFedyunin Mind taking a look at this PR? We'll have a look tomorrow or Wednesday. Thanks!

Daemonize reading service's processes

This can be closed, as #837 has landed.

Does torchdata already work with GCP and Azure blob storage

I am going to take a quick look at `fsspec` vs `s3` performance in my benchmark.

Does torchdata already work with GCP and Azure blob storage

My benchmark shows that using `FSSpecFileOpener` is faster than the `s3` alternative, and it also provides the ability to stream (rather than downloading a whole archive into memory before reading).

Does torchdata already work with GCP and Azure blob storage

Since #812 and #836 have landed, I believe users should be able to use GCP and Azure Blob storage. Please feel free to re-open this issue or open a new...