Add support for `kwargs` in S3 DataPipes

Open NivekT opened this issue 2 years ago • 2 comments

🚀 The feature

S3FileLister and S3FileLoader currently don't support keyword arguments beyond request_timeout_ms, region, buffer_size, and multi_part_download. Th

One example is here, where a user would like to read a specific version of the bucket. I imagine there are similar parameters that users may want to pass through. Some of these may be passed to the construction of the S3Handler, and others may be used in handler.s3_read().
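
To make the idea concrete, here is a rough, self-contained sketch of how the two groups of keyword arguments could be routed. Everything in it is hypothetical: `FakeS3Handler`, `S3FileLoaderSketch`, `handler_kwargs`, `read_kwargs`, and `version_id` are illustrative names, not the actual torchdata classes or parameters, and the fake handler only echoes its inputs instead of talking to S3.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple


class FakeS3Handler:
    """Hypothetical stand-in for the S3 handler; NOT the torchdata class."""

    def __init__(self, request_timeout_ms: int = -1, region: str = "", **handler_kwargs: Any) -> None:
        self.request_timeout_ms = request_timeout_ms
        self.region = region
        # Extra construction-time options (assumed) are simply stored here.
        self.handler_kwargs = handler_kwargs

    def s3_read(self, url: str, **read_kwargs: Any) -> bytes:
        # A real handler would fetch the object; this one echoes its arguments.
        version = read_kwargs.get("version_id", "latest")
        return f"{url}@{version}".encode()


class S3FileLoaderSketch:
    """Hypothetical loader that forwards kwargs to the handler and to reads."""

    def __init__(
        self,
        urls: Iterable[str],
        request_timeout_ms: int = -1,
        region: str = "",
        handler_kwargs: Optional[Dict[str, Any]] = None,
        read_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.urls = list(urls)
        # Passed once, when the handler is constructed.
        self.handler = FakeS3Handler(request_timeout_ms, region, **(handler_kwargs or {}))
        # Passed on every s3_read() call.
        self.read_kwargs = dict(read_kwargs or {})

    def __iter__(self) -> Iterator[Tuple[str, bytes]]:
        for url in self.urls:
            yield url, self.handler.s3_read(url, **self.read_kwargs)


loader = S3FileLoaderSketch(
    ["s3://bucket/key"],
    region="us-west-2",
    read_kwargs={"version_id": "abc123"},  # hypothetical per-read option
)
results = list(loader)
```

Keeping two explicit dictionaries rather than a single `**kwargs` is one possible design choice: it avoids guessing whether a given argument belongs to the handler's construction or to the read call.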

Motivation, pitch

This will allow users to specify additional parameters to interact with S3 according to their needs.

Alternatives

Add support in fsspec DataPipes instead and ask users to use those if necessary.

Additional context

No response

cc: @ejguan

NivekT avatar Oct 03 '22 18:10 NivekT

I would say it's low pri until the perf issue has been solved for S3. And, in terms of fsspec, it seems viable: https://s3fs.readthedocs.io/en/latest/#bucket-version-awareness

ejguan avatar Oct 05 '22 15:10 ejguan

And, with your PR https://github.com/pytorch/data/pull/804, it should become doable for users.

ejguan avatar Oct 05 '22 15:10 ejguan

> I would say it's low pri until the perf issue has been solved for S3. And, in terms of fsspec, it seems viable: https://s3fs.readthedocs.io/en/latest/#bucket-version-awareness

Curious as to what the s3 perf issue is?

kiukchung avatar Nov 10 '22 01:11 kiukchung