Add support for `kwargs` in S3 DataPipes
🚀 The feature
`S3FileLister` and `S3FileLoader` currently don't support keyword arguments beyond `request_timeout_ms`, `region`, `buffer_size`, and `multi_part_download`.
One example is here, where a user would like to read a specific version of the bucket. I imagine there are similar parameters that users may want to pass through. Some of these may be passed to the constructor of the `S3Handler`, and others may be used in `handler.s3_read()`.
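The request boils down to forwarding arbitrary keyword arguments to two different places: the handler constructor and each read call. Below is a minimal sketch of that routing, assuming the split described above. `FakeS3Handler` and `S3FileLoaderSketch` are hypothetical names, not the torchdata API, and `version_id` is an assumed per-read option name used only for illustration. The fake handler echoes its options instead of contacting S3.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple


class FakeS3Handler:
    """Hypothetical stand-in for S3Handler: records options, never calls S3."""

    def __init__(self, request_timeout_ms: int = -1, region: str = "", **extra: Any) -> None:
        self.request_timeout_ms = request_timeout_ms
        self.region = region
        self.extra = extra  # any other construction-time options

    def s3_read(self, url: str, **read_kwargs: Any) -> bytes:
        # A real handler would issue a GET request; this echoes the assumed
        # per-request option so the forwarding is observable.
        version = read_kwargs.get("version_id", "latest")
        return f"{url}@{version}".encode()


class S3FileLoaderSketch:
    """Hypothetical loader routing one dict to the handler constructor and
    another dict to every s3_read() call."""

    def __init__(
        self,
        source: Iterable[str],
        handler_kwargs: Optional[Dict[str, Any]] = None,
        read_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.source = source
        # Copy the dicts so later changes to the caller's objects have no effect.
        self.handler = FakeS3Handler(**dict(handler_kwargs or {}))
        self.read_kwargs = dict(read_kwargs or {})

    def __iter__(self) -> Iterator[Tuple[str, bytes]]:
        for url in self.source:
            yield url, self.handler.s3_read(url, **self.read_kwargs)


if __name__ == "__main__":
    loader = S3FileLoaderSketch(
        ["s3://bucket/a.txt"],
        handler_kwargs={"region": "us-west-2"},
        read_kwargs={"version_id": "v1"},
    )
    for url, data in loader:
        print(url, data)  # s3://bucket/a.txt b's3://bucket/a.txt@v1'
```

Taking two separate dicts, rather than a single `**kwargs`, avoids guessing which option belongs to the constructor and which to the read call. A single `**kwargs` would need a fixed list of constructor keys to split on.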
Motivation, pitch
This will allow users to specify additional parameters to interact with S3 according to their needs.
Alternatives
Add support in the fsspec DataPipes instead and ask users to use those if necessary.
Additional context
No response
cc: @ejguan
I would say it's low pri until the perf issue has been solved for S3. And, in terms of fsspec, it seems it's viable: https://s3fs.readthedocs.io/en/latest/#bucket-version-awareness
And, with your PR https://github.com/pytorch/data/pull/804, it should become doable for users.
Curious as to what the S3 perf issue is?