
Why not use the profile name when initialising the S3FileSystem class?

Open wudihero2 opened this issue 1 year ago • 6 comments

Question

Hi, #922 standardized some AWS credential names, but I am confused about why the code below isn't used to support an AWS profile name in pyiceberg.io.fsspec.py:

import s3fs
from aiobotocore.session import AioSession

profile_session = AioSession(profile="xxx")
fs = s3fs.S3FileSystem(session=profile_session)

The Glue catalog already supports AWS profile names through `glue.profile-name`, as in the following code in pyiceberg.catalog.glue.py:

profile_name=PropertyUtil.get_first_property_value(properties, GLUE_PROFILE_NAME, DEPRECATED_PROFILE_NAME),

Would it be an improvement if the S3 file system could also take a profile name, or is there some concern I haven't considered?
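For illustration, the idea in the question could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the property key `s3.profile-name` and both helper names are hypothetical, and `aiobotocore` is only needed when a profile is actually configured.

```python
from typing import Any, Dict, Optional

# Hypothetical property key, used only for this illustration.
S3_PROFILE_NAME = "s3.profile-name"


def first_property_value(properties: Dict[str, str], *keys: str) -> Optional[str]:
    """Return the value of the first key that is set to a non-empty string."""
    for key in keys:
        value = properties.get(key)
        if value:
            return value
    return None


def s3fs_session_kwargs(properties: Dict[str, str]) -> Dict[str, Any]:
    """Build the extra keyword arguments for s3fs.S3FileSystem.

    Returns an empty dict when no profile name is configured, so the
    file system falls back to its default credential resolution.
    """
    kwargs: Dict[str, Any] = {}
    profile = first_property_value(properties, S3_PROFILE_NAME)
    if profile is not None:
        # Imported lazily so the lookup works without aiobotocore installed.
        from aiobotocore.session import AioSession

        kwargs["session"] = AioSession(profile=profile)
    return kwargs
```

The result would then be passed along as `s3fs.S3FileSystem(**s3fs_session_kwargs(properties))`. This only covers the fsspec path; it does nothing for PyArrow.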

wudihero2 avatar Sep 25 '24 17:09 wudihero2

I think this is a feature gap in the S3 FileIO. It makes sense to support profile_name. We would need to support both fsspec and pyarrow.

Is this something you would like to contribute?

kevinjqliu avatar Sep 25 '24 18:09 kevinjqliu

Hello, I am interested in this. Do I need to tag the person who will assign this task to me?

wudihero2 avatar Sep 26 '24 04:09 wudihero2

@wudihero2 assigned to you :)

kevinjqliu avatar Sep 26 '24 15:09 kevinjqliu

Hi folks, I was under the impression that this was something that would need to be addressed in PyArrow S3FileSystem. Please see @HonahX 's earlier comment:

https://github.com/apache/iceberg-python/pull/922#discussion_r1677395543

Previous issue on the same topic: https://github.com/apache/iceberg-python/issues/570

PyArrow S3FileSystem: https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html

On that note, I'm in favor of keeping this issue open since this is a frequent question from the community, until we are able to find a solution by perhaps working with the Arrow community.

sungwy avatar Sep 26 '24 15:09 sungwy

https://github.com/apache/iceberg-python/issues/1104#issuecomment-2377397379 this thread is somewhat related

kevinjqliu avatar Sep 26 '24 16:09 kevinjqliu

Hi all, I checked the pyarrow code and found that a profile_name parameter is not currently supported, so the s3.* parameters are indeed not suitable for supporting profile_name. It would be great if we could work with the Arrow community to find a solution! This package is great and helps me use Iceberg in my current job, thank you!
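Until PyArrow accepts a profile name, one possible workaround is to resolve the profile's credentials with boto3 and pass them on as explicit `s3.*` properties. The sketch below assumes boto3 is installed and that the profile exists in the local AWS configuration; both function names are hypothetical. Note that the credentials are resolved once, so temporary credentials will not refresh when they expire.

```python
from typing import Dict, Optional


def credentials_to_s3_properties(
    access_key: str,
    secret_key: str,
    token: Optional[str] = None,
    region: Optional[str] = None,
) -> Dict[str, str]:
    """Map resolved AWS credentials onto s3.* FileIO properties."""
    properties = {
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
    }
    if token:
        properties["s3.session-token"] = token
    if region:
        properties["s3.region"] = region
    return properties


def s3_properties_from_profile(profile_name: str) -> Dict[str, str]:
    """Resolve a named AWS profile into static s3.* properties."""
    # Imported lazily so the mapping above can be used without boto3.
    import boto3

    session = boto3.Session(profile_name=profile_name)
    credentials = session.get_credentials()
    if credentials is None:
        raise ValueError(f"No credentials found for profile {profile_name!r}")
    # Take a consistent snapshot of key, secret and token.
    frozen = credentials.get_frozen_credentials()
    return credentials_to_s3_properties(
        frozen.access_key, frozen.secret_key, frozen.token, session.region_name
    )
```

The returned dictionary can be merged into the properties given to the catalog, for example `load_catalog("default", **{**catalog_properties, **s3_properties_from_profile("xxx")})`, where `catalog_properties` stands for whatever configuration is already in use.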

wudihero2 avatar Sep 27 '24 16:09 wudihero2