iceberg-python

Accessing S3 Express one zone bucket from pyiceberg

munip opened this issue 1 year ago • 4 comments

Question

I have been able to access an S3 bucket with PyIceberg using `SqlCatalog` successfully with:

```python
from pyiceberg.catalog.sql import SqlCatalog

warehouse_path = "/tmp/warehouse"  # hypothetical local directory that holds the SQLite catalog file

catalog = SqlCatalog(
    "default",
    **{
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "warehouse": "s3://myicebergbkt/test",
        "s3.access-key-id": "myid",
        "s3.secret-access-key": "mykey",
        "s3.session-token": "my-token",
        "s3.region": "us-east-1",
    },
)
```

But when I try to access the same data in an S3 Express One Zone bucket, I am stuck on the syntax. I tried all of these options with no luck:

```python
catalog = SqlCatalog(
    "default",
    **{
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "warehouse": "s3://us-east-1:730335207565:bucket/pyicebkt--use1-az4--x-s3/test",
        # I have also tried 730335207565:bucket/pyicebkt--use1-az4--x-s3 and just pyicebkt--use1-az4--x-s3 with no luck
        "s3.access-key-id": "myid",
        "s3.secret-access-key": "mykey",
        "s3.session-token": "my-token",
        "s3.region": "us-east-1",
    },
)
```

I get the error: `Expected an S3 object path of the form 'bucket/key...', got a URI:` Is S3 Express One Zone supported? If so, what is the syntax for the `warehouse` property?
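For reference, a directory bucket (the bucket type that S3 Express One Zone uses) is named `<base-name>--<zone-id>--x-s3`, so the plain-URI form of this warehouse would presumably be `s3://pyicebkt--use1-az4--x-s3/test` rather than an ARN-style string. The stdlib-only sketch below splits a warehouse URI into bucket and key and checks that naming pattern. `split_warehouse` and `parse_directory_bucket` are hypothetical helpers, not PyIceberg or PyArrow APIs, and a well-formed URI does not by itself mean PyArrow can reach the bucket.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Directory buckets (used by S3 Express One Zone) are named
# "<base-name>--<zone-id>--x-s3", e.g. "pyicebkt--use1-az4--x-s3".
DIRECTORY_BUCKET_SUFFIX = "--x-s3"


def split_warehouse(uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split an s3:// URI into (bucket, key prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def parse_directory_bucket(bucket: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (base_name, zone_id) when the bucket name
    follows the directory-bucket pattern, otherwise None."""
    if not bucket.endswith(DIRECTORY_BUCKET_SUFFIX):
        return None
    stem = bucket[: -len(DIRECTORY_BUCKET_SUFFIX)]
    if "--" not in stem:
        return None
    base_name, zone_id = stem.rsplit("--", 1)
    return base_name, zone_id


for uri in (
    "s3://pyicebkt--use1-az4--x-s3/test",
    "s3://us-east-1:730335207565:bucket/pyicebkt--use1-az4--x-s3/test",
):
    bucket, prefix = split_warehouse(uri)
    print(bucket, prefix, parse_directory_bucket(bucket))
# pyicebkt--use1-az4--x-s3 test ('pyicebkt', 'use1-az4')
# us-east-1:730335207565:bucket pyicebkt--use1-az4--x-s3/test None
```

With the ARN-style string, everything before the first `/` (`us-east-1:730335207565:bucket`) lands in the bucket slot, and that is not a valid bucket name, since S3 bucket names cannot contain colons.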

munip • Jul 14 '24

I think the error might be coming from the underlying `pyarrow.fs.S3FileSystem` class, which PyIceberg uses to interact with S3: https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html I'm not sure whether it currently supports S3 Express One Zone.

According to this thread, PyArrow does not currently support it: https://github.com/lancedb/lancedb/issues/1206
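One plausible reading of the exact error text (an assumption, not something confirmed in this thread): PyIceberg hands the filesystem a path with the `s3://` prefix removed, and with the ARN-style warehouse that path begins with `us-east-1:`, which is syntactically a valid URI scheme under RFC 3986. A filesystem that rejects URI-looking paths would then report "got a URI". The sketch approximates such a check with a regex; `to_bucket_key` and `looks_like_uri` are hypothetical helpers, and Arrow's real check may apply additional rules.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ),
# followed by ":" when it starts a URI.
SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def to_bucket_key(warehouse: str) -> str:
    """Hypothetical helper: drop the 's3://' prefix to get a 'bucket/key...' path."""
    prefix = "s3://"
    return warehouse[len(prefix):] if warehouse.startswith(prefix) else warehouse


def looks_like_uri(path: str) -> bool:
    """Hypothetical approximation: True if the path starts with a URI scheme."""
    return SCHEME_PREFIX.match(path) is not None


for warehouse in (
    "s3://myicebergbkt/test",
    "s3://us-east-1:730335207565:bucket/pyicebkt--use1-az4--x-s3/test",
):
    path = to_bucket_key(warehouse)
    print(path, looks_like_uri(path))
# myicebergbkt/test False
# us-east-1:730335207565:bucket/pyicebkt--use1-az4--x-s3/test True
```

This would only explain the ARN-style attempt: a plain `pyicebkt--use1-az4--x-s3/test` path passes the check, so if that attempt also failed, the cause is presumably elsewhere, such as the missing Express One Zone support discussed here.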

kevinjqliu • Jul 14 '24

Thanks, kevinjqliu. From the other thread, it doesn't look like PyArrow supports S3 Express One Zone. Does anyone know the timeline for Express One Zone support?

muniatl • Jul 15 '24

@muniatl The best place to reach out would be the Arrow mailing list: https://lists.apache.org/[email protected]

Fokko • Jul 15 '24

The Arrow mailing list would be a good place to start.

PyIceberg depends on PyArrow for S3 Express One Zone support. I found https://github.com/apache/arrow-rs-object-store/issues/106, which covers adding support in the Arrow Rust object store library. It would be great to open an issue with PyArrow to track support for S3 Express One Zone.

kevinjqliu • Jul 15 '24

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] • Jan 12 '25

This issue has been closed because it has not received any activity in the 14 days since being marked as 'stale'.

github-actions[bot] • Jan 27 '25