iceberg-python
Support Snowflake-Managed Iceberg Tables via SnowflakeCatalog
Closes #685.
Rationale for this change
Reopens the closed PR #687, which adds a Snowflake catalog. I addressed the review comments and applied some additional changes based on errors found when using it with the Bodo data processing library.
One way Snowflake supports Iceberg is via managed tables, where Snowflake has both read and write access. They are essentially regular Snowflake tables with an Iceberg storage backend; outside of Snowflake, these tables are read-only. To work with them, this PR wraps the relevant SQL calls in a Catalog API.
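As a rough sketch of what "wrapping SQL calls" looks like for the read path: Snowflake exposes the current metadata location of a managed Iceberg table via the `SYSTEM$GET_ICEBERG_TABLE_INFORMATION` function, and PyIceberg can open a table read-only from that location with `StaticTable.from_metadata`. The helper names below are mine, not the PR's; `conn` is assumed to be a snowflake-connector-python connection.

```python
import json

def extract_metadata_location(info_json: str) -> str:
    """Pull the current metadata.json path out of the JSON string returned
    by Snowflake's SYSTEM$GET_ICEBERG_TABLE_INFORMATION function."""
    return json.loads(info_json)["metadataLocation"]

def load_snowflake_iceberg_table(conn, identifier: str):
    """Hypothetical sketch: fetch the metadata location over a Snowflake
    connection and open the table read-only with PyIceberg's StaticTable."""
    from pyiceberg.table import StaticTable  # third-party; imported lazily

    cur = conn.cursor()
    cur.execute(
        "SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION(%s)", (identifier,)
    )
    (info_json,) = cur.fetchone()
    return StaticTable.from_metadata(extract_metadata_location(info_json))

# The parsing helper is pure, so it can be exercised without a connection:
sample = '{"metadataLocation": "s3://bucket/tbl/metadata/v2.metadata.json", "status": "success"}'
print(extract_metadata_location(sample))
```

The actual SnowflakeCatalog in the PR does more (name resolution, listing, limited writes), but this is the core trick that makes the tables readable outside Snowflake.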
I skipped some of the less-commonly used APIs that can be filled in later.
Are these changes tested?
Tested manually, both by itself and with the Bodo library, on AWS and Azure. Some of the Azure tests don't currently work because Snowflake uses path prefixes like wasb://, wasbs://, etc. This is blocked on the other PR adding Azure support to PyArrowFileIO.
Also copied the mock tests from the original PR.
Are there any user-facing changes?
Users can read and query Snowflake-managed Iceberg tables, with minimal write operations.
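From the user's side, the read path goes through the standard PyIceberg Catalog/Table APIs (`load_table`, `scan`); a SnowflakeCatalog instance from this PR would slot in as `catalog` below. The stub here only demonstrates the call flow without Snowflake credentials; see the PR for the catalog's actual constructor properties.

```python
from types import SimpleNamespace

def read_managed_table(catalog, identifier: str):
    """Sketch of the read path: load_table and scan are standard
    PyIceberg Catalog/Table APIs; a real SnowflakeCatalog would be
    passed in as `catalog`."""
    table = catalog.load_table(identifier)
    return table.scan()

# Stub objects standing in for a SnowflakeCatalog and its table, so the
# call flow runs locally:
stub_table = SimpleNamespace(scan=lambda: "scan-result")
stub_catalog = SimpleNamespace(load_table=lambda ident: stub_table)
print(read_managed_table(stub_catalog, "db.schema.events"))
```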
@srilman From what I understand from Snowflake, it is also transitioning to the REST catalog protocol. I would much rather use that, since it is properly tested. As you mentioned, there are some unsolved issues with the Snowflake catalog.
Hi @Fokko,
Are you aware of any public announcement from Snowflake signalling they are transitioning to REST catalog for Snowflake-managed tables?
AFAIK this is not in their roadmap.
@iamontheinet @jdanielmyers What's your take on this?
At my org, we'd love to use Snowflake-managed Iceberg tables but the lack of support for them in pyiceberg has been blocking us for months.
Thanks for the help!
@monti-python I think the best way to unblock yourself for Snowflake is to do what this article suggests: sync the Snowflake-managed Iceberg table with an "Open Catalog" (aka Snowflake-managed Polaris) and then query it using pyiceberg/Spark/etc., since Open Catalog/Polaris implements the REST catalog spec.
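For reference, querying through Open Catalog needs nothing Snowflake-specific on the PyIceberg side: it is just a REST catalog config. The helper and the account URL pattern below are my assumptions; check your Open Catalog connection details for the real endpoint, warehouse (catalog) name, and OAuth scope.

```python
def open_catalog_config(account: str, warehouse: str,
                        client_id: str, client_secret: str) -> dict:
    """Build a PyIceberg REST-catalog config dict for Snowflake Open
    Catalog (Polaris). The URI pattern and scope are assumptions taken
    from typical Open Catalog setups, not from this PR."""
    return {
        "type": "rest",
        "uri": f"https://{account}.snowflakecomputing.com/polaris/api/catalog",
        "credential": f"{client_id}:{client_secret}",
        "warehouse": warehouse,
        "scope": "PRINCIPAL_ROLE:ALL",
    }

cfg = open_catalog_config("myacct", "my_open_catalog", "my-id", "my-secret")
print(cfg["uri"])
# With real credentials you would then do:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("snowflake_open", **cfg)
```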
Hi @corleyma,
Thanks for the tip!
Unfortunately, that approach would require configuring the catalog as "external", which only supports reads (not writes) from third-party tools such as PyIceberg, Spark, etc. That limitation makes it an unworkable solution in this case.
Until either:
- PyIceberg supports the Snowflake-managed catalog, or
- Snowflake natively supports the REST protocol,
the only way to enable bi-directional read/write access between Python and Snowflake is to create an "internal" catalog in Open Catalog. However, doing so would mean replicating our entire Snowflake RBAC framework in Open Catalog, a tool that doesn’t offer equivalent semantics or authentication/authorization features.
For these reasons, a proper solution here is badly needed.
Reference: https://docs.snowflake.com/en/user-guide/opencatalog/overview#catalog-types
CC @iamontheinet @jdanielmyers @Fokko