[Feature] Support AWS Glue & Lake Formation for Paimon metadata/metastore

Open gmdfalk opened this issue 11 months ago • 3 comments

Search before asking

  • [x] I searched in the issues and found nothing similar.

Motivation

Apache Iceberg supports AWS Glue & AWS Lake Formation to manage tables & permissions centrally and more easily than on the S3 level with IAM policies.

Can we get AWS Glue & Lake Formation support for Paimon metadata? This would make it easier to use Paimon in AWS ecosystems, including integrating with Athena for querying Paimon tables in the future.

Solution

We already support AWS Glue in Apache Paimon for the Iceberg-compatibility layer. Currently, we need to build and configure an external client: https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/tree/branch-3.4.0

Maybe we can port some of the AWS/Iceberg code to Paimon, including GlueCatalog and LakeFormationAwsClientFactory.

Anything else?

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

gmdfalk avatar Jan 31 '25 09:01 gmdfalk

+1 I think it is good to support GlueCatalog in Paimon too.

JingsongLi avatar Feb 10 '25 06:02 JingsongLi

Is it currently not possible to use the Iceberg compatibility with AWS Glue catalogs?

tsturzl avatar Mar 10 '25 17:03 tsturzl

I would like to work on this. I will share the draft PR for review.

Shekharrajak avatar Apr 10 '25 17:04 Shekharrajak

PIP-28 mentions using the REST catalog to integrate with AWS Glue. If Glue supports Lake Formation vended credentials through the REST API, this would be a simple and effective way to add Lake Formation support for Glue catalogs to Paimon.

The Iceberg REST spec has a /credentials API that provides this functionality.

I've heard from AWS that the Glue Catalog Iceberg REST APIs do support vending Lake Formation credentials, though I don't see it in the AWS Glue REST APIs for Apache Iceberg specifications. More investigation is required.
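
For anyone who wants to probe this, here is a rough sketch of how the request path for the Iceberg REST credentials call could be built. It only constructs URLs and sends nothing. The endpoint shape `https://glue.<region>.amazonaws.com/iceberg` and the `catalogs/<account-id>` prefix are assumptions based on my reading of the AWS docs and should be verified. The function names are made up for illustration and are not part of Paimon or Iceberg.

```python
from typing import Sequence
from urllib.parse import quote

# The Iceberg REST spec joins multi-level namespaces with the unit separator
# (0x1F) and percent-encodes the result when it appears in a URL path.
NAMESPACE_SEPARATOR = "\x1f"


def glue_iceberg_base_url(region: str) -> str:
    """Assumed shape of the Glue Iceberg REST endpoint for a region."""
    return f"https://glue.{region}.amazonaws.com/iceberg"


def table_path(prefix: str, namespace: Sequence[str], table: str) -> str:
    """Path of the load-table route; `prefix` is server-specific (assumption)."""
    ns = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    return f"/v1/{prefix}/namespaces/{ns}/tables/{quote(table, safe='')}"


def credentials_path(prefix: str, namespace: Sequence[str], table: str) -> str:
    """Path of the credentials route for a table, per the Iceberg REST spec."""
    return table_path(prefix, namespace, table) + "/credentials"


if __name__ == "__main__":
    # Hypothetical account ID, namespace, and table name.
    url = glue_iceberg_base_url("us-east-1") + credentials_path(
        "catalogs/123456789012", ["analytics"], "orders"
    )
    print(url)
```

A real request would also need SigV4 signing for the `glue` service, which this sketch leaves out.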

nickdelnano avatar Apr 28 '25 02:04 nickdelnano

I have been investigating this and am considering the work as two topics. I want to link these two issues together with an explanation.

  1. REST for Iceberg Compatibility: https://github.com/apache/paimon/issues/4394. I am an AWS user, so I will explain the benefits from that perspective. However, there are also non-AWS-specific benefits mentioned in the issue.
     • Remove the dependency on aws-glue-data-catalog-client-for-apache-hive-metastore so that Paimon users do not need to build that project in order to integrate with AWS.
     • Support more database and table parameters that the Glue Hive client does not support but the REST API does.
     • Resolve some performance issues on large datasets that we have observed in production. Currently, the Glue Hive client performs some operations on each commit, such as table statistics updates, that are not necessary for table formats like Iceberg. We are adding an internal patch to remove this; however, Glue REST would avoid it entirely and make Paimon's Iceberg Compatibility more resilient in production on AWS.
     • AWS Lake Formation support for Iceberg Compatibility.

  2. Glue & Lake Formation support for Paimon - this issue

It would be best to provide this using REST, as it will not require much custom code: the REST catalog client has already been added to Paimon.

As I mentioned in my previous comment, Glue provides a REST catalog server. However, it requires that the table type in the Glue catalog be set to Iceberg; otherwise the APIs return an error. I think it may be possible to work around this in some creative ways that I hope to explore when I have some time. There is a lot of benefit in reusing the same REST interfaces, so it is worth exploring this fully before proceeding with an alternative.
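
To make the constraint concrete, here is a minimal sketch of the kind of check involved. It assumes the Glue REST server decides based on the `table_type` table parameter, which Iceberg's GlueCatalog sets to `ICEBERG`. That assumption is mine and has not been confirmed against the Glue service. The input is a plain dict shaped like the `Table` object in a Glue `GetTable` response, and `is_iceberg_glue_table` is a hypothetical helper name.

```python
def is_iceberg_glue_table(table: dict) -> bool:
    """Return True if a Glue table dict is marked as an Iceberg table.

    Assumption: the marker is the `table_type` entry in `Parameters`,
    compared case-insensitively, as Iceberg's own Glue integration does.
    """
    params = table.get("Parameters") or {}
    return str(params.get("table_type", "")).upper() == "ICEBERG"


if __name__ == "__main__":
    # Hypothetical table entries for illustration only.
    iceberg_table = {"Name": "orders", "Parameters": {"table_type": "ICEBERG"}}
    other_table = {"Name": "events", "Parameters": {"some_key": "some_value"}}
    print(is_iceberg_glue_table(iceberg_table))
    print(is_iceberg_glue_table(other_table))
```

If this is indeed the check, a native Paimon table registered in Glue would fail it, which is why a workaround would be needed.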

nickdelnano avatar May 15 '25 23:05 nickdelnano

+1 I also use Paimon on AWS with data in S3 and metadata in a Hive metastore on EMR/RDS, and plan to move metadata to AWS Glue. Native Glue & Lake Formation support in Paimon (beyond the Hive Glue client) would simplify our setup and permissions management a lot.

rahulcode751 avatar Nov 28 '25 17:11 rahulcode751