trino icon indicating copy to clipboard operation
trino copied to clipboard

Support Parquet Modular Encryption for Trino

Open my7ym opened this issue 1 year ago • 16 comments

Description

This large PR is to support Parquet Modular Encryption for Trino. The PR core logic is migrated from the one in Presto with several changes to bridge the big gaps between APIs in these two repos.

To avoid no trino-hdfs dep in other modules, we pass the trino decryption configs in parquet configs and then translate to parquet flags. Not sure if there is a better way to do this.

For Trino issue: https://github.com/trinodb/trino/issues/9383

Will fix the test and linting failures after the implementation is stable.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required. ( ) Release notes are required. Please propose a release note for me. ( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

my7ym avatar Dec 08 '23 21:12 my7ym

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Dec 08 '23 21:12 cla-bot[bot]

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Done

my7ym avatar Dec 08 '23 21:12 my7ym

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Dec 15 '23 00:12 cla-bot[bot]

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Dec 15 '23 01:12 cla-bot[bot]

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Jan 10 '24 17:01 github-actions[bot]

👋 @my7ym .. please let us know if you need any help and ping us when this is ready for review again.

mosabua avatar Jan 11 '24 19:01 mosabua

@amoghmargoor @findepi @electrum can you potentially help here and work on the removal of any Hadoop deps as discussed in the contributor call https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-1-feb-2024

In fact looking at the message in the PR it looks like these removals are done. Maybe all we need is for @my7ym to rebase and then coordinate reviews and changes.

Also @my7ym .. I am not sure if you are on trino slack .. if so please let us know and chime in at https://trinodb.slack.com/archives/CP1MUNEUX/p1706832386414769 or sync up in general with us to continue this work together

mosabua avatar Feb 02 '24 00:02 mosabua

@my7ym Great work on reimplementing the Parquet code to not require Hadoop. I'll review this ASAP.

electrum avatar Feb 02 '24 03:02 electrum

Also fyi @raunaqmorarka

mosabua avatar Feb 02 '24 03:02 mosabua

It looks like this code still depends on Hadoop. The config also contains a class name that is loaded at runtime, which is something we avoid in Trino. I think we'll want to have a list of supported KMS implementations that work out-of-the-box with appropriate config.

It would be helpful to have a product test that shows this working end-to-end.

electrum avatar Feb 12 '24 19:02 electrum

It looks like this code still depends on Hadoop. The config also contains a class name that is loaded at runtime, which is something we avoid in Trino. May I quickly know what class it is?

I think we'll want to have a list of supported KMS implementations that work out-of-the-box with appropriate config.

Good call. Will do.

It would be helpful to have a product test that shows this working end-to-end. May I know where is the best place to add it? Any existing example?

Thanks!

my7ym avatar Feb 17 '24 00:02 my7ym

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Mar 11 '24 17:03 github-actions[bot]

@my7ym let us know if you need any help with rebasing and next steps. cc @electrum

mosabua avatar Mar 11 '24 21:03 mosabua

@mosabua A bit tied up in the past weeks. Will work on this in next week. Will let you know if I have more questions. Thanks for checking in.

my7ym avatar Mar 16 '24 00:03 my7ym

hi @my7ym, I'm working on a project that needs this feature, and I've been following this PR with interest. Is the branch featured in this PR in a workable state (with the understanding that there are Hadoop deps that currently prevent merging into Trino)?

I'm eager to assist if there's any way I can help to move this forward. I'm relatively new to Trino but very enthusiastic about contributing to the community :)

ancientmodern avatar Mar 29 '24 06:03 ancientmodern

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar May 10 '24 17:05 github-actions[bot]

@electrum @amoghmargoor how do we want to proceed here?

mosabua avatar May 29 '24 20:05 mosabua

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Jun 21 '24 17:06 github-actions[bot]

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

github-actions[bot] avatar Jul 15 '24 17:07 github-actions[bot]

Can you provide guidance on how to proceed here @electrum ?

mosabua avatar Jul 15 '24 17:07 mosabua

@my7ym could you proceed to remove the hadoop dependencies as requested by @electrum and rebase the PR?

mosabua avatar Jul 16 '24 15:07 mosabua