HADOOP-18708: Support S3 Client Side Encryption(CSE) With AWS SDK V2
Description of PR
This commit adds support for S3 client-side encryption (CSE). CSE can be configured in two modes: CSE-KMS, where keys are provided by AWS KMS, and CSE-CUSTOM, where custom keys are provided by implementing a custom keyring.
CSE is implemented using S3EncryptionClient (the V3 client). Additional configurations (mentioned below) were added to make it compatible with the older V1 and V2 encryption clients; these are turned OFF by default.
In order to be compatible with the V1 client, the following operations are done:
- The V1 client pads extra bytes up to a multiple of 16, i.e. if the file size is 12 bytes, 4 bytes are padded to make it a multiple of 16. To get the unencrypted file size of such an S3 object, a ranged S3 GET call is made.
- The V1/V2 clients support storing encryption metadata in an instruction file (.instruction), and hence those files are skipped during listing.
- Unlike the V1/V2 clients, the V3 client does not support reading unencrypted objects. An additional S3 client (base client) is introduced to read a mix of encrypted and unencrypted S3 objects.
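The V1 padding arithmetic described above can be sketched as follows. This is a minimal illustration, not Hadoop or AWS SDK code; `V1PaddingExample` and `paddedSize` are hypothetical names:

```java
// Hedged sketch: how PKCS7-style padding (used by the V1 encryption client)
// obscures the plaintext length. Illustrative names, not real APIs.
public final class V1PaddingExample {
    static final int AES_BLOCK_SIZE = 16;

    /** Stored (ciphertext) size the V1 client produces for a plaintext. */
    static long paddedSize(long plaintextSize) {
        // PKCS7 always adds between 1 and 16 bytes, rounding the size
        // up to the next multiple of the AES block size.
        return plaintextSize + (AES_BLOCK_SIZE - plaintextSize % AES_BLOCK_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(paddedSize(12)); // 16: a 12-byte file gains 4 bytes
        System.out.println(paddedSize(16)); // 32: a full block gains a whole block
    }
}
```

Because 16 different plaintext lengths map to the same stored size, the stored size alone cannot recover the unencrypted length; that is why a ranged GET of the final block is needed for V1-written objects.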
Default Behavior
The configurations that provide backward compatibility are turned OFF by default, considering their performance implications. The default behavior is as follows:
- The unencrypted file size is computed by simply subtracting 16 bytes from the stored file size.
- When there is a mix of unencrypted and encrypted S3 objects, the client fails.
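The default fast path above can be sketched as below. The names are illustrative, not from the Hadoop codebase; the sketch assumes the V3 (AES-GCM) object format, where a fixed 16-byte authentication tag is appended to the ciphertext:

```java
// Hedged sketch of the default length calculation: with AES-GCM the
// ciphertext is exactly plaintext + 16-byte auth tag, so no extra GET
// is needed. Illustrative names only.
public final class CseDefaultLength {
    static final int GCM_TAG_LENGTH = 16;

    /** Unencrypted length of a V3-written object, from its stored size. */
    static long unencryptedLength(long storedObjectSize) {
        return storedObjectSize - GCM_TAG_LENGTH;
    }

    public static void main(String[] args) {
        // A 12-byte file is stored as 12 + 16 = 28 bytes.
        System.out.println(unencryptedLength(28)); // 12
    }
}
```

This is why the default mode is cheap (pure arithmetic, no extra S3 call) but is only correct for objects written by the V2/V3 clients, not for V1-padded or unencrypted objects.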
This PR is based on the initial work done by @ahmarsuhail as part of https://github.com/apache/hadoop/pull/6164
How was this patch tested?
- Tested in us-east-1 with `mvn -Dparallel-tests -DtestsThreadCount=16 clean verify`.
- Added integration tests for CSE-KMS and CSE-CUSTOM.
@steveloughran @ahmarsuhail Could you please review the changes ?
FYI, Jira on create performance. I'm worrying about how to remove/delay client configuration, so adding a new client makes things way worse: HADOOP-19205
By default we are not initializing the new S3 client. It is created only under a config flag, to provide backward compatibility when required.
It's still happening in FileSystem.initialize().
If I get time this week I'll do my PoC of lazy creation of transfer manager and async client, as that's doubled startup time already. all of that will be moved behind S3Store with only the copy() methods exposed.
Ultimately I want the S3Client itself hidden behind that API, so here
- new unencrypted S3 operations: `getLengthOfEncryptedObject(path)`, ..., which would trigger on-demand creation of the client, with an accessor to this passed down.
Actually, createFileStatusFromListingEntry() could be part of S3AStore too, somehow. All the config information for the creation, especially the CSE flags, would be in the store config, so listing wouldn't need it.
#6892 is the draft pr with lazy creation; a new s3 client would go in there, but ideally we'd actually push the listing/transformation code into S3AStore so the extra client would never be visible outside it.
@steveloughran - I really like the idea of lazily initializing the async client and the new interface for creating the S3 client. I will wait for your changes to get in and refactor my code based on them. That way I could lazily initialize the unencrypted S3 client as well, when the config is enabled.
exactly. when its not needed: no penalty.
@steveloughran @ahmarsuhail I have rebased the changes on top of https://github.com/apache/hadoop/pull/6892 . Please review the changes
Thanks
@steveloughran I really appreciate your time and effort reviewing these changes. Indeed it was a detailed review. I have tried to address all your comments and raised a new revision (I hope I have covered them all).
Please take a look
Thanks
@shameersss1 unless you have no choice, please try not to rebase and force push prs once reviews have started. makes it a lot harder to see what has changed
Yeah, I had to rebase for merge conflicts in hadoop-common module. This commit (https://github.com/apache/hadoop/pull/6884/commits/838dc24c88cb1d20503c19f99c2886bcbb2c1e25) contains the addressing of review comments.
@steveloughran - Gentle reminder for the review Thanks
@shameersss1 what do you think of @raphaelazzolini 's comment
I understand @raphaelazzolini 's concern about having many isCSEEnabled conditions, especially in the listing methods. If I understand correctly, the ask is to introduce a handler interface pattern, as follows:
```java
interface ListingHandler {
  Listing getListing(Path path, boolean includeSelf);
}

class CSEEnabledListingHandler implements ListingHandler {
  @Override
  public Listing getListing(Path path, boolean includeSelf) {
    return includeSelf
        ? new Listing.AcceptAllButS3nDirsAndCSEInstructionFile()
        : new Listing.AcceptAllButSelfAndS3nDirsAndCSEInstructionFile(path);
  }
}

class CSEDisabledListingHandler implements ListingHandler {
  @Override
  public Listing getListing(Path path, boolean includeSelf) {
    return includeSelf
        ? new Listing.AcceptAllButS3nDirs()
        : new Listing.AcceptAllButSelfAndS3nDirs(path);
  }
}

@Retries.RetryTranslated
public RemoteIterator<S3ALocatedFileStatus> listFilesAndDirectoryMarkers(
    final Path path,
    final S3AFileStatus status,
    final boolean includeSelf) throws IOException {
  auditSpan.activate();
  ListingHandler handler = isCSEEnabled
      ? new CSEEnabledListingHandler()
      : new CSEDisabledListingHandler();
  return innerListFiles(path, true, handler.getListing(path, includeSelf), status);
}
```
I agree the handler interface strategy will be more elegant. I will make the required code changes. I see there are conflicting changes which require a rebase and force push.
@steveloughran - I have raised a new revision. Please do review the same. Thanks
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 17m 3s | Docker mode activated. | |
| _ Prechecks _ | ||||
| +1 :green_heart: | dupname | 0m 1s | No case conflicting files found. | |
| +0 :ok: | codespell | 0m 0s | codespell was not available. | |
| +0 :ok: | detsecrets | 0m 0s | detect-secrets was not available. | |
| +0 :ok: | xmllint | 0m 0s | xmllint was not available. | |
| +0 :ok: | markdownlint | 0m 0s | markdownlint was not available. | |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. | |
| +1 :green_heart: | test4tests | 0m 0s | The patch appears to include 10 new or modified test files. | |
| _ trunk Compile Tests _ | ||||
| +0 :ok: | mvndep | 14m 21s | Maven dependency ordering for branch | |
| +1 :green_heart: | mvninstall | 32m 35s | trunk passed | |
| +1 :green_heart: | compile | 17m 32s | trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 | |
| +1 :green_heart: | compile | 16m 1s | trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 | |
| +1 :green_heart: | checkstyle | 4m 30s | trunk passed | |
| +1 :green_heart: | mvnsite | 3m 23s | trunk passed | |
| +1 :green_heart: | javadoc | 2m 35s | trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 | |
| +1 :green_heart: | javadoc | 2m 22s | trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 | |
| +0 :ok: | spotbugs | 0m 44s | branch/hadoop-project no spotbugs output file (spotbugsXml.xml) | |
| +1 :green_heart: | shadedclient | 34m 46s | branch has no errors when building and testing our client artifacts. | |
| -0 :warning: | patch | 35m 12s | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. | |
| _ Patch Compile Tests _ | ||||
| +0 :ok: | mvndep | 1m 7s | Maven dependency ordering for patch | |
| +1 :green_heart: | mvninstall | 1m 45s | the patch passed | |
| +1 :green_heart: | compile | 16m 48s | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 | |
| +1 :green_heart: | javac | 16m 48s | the patch passed | |
| +1 :green_heart: | compile | 16m 21s | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 | |
| +1 :green_heart: | javac | 16m 21s | the patch passed | |
| +1 :green_heart: | blanks | 0m 0s | The patch has no blanks issues. | |
| -0 :warning: | checkstyle | 4m 21s | /results-checkstyle-root.txt | root: The patch generated 2 new + 26 unchanged - 0 fixed = 28 total (was 26) |
| +1 :green_heart: | mvnsite | 3m 23s | the patch passed | |
| +1 :green_heart: | javadoc | 2m 31s | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 | |
| +1 :green_heart: | javadoc | 2m 21s | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 | |
| +0 :ok: | spotbugs | 0m 38s | hadoop-project has no data from spotbugs | |
| +1 :green_heart: | shadedclient | 35m 4s | patch has no errors when building and testing our client artifacts. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | unit | 0m 38s | hadoop-project in the patch passed. | |
| +1 :green_heart: | unit | 19m 37s | hadoop-common in the patch passed. | |
| +1 :green_heart: | unit | 2m 58s | hadoop-aws in the patch passed. | |
| +1 :green_heart: | asflicense | 1m 6s | The patch does not generate ASF License warnings. | |
| 267m 6s |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6884/8/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6884 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint markdownlint |
| uname | Linux c24e8baf0f29 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 069ec479c1aaa76f99acc99eeca572990d1da3ba |
| Default Java | Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6884/8/testReport/ |
| Max. process+thread count | 2742 (vs. ulimit of 5500) |
| modules | C: hadoop-project hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6884/8/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
@steveloughran and @ahmarsuhail - I have addressed your comments. Could you please re-review ?
Had to force push
```
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
```
@steveloughran @ahmarsuhail request for rereview Thanks
@ahmarsuhail Thanks a lot for your review. I have addressed the additional/minor comments. @steveloughran - Can you also please take a look ?
@steveloughran - Are we good to take this forward ? Please let me know if there is anything else to address
@steveloughran - Gentle reminder for review
@steveloughran - Gentle reminder for review
@shameersss1 you have to fix the conflicts first
i am distracted right now by SDK regressions; not looking at anything else
@steveloughran - I have resolved the merge conflict and force pushed.
@steveloughran - Due to changes spanning across 50 files, this PR is prone to merge conflicts very often if left open for a while. Could you please review this?
:broken_heart: -1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 1m 10s | Docker mode activated. | |
| _ Prechecks _ | ||||
| +1 :green_heart: | dupname | 0m 1s | No case conflicting files found. | |
| +0 :ok: | codespell | 0m 0s | codespell was not available. | |
| +0 :ok: | detsecrets | 0m 0s | detect-secrets was not available. | |
| +0 :ok: | xmllint | 0m 0s | xmllint was not available. | |
| +0 :ok: | markdownlint | 0m 0s | markdownlint was not available. | |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. | |
| +1 :green_heart: | test4tests | 0m 0s | The patch appears to include 17 new or modified test files. | |
| _ trunk Compile Tests _ | ||||
| +0 :ok: | mvndep | 0m 20s | Maven dependency ordering for branch | |
| -1 :x: | mvninstall | 0m 23s | /branch-mvninstall-root.txt | root in trunk failed. |
| -1 :x: | compile | 0m 23s | /branch-compile-root-jdkUbuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.txt | root in trunk failed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04. |
| -1 :x: | compile | 0m 40s | /branch-compile-root-jdkPrivateBuild-1.8.0_422-8u422-b05-1~20.04-b05.txt | root in trunk failed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05. |
| -0 :warning: | checkstyle | 0m 24s | /buildtool-branch-checkstyle-root.txt | The patch fails to run checkstyle in root |
| -1 :x: | mvnsite | 3m 16s | /branch-mvnsite-hadoop-common-project_hadoop-common.txt | hadoop-common in trunk failed. |
| +1 :green_heart: | javadoc | 1m 53s | trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 | |
| +1 :green_heart: | javadoc | 1m 16s | trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 | |
| -1 :x: | spotbugs | 0m 13s | /branch-spotbugs-hadoop-common-project_hadoop-common.txt | hadoop-common in trunk failed. |
| +0 :ok: | spotbugs | 0m 24s | branch/hadoop-project no spotbugs output file (spotbugsXml.xml) | |
| +1 :green_heart: | shadedclient | 38m 44s | branch has no errors when building and testing our client artifacts. | |
| _ Patch Compile Tests _ | ||||
| +0 :ok: | mvndep | 6m 25s | Maven dependency ordering for patch | |
| -1 :x: | mvninstall | 0m 14s | /patch-mvninstall-hadoop-common-project_hadoop-common.txt | hadoop-common in the patch failed. |
| -1 :x: | mvninstall | 0m 20s | /patch-mvninstall-hadoop-tools_hadoop-aws.txt | hadoop-aws in the patch failed. |
| +1 :green_heart: | compile | 17m 38s | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 | |
| -1 :x: | javac | 17m 38s | /results-compile-javac-root-jdkUbuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.txt | root-jdkUbuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 generated 35 new + 0 unchanged - 0 fixed = 35 total (was 0) |
| +1 :green_heart: | compile | 16m 29s | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 | |
| -1 :x: | javac | 16m 29s | /results-compile-javac-root-jdkPrivateBuild-1.8.0_422-8u422-b05-1~20.04-b05.txt | root-jdkPrivateBuild-1.8.0_422-8u422-b05-1~20.04-b05 with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 generated 44 new + 0 unchanged - 0 fixed = 44 total (was 0) |
| +1 :green_heart: | blanks | 0m 0s | The patch has no blanks issues. | |
| -0 :warning: | checkstyle | 4m 19s | /results-checkstyle-root.txt | root: The patch generated 34 new + 0 unchanged - 0 fixed = 34 total (was 0) |
| -1 :x: | mvnsite | 0m 43s | /patch-mvnsite-hadoop-common-project_hadoop-common.txt | hadoop-common in the patch failed. |
| -1 :x: | mvnsite | 0m 47s | /patch-mvnsite-hadoop-tools_hadoop-aws.txt | hadoop-aws in the patch failed. |
| +1 :green_heart: | javadoc | 2m 44s | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 | |
| +1 :green_heart: | javadoc | 2m 23s | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 | |
| +0 :ok: | spotbugs | 0m 38s | hadoop-project has no data from spotbugs | |
| -1 :x: | spotbugs | 0m 34s | /patch-spotbugs-hadoop-common-project_hadoop-common.txt | hadoop-common in the patch failed. |
| -1 :x: | spotbugs | 0m 43s | /patch-spotbugs-hadoop-tools_hadoop-aws.txt | hadoop-aws in the patch failed. |
| +1 :green_heart: | shadedclient | 34m 43s | patch has no errors when building and testing our client artifacts. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | unit | 0m 39s | hadoop-project in the patch passed. | |
| -1 :x: | unit | 0m 37s | /patch-unit-hadoop-common-project_hadoop-common.txt | hadoop-common in the patch failed. |
| -1 :x: | unit | 0m 37s | /patch-unit-hadoop-tools_hadoop-aws.txt | hadoop-aws in the patch failed. |
| +1 :green_heart: | asflicense | 1m 6s | The patch does not generate ASF License warnings. | |
| 150m 26s |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6884/17/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6884 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint markdownlint |
| uname | Linux ade6b4e278fa 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 37eead58f3d4350a560b3ef01c1e629ecd301bb7 |
| Default Java | Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6884/17/testReport/ |
| Max. process+thread count | 551 (vs. ulimit of 5500) |
| modules | C: hadoop-project hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6884/17/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
I will do this and other pending stuff once we've got 3.4.1 out the door -that is the only thing I'm focused on right now. The best thing you could do right now is help qualify that in your environments.
@steveloughran Sure, I would gladly be part of the hadoop-3.4.1 release. Is there anything specific you want me to do? Should I run all the S3A integration tests for the 3.4.1 release candidate branch? If so, could you please point me to it and to the place where I should update the results.
@shameersss1 I'll send you the steps on how to do this
@ahmarsuhail - Thanks Ahmar for the steps, I have signed off on ARM64 EC2 machine.