HADOOP-18708: Support S3 Client Side Encryption(CSE) With AWS SDK V2
Description of PR
This commit adds support for S3 client-side encryption (CSE). CSE can be configured in two modes: CSE-KMS, where keys are provided by AWS KMS, and CSE-CUSTOM, where custom keys are provided by implementing a custom keyring.
CSE is implemented using S3EncryptionClient (the V3 client). Additional configurations (mentioned below) were added to make it compatible with the older V1 and V2 encryption clients; this compatibility is turned OFF by default.
To be compatible with the V1 client, the following is done:
- The V1 client pads objects with extra bytes up to a multiple of 16, i.e. if the file size is 12 bytes, 4 bytes are added to make it a multiple of 16. To get the unencrypted file size of such an S3 object, a ranged S3 GET call is made.
- The V1/V2 clients support storing encryption metadata in an instruction file (.instruction), so those files are skipped during listing.
- Unlike the V1/V2 clients, the V3 client does not support reading unencrypted objects. An additional S3 client (the base client) is introduced to read a mix of encrypted and unencrypted S3 objects.
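The first two points can be illustrated with a minimal sketch. It is not the code in this PR: the class and method names are hypothetical, and it assumes PKCS#7-style padding (a full extra block is added when the size is already a multiple of 16), which the description above does not state. It shows why the stored size alone cannot recover the original size of a V1 object, and what skipping instruction files in a listing could look like.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from the PR.
class CseV1CompatSketch {
    static final int BLOCK_SIZE = 16;
    static final String INSTRUCTION_SUFFIX = ".instruction";

    // Stored size of a padded object, assuming PKCS#7-style padding:
    // 1..16 bytes are always added so the result is a multiple of 16.
    static long paddedLength(long plaintextLength) {
        return (plaintextLength / BLOCK_SIZE + 1) * BLOCK_SIZE;
    }

    // Drop instruction files from a list of object keys.
    static List<String> skipInstructionFiles(List<String> keys) {
        return keys.stream()
                .filter(key -> !key.endsWith(INSTRUCTION_SUFFIX))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Plaintext sizes 0..15 all map to the same stored size, so the
        // original size cannot be derived from the stored size alone.
        for (long size : new long[] {0, 12, 15}) {
            System.out.println(size + " -> " + paddedLength(size));
        }
        System.out.println(skipInstructionFiles(
                Arrays.asList("dir/part-0000", "dir/part-0000.instruction")));
    }
}
```

Because many plaintext sizes share one stored size under this assumption, the unencrypted size has to be obtained by actually reading (and decrypting) the data, which is consistent with the ranged GET described above.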
Default Behavior
The configurations that make it backward compatible are turned OFF by default because of their performance implications. The default behavior is as follows:
- The unencrypted file size is computed by simply subtracting 16 bytes from the file size.
- When there is a mix of unencrypted and encrypted S3 objects, the client fails.
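As a rough sketch of the default size computation (hypothetical names, not the PR's implementation): the stored size minus a fixed 16 bytes, presumably the overhead the newer clients add per object. Rejecting objects shorter than 16 bytes is an assumption made here for illustration only.

```java
// Hypothetical illustration only; names and error handling are assumptions.
class CseDefaultLengthSketch {
    // Fixed per-object overhead assumed by the default behavior.
    static final long CSE_PADDING_LENGTH = 16;

    // Default behavior: unencrypted size = stored size - 16 bytes.
    static long unencryptedLength(long storedLength) {
        if (storedLength < CSE_PADDING_LENGTH) {
            throw new IllegalArgumentException(
                    "Object is shorter than the assumed overhead: " + storedLength);
        }
        return storedLength - CSE_PADDING_LENGTH;
    }

    public static void main(String[] args) {
        // An object stored as 1040 bytes is reported as 1024 bytes.
        System.out.println(unencryptedLength(1040));
        // A V1-padded object holding 12 bytes of data is stored as 16 bytes;
        // the shortcut reports 0, not 12.
        System.out.println(unencryptedLength(16));
    }
}
```

The second call shows the trade-off: the subtraction is cheap, but it gives the wrong size for objects written by the V1 client, which is the case the compatibility configurations address.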
This PR is based on the initial work done by @ahmarsuhail as part of https://github.com/apache/hadoop/pull/6164
How was this patch tested?
- Tested in us-east-1 with `mvn -Dparallel-tests -DtestsThreadCount=16 clean verify`.
- Added integration tests for CSE-KMS and CSE-CUSTOM.