Cannot upload large files using multipart upload to an S3 bucket with both Object Lock and encryption enabled.
Describe the bug
We are attempting to upload large files, via multipart upload, to a bucket that has both Object Lock and encryption enabled. We set the following configuration on the TransferManager:
setAlwaysCalculateMultipartMd5(true)
However, we receive the following stack trace when attempting to upload:
com.amazonaws.services.s3.model.AmazonS3Exception: Content-MD5 OR x-amz-checksum- HTTP header is required for Put Part requests with Object Lock parameters (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID:; S3 Extended Request ID:; Proxy: null)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
    at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3887)
    at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3872)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:323)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:226)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:147)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:115)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:45)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Expected Behavior
Uploads should respect the following configuration even when encryption is enabled (which forces parts to be uploaded in series):
transferManager.getConfiguration().setAlwaysCalculateMultipartMd5(true)
Multipart uploads should work against an S3 bucket with both encryption and Object Lock enabled.
Current Behavior
We are forced to increase our multipart threshold so the multipart code path is never taken, which imposes a steep performance penalty on large uploads.
Reproduction Steps
1. Create an Object Lock-enabled S3 bucket with encryption enabled.
2. Create a file with 50 MB of content at /my-mega-file.txt.
3. Run the following code to trigger the issue (I don't think I can share our actual production code that's hitting this, so this is a rough mockup of it):
import com.amazonaws.AmazonClientException;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.kms.AWSKMS;
import com.amazonaws.services.kms.AWSKMSClientBuilder;
import com.amazonaws.services.s3.AmazonS3EncryptionClientV2Builder;
import com.amazonaws.services.s3.AmazonS3EncryptionV2;
import com.amazonaws.services.s3.model.CryptoConfigurationV2;
import com.amazonaws.services.s3.model.CryptoMode;
import com.amazonaws.services.s3.model.KMSEncryptionMaterialsProvider;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;
import java.io.File;
import java.io.IOException;

public static void main(String[] args) throws IOException {
    String bucketName = "my-bucket-for-aws";
    String keyName = "my-file";
    File file = new File("/my-mega-file.txt");
    String kmsKeyId = "some-key-id";

    AWSKMS kmsClient = AWSKMSClientBuilder.standard()
            .withRegion(Regions.DEFAULT_REGION)
            .build();

    AmazonS3EncryptionV2 s3Encryption = AmazonS3EncryptionClientV2Builder.standard()
            .withRegion(Regions.DEFAULT_REGION)
            .withKmsClient(kmsClient)
            .withCryptoConfiguration(new CryptoConfigurationV2().withCryptoMode(CryptoMode.AuthenticatedEncryption))
            .withEncryptionMaterialsProvider(new KMSEncryptionMaterialsProvider(kmsKeyId))
            .build();

    TransferManager tm = TransferManagerBuilder.standard()
            .withS3Client(s3Encryption)
            .withAlwaysCalculateMultipartMd5(true)
            .build();

    Upload upload = tm.upload(bucketName, keyName, file);
    try {
        upload.waitForCompletion();
    } catch (AmazonClientException | InterruptedException e) {
        System.out.println("Uh oh, this could be a bug");
    } finally {
        tm.shutdownNow();
    }
}
Possible Solution
When we increase the multipart threshold, the file uploads successfully; however, this is not ideal for our use case.
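For reference, this is roughly the workaround we are stuck with: raising the multipart threshold above our largest object so the TransferManager falls back to a single PutObject. The 5 GB figure below is illustrative, not a recommendation.

```java
// Workaround sketch (not a fix): raise the multipart threshold above our
// largest object so TransferManager issues one PutObject instead of a
// multipart upload. The threshold value here is illustrative.
TransferManager tm = TransferManagerBuilder.standard()
        .withS3Client(s3Encryption)
        .withAlwaysCalculateMultipartMd5(true)
        .withMultipartUploadThreshold(5L * 1024 * 1024 * 1024)
        .build();
```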
Additional Information/Context
We dug around in the AWS SDK code base and found the following.
The SDK checks the setAlwaysCalculateMultipartMd5 flag here https://github.com/aws/aws-sdk-java/blob/81b400b76c7b0a605a7e0767a2025e0f892e363b/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/UploadCallable.java#L365-L365 while performing a multipart upload.
However, that execution path is only taken when uploading parts in parallel https://github.com/aws/aws-sdk-java/blob/81b400b76c7b0a605a7e0767a2025e0f892e363b/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/UploadCallable.java#L223-L223
Unfortunately, because we use encryption, parallel part upload is not supported, so uploadPartsInSeries() https://github.com/aws/aws-sdk-java/blob/81b400b76c7b0a605a7e0767a2025e0f892e363b/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/UploadCallable.java#L226-L226 is executed instead. That path never computes or attaches the MD5 hash, hence the error.
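As a rough illustration of what the serial path would need to do: compute the Base64-encoded MD5 of each part's bytes and attach it to the part request (we believe UploadPartRequest.withMD5Digest(...) is the low-level hook for this). The sketch below shows only the digest computation in plain Java; the class and method names are ours, not the SDK's.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class PartMd5 {
    // Base64-encoded MD5 digest of one part's bytes -- the value S3
    // expects in the Content-MD5 header of an UploadPart request.
    public static String partMd5Base64(byte[] partBytes) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return Base64.getEncoder().encodeToString(md5.digest(partBytes));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a mandatory JCE algorithm, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] part = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(partMd5Base64(part)); // XUFAKrxLKna5cZ2REBfFkg==
    }
}
```

The value returned by partMd5Base64 is what would be passed to withMD5Digest(...) on each part before uploading it.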
AWS Java SDK version used
1.12.279
JDK version used
openjdk version "1.8.0_292"
Operating System and version
Debian 9