
Cannot upload large files using Multipart to an S3 bucket with both Object Locking and Encryption Enabled.

Open · jfleming-ic opened this issue 3 years ago · 0 comments

Describe the bug

We are attempting to upload large files, via multipart upload, to an object-locked S3 bucket that has encryption enabled. We set the following configuration on the TransferManager:

setAlwaysCalculateMultipartMd5(true)

However we receive the following stacktrace when attempting to upload:

com.amazonaws.services.s3.model.AmazonS3Exception: Content-MD5 OR x-amz-checksum- HTTP header is required for Put Part requests with Object Lock parameters (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID:; S3 Extended Request ID:; Proxy: null)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
    at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3887)
    at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3872)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:323)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:226)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:147)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:115)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:45)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

Expected Behavior

Uploads should respect the following configuration whether parts are uploaded in parallel or in series (encryption forces the serial path):

transferManager.getConfiguration().setAlwaysCalculateMultipartMd5(true)

Multipart uploads should work on an S3 bucket with both encryption and object locking enabled.

Current Behavior

We are forced to increase our multipart threshold so that the multipart upload path is never taken, which imposes a steep performance penalty on large uploads.
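For reference, the workaround we use today is to raise the multipart threshold above our largest file so TransferManager falls back to a single PutObject. A rough fragment, not a full program; the 51 MB figure is just an example, and `s3Encryption` refers to the encryption client built in the reproduction code below:

```java
// Workaround fragment: push the threshold above our largest backup file
// so TransferManager never takes the multipart path at all.
long thresholdBytes = 51L * 1024 * 1024; // just above our ~50 MB files

TransferManager tm = TransferManagerBuilder.standard()
        .withS3Client(s3Encryption) // the AmazonS3EncryptionV2 client from the repro
        .withMultipartUploadThreshold(thresholdBytes)
        .withAlwaysCalculateMultipartMd5(true)
        .build();
```

This trades the parallelism and memory benefits of multipart for a working upload, which is why we consider it a stopgap rather than a fix.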

Reproduction Steps

1. Create an object-locked S3 bucket with encryption enabled.
2. Create a file with 50 MB of content at /my-mega-file.txt.
3. Run the following code to trigger the issue (I don't think I can share our actual production code that's hitting this, so this is a rough mockup of it):

import java.io.File;

import com.amazonaws.AmazonClientException;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.kms.AWSKMS;
import com.amazonaws.services.kms.AWSKMSClientBuilder;
import com.amazonaws.services.s3.AmazonS3EncryptionClientV2Builder;
import com.amazonaws.services.s3.AmazonS3EncryptionV2;
import com.amazonaws.services.s3.model.CryptoConfigurationV2;
import com.amazonaws.services.s3.model.CryptoMode;
import com.amazonaws.services.s3.model.KMSEncryptionMaterialsProvider;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;

public class MultipartRepro {

    public static void main(String[] args) {
        String bucketName = "my-bucket-for-aws";
        String keyName = "my-file";
        File file = new File("/my-mega-file.txt");
        String kmsKeyId = "some-key-id";

        AWSKMS kmsClient = AWSKMSClientBuilder.standard()
                .withRegion(Regions.DEFAULT_REGION)
                .build();

        // Client-side encryption client (V2)
        AmazonS3EncryptionV2 s3Encryption = AmazonS3EncryptionClientV2Builder.standard()
                .withRegion(Regions.DEFAULT_REGION)
                .withKmsClient(kmsClient)
                .withCryptoConfiguration(new CryptoConfigurationV2()
                        .withCryptoMode(CryptoMode.AuthenticatedEncryption))
                .withEncryptionMaterialsProvider(new KMSEncryptionMaterialsProvider(kmsKeyId))
                .build();

        // Part MD5s should be computed because the bucket is object locked
        TransferManager tm = TransferManagerBuilder.standard()
                .withS3Client(s3Encryption)
                .withAlwaysCalculateMultipartMd5(true)
                .build();

        Upload upload = tm.upload(bucketName, keyName, file);

        try {
            upload.waitForCompletion();
        } catch (AmazonClientException | InterruptedException e) {
            System.out.println("Uh oh, this could be a bug");
        }
    }
}
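Step 1 of the repro can also be scripted with the SDK. A rough, untested sketch (the bucket name is a placeholder, and note that object lock can only be enabled at bucket creation time):

```java
// Untested sketch: create a bucket with object lock enabled, then default
// it to SSE-KMS so every object is encrypted at rest.
AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.DEFAULT_REGION)
        .build();

s3.createBucket(new CreateBucketRequest("my-bucket-for-aws")
        .withObjectLockEnabledForBucket(true));

s3.setBucketEncryption(new SetBucketEncryptionRequest()
        .withBucketName("my-bucket-for-aws")
        .withServerSideEncryptionConfiguration(new ServerSideEncryptionConfiguration()
                .withRules(new ServerSideEncryptionRule()
                        .withApplyServerSideEncryptionByDefault(
                                new ServerSideEncryptionByDefault().withSSEAlgorithm("aws:kms")))));
```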

Possible Solution

Increasing the multipart threshold so the file is uploaded in a single PutObject works, but that is not ideal for our use case.

Additional Information/Context

We dug around in the AWS SDK code base and found the following.

The SDK checks the setAlwaysCalculateMultipartMd5 flag while performing a multipart upload here: https://github.com/aws/aws-sdk-java/blob/81b400b76c7b0a605a7e0767a2025e0f892e363b/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/UploadCallable.java#L365-L365

However, that execution path is only taken when uploading parts in parallel: https://github.com/aws/aws-sdk-java/blob/81b400b76c7b0a605a7e0767a2025e0f892e363b/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/UploadCallable.java#L223-L223

Unfortunately, because we use encryption, parallel part upload is not supported and uploadPartsInSeries() is executed instead: https://github.com/aws/aws-sdk-java/blob/81b400b76c7b0a605a7e0767a2025e0f892e363b/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/UploadCallable.java#L226-L226 That path never computes or attaches the Content-MD5 header, hence the error.
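For what it's worth, the missing header looks cheap to produce: the Content-MD5 value S3 expects is just the Base64 encoding of the raw MD5 digest of the part's bytes. A small self-contained illustration (hypothetical helper, not SDK code):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class ContentMd5Demo {

    // Hypothetical helper: S3's Content-MD5 header value is the Base64
    // encoding of the raw 128-bit MD5 digest of the payload bytes.
    static String contentMd5(byte[] partBytes) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return Base64.getEncoder().encodeToString(md5.digest(partBytes));
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is always available on the JVM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(contentMd5("hello".getBytes(StandardCharsets.UTF_8)));
        // prints XUFAKrxLKna5cZ2REBfFkg==
    }
}
```

So the fix would seem to be a matter of applying the same MD5 capture the parallel path already does to the serial path as well.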

AWS Java SDK version used

1.12.279

JDK version used

openjdk version "1.8.0_292"

Operating System and version

Debian 9

jfleming-ic · Sep 20 '22 07:09