
`S3.Object.copy()` fails with multipart and `ChecksumAlgorithm`

Open rapkyt opened this issue 2 years ago • 3 comments

Describe the bug

Copying an object with a multipart transfer fails when a checksum algorithm is requested for the destination object.

Note: copying small objects works. The failure only occurs for objects whose size is above multipart_threshold.

Note 2: using copy from the S3 client fails in the same way.

Expected Behavior

The copy method should work for multipart objects with checksums.

Current Behavior

Running copy with multipart and ChecksumAlgorithm set to SHA256 raises the following error:

botocore.exceptions.ClientError: An error occurred (InvalidRequest) when calling the CompleteMultipartUpload operation: The upload was created using a sha256 checksum. The complete request must include the checksum for each part. It was missing for part 1 in the request.

Reproduction Steps

Run the following code, replacing the bucket and key accordingly:

import boto3
s3 = boto3.resource("s3")

dest_bucket = "bucket"
dest_key = "key"
copy_source = {"Bucket": "bucket", "Key": "key"}

s3.Object(dest_bucket, dest_key).copy(
    CopySource=copy_source,
    ExtraArgs={"ChecksumAlgorithm": "SHA256"}
)

You will get the following error. Full traceback:

 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File ".venv/lib/python3.7/site-packages/boto3/s3/inject.py", line 572, in object_copy
     Config=Config,
   File ".venv/lib/python3.7/site-packages/boto3/s3/inject.py", line 444, in copy
     return future.result()
   File ".venv/lib/python3.7/site-packages/s3transfer/futures.py", line 103, in result
     return self._coordinator.result()
   File ".venv/lib/python3.7/site-packages/s3transfer/futures.py", line 266, in result
     raise self._exception
   File ".venv/lib/python3.7/site-packages/s3transfer/tasks.py", line 139, in __call__
     return self._execute_main(kwargs)
   File ".venv/lib/python3.7/site-packages/s3transfer/tasks.py", line 162, in _execute_main
     return_value = self._main(**kwargs)
   File ".venv/lib/python3.7/site-packages/s3transfer/tasks.py", line 387, in _main
     **extra_args,
   File ".venv/lib/python3.7/site-packages/botocore/client.py", line 508, in _api_call
     return self._make_api_call(operation_name, kwargs)
   File ".venv/lib/python3.7/site-packages/botocore/client.py", line 911, in _make_api_call
     raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidRequest) when calling the CompleteMultipartUpload operation: The upload was created using a sha256 checksum. The complete request must include the checksum for each part. It was missing for part 1 in the request.

Possible Solution

The problem resides inside CopyPartTask (s3transfer/copies.py), which doesn't return the checksum even when it is present in the response. Grabbing the checksum from the response and adding it to the return value fixes this issue.
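The idea can be sketched as follows. This is a minimal illustration, not the actual s3transfer code: build_part_metadata is a hypothetical helper, and it assumes the UploadPartCopy response carries the checksum under CopyPartResult with a key such as ChecksumSHA256.

```python
def build_part_metadata(response, part_number, checksum_algorithm=None):
    """Hypothetical helper: build one entry of the Parts list that is
    later sent with CompleteMultipartUpload."""
    result = response["CopyPartResult"]
    part = {"ETag": result["ETag"], "PartNumber": part_number}
    if checksum_algorithm:
        # Assumed naming scheme: "SHA256" -> "ChecksumSHA256"
        member = f"Checksum{checksum_algorithm.upper()}"
        if member in result:
            part[member] = result[member]
    return part


# Simulated UploadPartCopy response; the values are made up for illustration.
fake_response = {
    "CopyPartResult": {"ETag": '"abc123"', "ChecksumSHA256": "ZmFrZQ=="}
}
part = build_part_metadata(fake_response, 1, "SHA256")
```

Without a checksum algorithm the helper returns only ETag and PartNumber, which matches what the failing request appears to send today.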

Additional Information/Context

Looking at the CompleteMultipartUpload request, it appears to send the ETag for each part but not the checksum.
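Until a fix lands, one possible workaround is to drive the multipart copy by hand with the low-level client calls and pass each part's checksum in the completion request. The sketch below is untested against a real bucket. multipart_copy_with_checksum is a hypothetical function, client is assumed to behave like boto3.client("s3"), and size is assumed to be the source object's size in bytes (for example from head_object).

```python
def multipart_copy_with_checksum(client, copy_source, bucket, key, size,
                                 part_size=8 * 1024 * 1024,
                                 algorithm="SHA256"):
    """Hypothetical workaround: copy part by part and include every
    part's checksum in the CompleteMultipartUpload request."""
    member = f"Checksum{algorithm}"  # e.g. "ChecksumSHA256"
    upload_id = client.create_multipart_upload(
        Bucket=bucket, Key=key, ChecksumAlgorithm=algorithm
    )["UploadId"]
    parts = []
    try:
        for number, start in enumerate(range(0, size, part_size), start=1):
            end = min(start + part_size, size) - 1  # byte ranges are inclusive
            result = client.upload_part_copy(
                Bucket=bucket, Key=key, UploadId=upload_id,
                PartNumber=number, CopySource=copy_source,
                CopySourceRange=f"bytes={start}-{end}",
            )["CopyPartResult"]
            part = {"ETag": result["ETag"], "PartNumber": number}
            if member in result:
                part[member] = result[member]
            parts.append(part)
        return client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Avoid leaving an unfinished upload behind.
        client.abort_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id
        )
        raise
```

Taking the client as a parameter keeps the function easy to exercise with a stub before pointing it at real data.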

Lib version:

  • botocore==1.27.1
  • boto3==1.24.1
  • s3transfer==0.6.0

SDK version used

1.24.1

Environment details (OS name and version, etc.)

macOS 12.4, Python 3.7

rapkyt avatar Jun 07 '22 13:06 rapkyt

Hi @rapkyt, I'm a developer who's also using this copy method. If we do not specify ChecksumAlgorithm, does copy not perform checksum validation for multipart uploads? Or is there a default algorithm when there's none specified?

s3.Object(dest_bucket, dest_key).copy(
    CopySource=copy_source,
    # ExtraArgs={"ChecksumAlgorithm": "SHA256"}  # What's the behavior for multipart uploads without this line?
)

boonjiashen avatar Jun 23 '22 22:06 boonjiashen

@boonjiashen, if you don't specify ChecksumAlgorithm, then your S3 object will not have additional checksums. AFAIK boto already does some sort of checksum validation with the ETags.

One thing that caught my attention is that, when uploading or copying a multipart object, the checksum of the S3 object is not the checksum of the whole file, but rather the checksum of the concatenated checksums of each part.
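To illustrate the difference, here is a small local sketch using only hashlib. It assumes the composite value is the SHA-256 of the concatenated raw part digests, base64-encoded, with a "-N" suffix for the number of parts. That is my understanding of how S3 reports it, not something confirmed in this thread. The function names are hypothetical.

```python
import base64
import hashlib


def composite_sha256(parts):
    """Checksum of checksums: hash each part, then hash the
    concatenation of the raw part digests."""
    part_digests = [hashlib.sha256(p).digest() for p in parts]
    combined = hashlib.sha256(b"".join(part_digests)).digest()
    return f"{base64.b64encode(combined).decode()}-{len(parts)}"


def whole_object_sha256(data):
    """Plain SHA-256 over the complete payload, base64-encoded."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode()


parts = [b"a" * 10, b"b" * 10]
composite = composite_sha256(parts)
whole = whole_object_sha256(b"".join(parts))
print(composite)
print(whole)
```

The two printed values differ, so a checksum computed locally over the whole file cannot be compared directly with the one stored on a multipart object.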

rapkyt avatar Jun 24 '22 20:06 rapkyt

Note: A PR has been raised to solve this issue. https://github.com/boto/s3transfer/pull/242

sat-ch avatar Oct 20 '22 20:10 sat-ch