s3transfer
`S3.Object.copy()` fails with multipart and `ChecksumAlgorithm`
Describe the bug
If you try to copy an object with multipart and create checksums for the destination object, it will fail.
Note: using `copy` with small objects doesn't fail; it does fail for objects whose size is above `multipart_threshold`.
Note 2: using `copy` from the S3 client has the same effect.
Expected Behavior
The `copy` method should work for multipart objects with checksums.
Current Behavior
Running `copy` with multipart and `ChecksumAlgorithm` set to `SHA256` throws the message:
botocore.exceptions.ClientError: An error occurred (InvalidRequest) when calling the CompleteMultipartUpload operation: The upload was created using a sha256 checksum. The complete request must include the checksum for each part. It was missing for part 1 in the request.
Reproduction Steps
Run the following code, replacing the bucket and key accordingly:
import boto3

s3 = boto3.resource("s3")
dest_bucket = "bucket"
dest_key = "key"
copy_source = {"Bucket": "bucket", "Key": "key"}
s3.Object(dest_bucket, dest_key).copy(
    CopySource=copy_source,
    ExtraArgs={"ChecksumAlgorithm": "SHA256"},
)
and you'll get the following error. Full traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".venv/lib/python3.7/site-packages/boto3/s3/inject.py", line 572, in object_copy
    Config=Config,
  File ".venv/lib/python3.7/site-packages/boto3/s3/inject.py", line 444, in copy
    return future.result()
  File ".venv/lib/python3.7/site-packages/s3transfer/futures.py", line 103, in result
    return self._coordinator.result()
  File ".venv/lib/python3.7/site-packages/s3transfer/futures.py", line 266, in result
    raise self._exception
  File ".venv/lib/python3.7/site-packages/s3transfer/tasks.py", line 139, in __call__
    return self._execute_main(kwargs)
  File ".venv/lib/python3.7/site-packages/s3transfer/tasks.py", line 162, in _execute_main
    return_value = self._main(**kwargs)
  File ".venv/lib/python3.7/site-packages/s3transfer/tasks.py", line 387, in _main
    **extra_args,
  File ".venv/lib/python3.7/site-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File ".venv/lib/python3.7/site-packages/botocore/client.py", line 911, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidRequest) when calling the CompleteMultipartUpload operation: The upload was created using a sha256 checksum. The complete request must include the checksum for each part. It was missing for part 1 in the request.
Possible Solution
The problem resides inside `CopyPartTask` (`s3transfer/copies.py`), which doesn't return the checksum even when it's present in the response. Grabbing the checksum from the response and adding it to the return value fixes this issue.
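To make the idea concrete, here is a minimal sketch of the kind of change meant here. It is not the actual s3transfer code: `build_part_metadata` is a hypothetical helper, and the response dict is a hand-written stand-in shaped like an `UploadPartCopy` response (checksum nested under `CopyPartResult`).

```python
def build_part_metadata(response, part_number, checksum_algorithm=None):
    """Collect the fields CompleteMultipartUpload needs for one copied part.

    Hypothetical helper for illustration; `response` is assumed to look like
    the dict returned by the S3 client's upload_part_copy call.
    """
    copy_result = response["CopyPartResult"]
    part = {"ETag": copy_result["ETag"], "PartNumber": part_number}
    if checksum_algorithm:
        # e.g. "SHA256" -> "ChecksumSHA256"
        member = f"Checksum{checksum_algorithm.upper()}"
        # Only forward the checksum if S3 actually returned one.
        if member in copy_result:
            part[member] = copy_result[member]
    return part


# Hand-written stand-in for a real response; the values are placeholders.
fake_response = {
    "CopyPartResult": {"ETag": '"abc123"', "ChecksumSHA256": "ZmFrZQ=="}
}
part_with_checksum = build_part_metadata(fake_response, 1, "SHA256")
part_without_checksum = build_part_metadata(fake_response, 1)
print(part_with_checksum)
print(part_without_checksum)
```

With the checksum carried along in each part's metadata, the `Parts` list sent to `CompleteMultipartUpload` would include it next to the `ETag`, which is what the error message asks for.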
Additional Information/Context
Looking at the `CompleteMultipartUpload` request, it looks like it sends the `ETag` for each part, but not the checksum.
Lib version:
- botocore==1.27.1
- boto3==1.24.1
- s3transfer==0.6.0
SDK version used
1.24.1
Environment details (OS name and version, etc.)
macOS 12.4, Python 3.7
Hi @rapkyt, I'm a developer who's also using this `copy` method. If we do not specify `ChecksumAlgorithm`, does `copy` not perform checksum validation for multipart uploads? Or is there a default algorithm when none is specified?
s3.Object(dest_bucket, dest_key).copy(
    CopySource=copy_source,
    # ExtraArgs={"ChecksumAlgorithm": "SHA256"}  # What's the behavior for multipart uploads without this line?
)
@boonjiashen, if you don't specify `ChecksumAlgorithm`, then your S3 object will not have additional checksums. AFAIK boto already does some sort of checksum validation with the ETags.
One thing that caught my attention is that, when uploading or copying a multipart object, the checksum of the S3 object is not the checksum of the whole file, but rather the checksum of the concatenation of each part's checksum.
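For anyone who wants to see the difference, here is a rough sketch of that "checksum of checksums" using only `hashlib`. The part contents are made up, `composite_sha256` is a hypothetical helper, and the base64 encoding with a `-<number of parts>` suffix is my understanding of how S3 reports it, so treat the exact output format as an assumption.

```python
import base64
import hashlib


def composite_sha256(parts):
    """Hash the concatenation of each part's raw SHA-256 digest.

    Hypothetical helper: the base64 encoding and the "-N" suffix are
    assumptions about how S3 displays a multipart checksum.
    """
    digests = [hashlib.sha256(part).digest() for part in parts]
    combined = hashlib.sha256(b"".join(digests)).digest()
    return f"{base64.b64encode(combined).decode('ascii')}-{len(parts)}"


# Made-up part contents standing in for two multipart chunks.
parts = [b"first part of the object", b"second part of the object"]

composite = composite_sha256(parts)
whole_file = base64.b64encode(hashlib.sha256(b"".join(parts)).digest()).decode("ascii")

print("composite :", composite)
print("whole file:", whole_file)
```

The two values differ, so a SHA-256 computed locally over the whole file can't be compared directly with the checksum of an object that was uploaded or copied in parts.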
Note: A PR has been raised to solve this issue. https://github.com/boto/s3transfer/pull/242