Using Generated Presigned URLs with CRC32C checksums results in 400 from S3
Describe the bug
When uploading a large object to S3 through the multipart upload process, using presigned URLs with CRC32C checksums, S3 responds with a 400 error.
Expected Behavior
I would expect the provided checksum header to be accepted, so that the expected checksum type is crc32c rather than null and the upload to S3 succeeds.
Current Behavior
The following type of error message is returned instead of success:
Failed to upload part, status: 400, response: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidRequest</Code><Message>Checksum Type mismatch occurred, expected checksum Type: null, actual checksum Type: crc32c</Message><RequestId>SOMEREQID</RequestId><HostId>SOME/HOSTID</HostId></Error>
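The "expected checksum Type: null" part of the message may mean that no checksum algorithm was recorded when the multipart upload was created. The sketch below is one thing worth ruling out, not a confirmed fix: it passes the `ChecksumAlgorithm` parameter of `create_multipart_upload` when starting the upload. The helper name `start_upload_with_crc32c` is hypothetical, and whether this removes the 400 for presigned parts is an assumption that has not been verified here.

```python
def start_upload_with_crc32c(s3_client, bucket, key):
    """Start a multipart upload that declares CRC32C as its checksum algorithm.

    `s3_client` is assumed to be a boto3 S3 client (or anything that exposes
    the same create_multipart_upload call).
    """
    response = s3_client.create_multipart_upload(
        Bucket=bucket,
        Key=key,
        ChecksumAlgorithm="CRC32C",  # declared at creation time, before any part is sent
    )
    return response["UploadId"]
```

If the upload ID returned by this helper is used for the presigned `upload_part` URL and the error no longer appears, the mismatch was probably on the upload creation side rather than in the signing.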
Reproduction Steps
Replace the AWS credential placeholders with valid values for your environment, and set the testfile path to a file of at least 10 MB (I used a path in ~/Downloads/).
#!/usr/bin/env python3
import base64
import pathlib
from zlib import crc32
import boto3
import requests
# AWS credentials
access_key_id = 'access_key_here'
secret_access_key = 'secret_key_here'
aws_session_token = 'session_token_here'
region = 'region_here'
bucket_name = 'bucket_name_here'
object_key = 'prefix_here/object_key_here'
# Create a session using your AWS credentials
session = boto3.Session(
    aws_access_key_id=access_key_id,
    aws_secret_access_key=secret_access_key,
    aws_session_token=aws_session_token,
)
# Create an S3 client with the specified region
s3_client = session.client('s3', region_name=region)
# Initialize a multipart upload
response = s3_client.create_multipart_upload(
    Bucket=bucket_name,
    Key=object_key
)
upload_id = response['UploadId']
part_number = 1
chunk_size = 10 * 1024 * 1024 # 10 MB
testfile = pathlib.Path('file 10MB or greater in size here').expanduser()
with open(testfile, 'rb') as f:
    content = f.read(chunk_size)
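# Calculate ChecksumCRC32C. NOTE: zlib.crc32 computes plain CRC32, not CRC32C, so this value
# is only a stand-in for this repro (we normally use the crc32c package). The error reported
# above is about the checksum type, not the checksum value.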
checksum_crc32c = base64.b64encode(crc32(content).to_bytes(4, byteorder='big')).decode('utf-8')
# Generate the presigned URL
presigned_url = s3_client.generate_presigned_url(
    'upload_part',
    Params={
        'Bucket': bucket_name,
        'Key': object_key,
        'PartNumber': part_number,
        'UploadId': upload_id,
        'ChecksumCRC32C': checksum_crc32c
    },
    ExpiresIn=3600
)
headers = {
    'Content-Length': str(len(content)),
    'x-amz-checksum-crc32c': checksum_crc32c,
    'Content-Type': 'application/octet-stream',
}
response = requests.put(presigned_url, data=content, headers=headers)
if response.status_code == 200:
    print("Part uploaded successfully!")
else:
    print(f"Failed to upload part, status: {response.status_code}, response: {response.text}")
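Because the script uses `zlib.crc32`, the value it sends is a CRC32, not a CRC32C. For anyone who wants a real CRC32C without installing the crc32c package, here is a minimal pure-Python sketch of the Castagnoli CRC (reflected polynomial 0x82F63B78), encoded the way the script already encodes its checksum (base64 of the 4-byte big-endian value). The function names are illustrative only, and the implementation is far slower than a native package on 10 MB chunks.

```python
import base64

# Build the 256-entry lookup table for CRC32C (Castagnoli), reflected form.
_CRC32C_POLY = 0x82F63B78
_CRC32C_TABLE = []
for _i in range(256):
    _crc = _i
    for _ in range(8):
        _crc = (_crc >> 1) ^ _CRC32C_POLY if _crc & 1 else _crc >> 1
    _CRC32C_TABLE.append(_crc)


def crc32c(data: bytes, crc: int = 0) -> int:
    """Return the CRC32C of `data`; pass a previous result as `crc` to continue a stream."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_b64(data: bytes) -> str:
    """Base64 of the 4-byte big-endian CRC32C, matching the encoding used in the script."""
    return base64.b64encode(crc32c(data).to_bytes(4, byteorder="big")).decode("utf-8")
```

Swapping `crc32c_b64(content)` in for the `zlib`-based line would make the checksum value itself correct, which helps separate a value mismatch from the type mismatch reported here.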
Possible Solution
I suspect the checksum header is not being included in the signing process, but I got lost in the library's code and could not confirm this.
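One way to check that suspicion without reading the library's code is to inspect the generated URL itself. The sketch below assumes the URL is a SigV4 presigned URL, which lists its signed headers in the `X-Amz-SignedHeaders` query parameter; an older-style signature has no such parameter, in which case the list comes back empty. The helper name `describe_presigned_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def describe_presigned_url(url: str) -> dict:
    """Report which headers a presigned URL signs and whether the checksum is in the query."""
    query = parse_qs(urlsplit(url).query)
    # Query parameter names are compared case-insensitively.
    lowered = {name.lower(): values for name, values in query.items()}
    signed = lowered.get("x-amz-signedheaders", [""])[0]
    return {
        "signed_headers": [h for h in signed.split(";") if h],
        "checksum_in_query": "x-amz-checksum-crc32c" in lowered,
    }
```

Calling `describe_presigned_url(presigned_url)` on the URL from the reproduction script shows whether `x-amz-checksum-crc32c` appears among the signed headers, in the query string, or nowhere at all.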
Additional Information/Context
Docs page for generating the urls:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/generate_presigned_url.html
Docs page with acceptable params to be passed to generate_presigned_url when using upload_part as the ClientMethod:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/upload_part.html
SDK version used
1.34.138
Environment details (OS name and version, etc.)
Ubuntu 22.04.4, Python 3.10.12