botocore

Using Generated Presigned URLs with CRC32C checksums results in 400 from S3

Open richardnpaul opened this issue 1 year ago • 0 comments

Describe the bug

When trying to upload a large object to S3 using the multipart upload process, with presigned URLs that include CRC32C checksums, S3 responds with a 400 error and the message shown below.

Expected Behavior

I would expect the provided checksum headers to be included in the signing process, so that S3 sees the checksum type as crc32c rather than null and the upload succeeds.

Current Behavior

The following type of error message is returned instead of success:

Failed to upload part, status: 400, response: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidRequest</Code><Message>Checksum Type mismatch occurred, expected checksum Type: null, actual checksum Type: crc32c</Message><RequestId>SOMEREQID</RequestId><HostId>SOME/HOSTID</HostId></Error>

Reproduction Steps

Replace the AWS credential placeholders with valid values for your environment, and set the testfile assignment to a local file at least 10 MB in size (I was using a path in ~/Downloads/).

#!/usr/bin/env python3
import base64
import pathlib
from zlib import crc32

import boto3
import requests


# AWS credentials
access_key_id = 'access_key_here'
secret_access_key = 'secret_key_here'
aws_session_token = 'session_token_here'
region = 'region_here'
bucket_name = 'bucket_name_here'
object_key = 'prefix_here/object_key_here'

# Create a session using your AWS credentials
session = boto3.Session(
    aws_access_key_id=access_key_id,
    aws_secret_access_key=secret_access_key,
    aws_session_token=aws_session_token,
)

# Create an S3 client with the specified region
s3_client = session.client('s3', region_name=region)

# Initialize a multipart upload
response = s3_client.create_multipart_upload(
    Bucket=bucket_name,
    Key=object_key
)
upload_id = response['UploadId']

part_number = 1
chunk_size = 10 * 1024 * 1024  # 10 MB

testfile = pathlib.Path('file 10MB or greater in size here').expanduser()

with open(testfile, 'rb') as f:
    content = f.read(chunk_size)

# Calculate ChecksumCRC32C.
# Note: zlib.crc32 computes plain CRC-32, which uses a different polynomial;
# the crc32c package (pip install crc32c) implements the Castagnoli CRC-32C
# that S3 validates against the x-amz-checksum-crc32c header.
from crc32c import crc32c
checksum_crc32c = base64.b64encode(crc32c(content).to_bytes(4, byteorder='big')).decode('utf-8')

# Generate the presigned URL
presigned_url = s3_client.generate_presigned_url(
    'upload_part',
    Params={
        'Bucket': bucket_name,
        'Key': object_key,
        'PartNumber': part_number,
        'UploadId': upload_id,
        'ChecksumCRC32C': checksum_crc32c
    },
    ExpiresIn=3600
)

headers = {
    'Content-Length': str(len(content)),
    'x-amz-checksum-crc32c': checksum_crc32c,
    'Content-Type': 'application/octet-stream',
}

response = requests.put(presigned_url, data=content, headers=headers)

if response.status_code == 200:
    print("Part uploaded successfully!")
else:
    print(f"Failed to upload part, status: {response.status_code}, response: {response.text}")
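An aside on the checksum calculation in the script above: zlib.crc32 implements plain CRC-32, not the Castagnoli CRC-32C that the x-amz-checksum-crc32c header requires, so even once the signing problem is fixed the digest would not match what S3 computes. A minimal bitwise CRC-32C (reflected polynomial 0x82F63B78) is enough to show the two algorithms disagree on the standard "123456789" check input:

```python
import zlib


def crc32c_py(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli); slow, but a dependency-free reference."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Reflected CRC: shift right, conditionally xor the reversed polynomial.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


check = b"123456789"
print(hex(crc32c_py(check)))   # 0xe3069283 — the published CRC-32C check value
print(hex(zlib.crc32(check)))  # 0xcbf43926 — the published CRC-32 check value
```

In real code the C-accelerated crc32c package gives the same result much faster; this sketch only illustrates that the two polynomials produce different digests.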

Possible Solution

I suspect the checksum header is being accepted as a parameter but not included in the signing process, though to be honest I got a bit lost in the library's code and couldn't make head nor tail of it in the end.

Additional Information/Context

Docs page for generating the URLs: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/generate_presigned_url.html

Docs page with acceptable params to be passed to generate_presigned_url when using upload_part as the ClientMethod: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/upload_part.html

SDK version used

1.34.138

Environment details (OS name and version, etc.)

Ubuntu 22.04.4, Python 3.10.12

richardnpaul • Jul 03 '24 15:07