
Invalid extra_args key 'ChecksumSHA256' for s3_client.upload_file()

Open svbfromnl opened this issue 2 years ago • 2 comments

Describe the bug

I am attempting to upload a file with s3_client.upload_file() along with a pre-calculated SHA256 hash, to make use of the S3 feature that calculates its own hash and compares it to the one submitted with the upload.

Expected Behavior

I expected the upload to succeed.

Current Behavior

I get the following error:

Invalid extra_args key 'ChecksumSHA256', must be one of: ACL, CacheControl, ChecksumAlgorithm, ContentDisposition, ContentEncoding, ContentLanguage, ContentType, ExpectedBucketOwner, Expires, GrantFullControl, GrantRead, GrantReadACP, GrantWriteACP, Metadata, ObjectLockLegalHoldStatus, ObjectLockMode, ObjectLockRetainUntilDate, RequestPayer, ServerSideEncryption, StorageClass, SSECustomerAlgorithm, SSECustomerKey, SSECustomerKeyMD5, SSEKMSKeyId, SSEKMSEncryptionContext, Tagging, WebsiteRedirectLocation

Reproduction Steps

```python
s3_client = session.client('s3')
s3_client.upload_file(f, bucket, filename, ExtraArgs={
    "ChecksumAlgorithm": "sha256",
    "ChecksumSHA256": digest,
})
```

Possible Solution

Add ChecksumSHA256 as an allowable argument to be passed through to S3

Additional Information/Context

https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html

SDK version used

1.29.63

Environment details (OS name and version, etc.)

macOS Ventura 13.2.1

svbfromnl avatar Feb 23 '23 15:02 svbfromnl

Hi @svbfromnl - thanks for reaching out. I think this is what you're looking for: `ExtraArgs={"ChecksumAlgorithm": "SHA256"}`. All of the allowed ExtraArgs are listed here for future reference. Hope that helps, John

aBurmeseDev avatar Feb 28 '23 01:02 aBurmeseDev

That sole argument works, but it doesn't do everything I need. It tells AWS to compute the hash and store it with the object. The functionality I am looking for is the ability to self-submit a hash along with that argument: according to AWS, S3 will then compare the self-submitted hash with the hash it computes itself, and reject the upload if the values don't match. That functionality is invoked by submitting the `{"ChecksumSHA256": digest}` key/value pair.

svbfromnl avatar Feb 28 '23 14:02 svbfromnl

I assume this cannot be used with upload_file, because it quickly switches to multipart uploads, which make client-side checksum calculation difficult: the hash becomes a composite hash.

Only the low-level put_object request supports this.
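For completeness, the low-level route looks roughly like this — a sketch, with hypothetical bucket/key names; note that S3 expects the base64 encoding of the raw SHA-256 digest, not the hex string:

```python
import base64
import hashlib


def sha256_b64(data: bytes) -> str:
    # S3 checksum headers carry the base64 of the raw digest bytes.
    return base64.b64encode(hashlib.sha256(data).digest()).decode()


def upload_with_checksum(bucket: str, key: str, body: bytes) -> None:
    import boto3  # imported here so the digest helper stays dependency-free

    # put_object accepts ChecksumSHA256 directly; S3 recomputes the hash
    # server-side and rejects the request if the two values differ.
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ChecksumSHA256=sha256_b64(body),
    )
```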

straygar avatar Jan 15 '25 15:01 straygar

Hey @svbfromnl,

We've added support for providing full-object checksums in s3transfer-0.11.x, which is used in the latest versions of the AWS Python SDK and AWS CLI.

You should now be able to achieve what you were trying before:

```python
import boto3
import io
from boto3.s3.transfer import TransferConfig, MB


TEST_BUCKET = "aws-example-bucket"
TEST_KEY = "aws-example-file.txt"

client = boto3.client("s3")
transfer_config = TransferConfig(
    multipart_threshold=8 * MB,  # This is the default value; change as needed.
)

response = client.upload_fileobj(
    Fileobj=io.BytesIO(b"Hello, World!"),
    Bucket=TEST_BUCKET,
    Key=TEST_KEY,
    Config=transfer_config,
    ExtraArgs={
        "ChecksumSHA256": "3/1gIbsr1bCvZ2KQgJ7DpTGR3YHH9wpLKGiKNiGCmG8="
    },
)
```
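The hard-coded digest in that snippet can be derived with the standard library; it is the base64 encoding of the raw SHA-256 digest of the payload:

```python
import base64
import hashlib

# S3's ChecksumSHA256 value is base64 of the raw digest, not the hex string.
payload = b"Hello, World!"
digest = base64.b64encode(hashlib.sha256(payload).digest()).decode()
print(digest)  # 3/1gIbsr1bCvZ2KQgJ7DpTGR3YHH9wpLKGiKNiGCmG8=
```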

One thing I'd warn about specifically for the SHA algorithms: S3 doesn't support uploading a full-object checksum when doing a multipart upload (MPU). See the related docs below from Checking object integrity in Amazon S3.

> **Multipart uploads**
>
> When you upload the object in multiple parts using the MultipartUpload API, you can specify the checksum algorithm that you want Amazon S3 to use and the checksum type (full object or composite). The following table indicates which checksum type is supported for each checksum algorithm in a multipart upload:

| Checksum algorithm | Full object | Composite |
| --- | --- | --- |
| CRC-64NVME | Yes | No |
| CRC-32 | Yes | Yes |
| CRC-32C | Yes | Yes |
| SHA-1 | No | Yes |
| SHA-256 | No | Yes |
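Per the table, the CRC-based algorithms do support full-object checksums with multipart uploads, so precomputing a CRC-32 value is one alternative for large objects. A sketch — my understanding is that S3 expects the base64 of the CRC packed as 4 big-endian bytes, but treat the encoding details as an assumption:

```python
import base64
import zlib

payload = b"Hello, World!"
# Assumed encoding: base64 of the CRC-32 value packed as 4 big-endian bytes,
# which would be passed as the ChecksumCRC32 parameter.
crc32_b64 = base64.b64encode(zlib.crc32(payload).to_bytes(4, "big")).decode()
print(crc32_b64)
```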

High-level operations in boto3 such as upload_file and upload_fileobj will automatically switch to an MPU if the size of your object is larger than the configured multipart_threshold (8 MB is the default value).

As a result, if you attempt to upload a full-object SHA checksum using MPU, you'll receive the following error:

botocore.errorfactory.InvalidRequest: An error occurred (InvalidRequest) when calling the CreateMultipartUpload operation: The FULL_OBJECT checksum type cannot be used with the sha256 checksum algorithm.

Let me know if you have any questions!

jonathan343 avatar Jan 31 '25 19:01 jonathan343