Support S3 additional checksums in high-level S3 commands

Open m-radzikowski opened this issue 2 years ago • 16 comments

Is your feature request related to a problem? Please describe.

The newly released additional S3 checksums feature enhances SDK operations by calculating the selected checksum value on file upload, including multipart uploads. However, this feature is not available in the high-level S3 commands.

Describe the solution you'd like

A --checksum-algorithm parameter in the aws s3 commands, especially in aws s3 cp.

Describe alternatives you've considered

Using low-level commands.

m-radzikowski avatar Feb 26 '22 08:02 m-radzikowski

Hi @m-radzikowski thanks for the feature request. There has already been some discussion on the team about how these checksums could enhance commands like aws s3 cp and aws s3 sync. But it will take more time and discussion to think through the implementation. In the meantime we can leave this issue open to track the request.

tim-finnigan avatar Mar 01 '22 00:03 tim-finnigan

This would be a useful addition to the high-level commands. It would also have been nice if MD5 digests were included as an option. For reference, here is a solution using s3api:

# aws-cli version 2.7.16
# https://aws.amazon.com/blogs/aws/new-additional-checksum-algorithms-for-amazon-s3/

# compute and save checksum on upload / copy
# algorithms supported: crc32 crc32c sha1 sha256
aws s3api put-object --body <file_name> --checksum-algorithm crc32 --bucket <bucket_name> --key <key_name>

# retrieve the checksum
# response fields: ChecksumCRC32 ChecksumCRC32C ChecksumSHA1 ChecksumSHA256
aws s3api head-object --bucket <bucket_name> --key <key_name> --checksum-mode Enabled --query ChecksumCRC32 --output text
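
The algorithm name passed on upload and the response field queried on retrieval are spelled differently, which is easy to get wrong in scripts. A minimal sketch of a lookup follows; `checksum_field` is a hypothetical helper name, not part of aws-cli, and it only covers the four algorithms listed above.

```shell
# Hypothetical helper (not part of aws-cli): map a --checksum-algorithm
# value to the matching head-object response field name.
checksum_field() {
  # Normalize case so that "SHA256" and "sha256" are treated alike.
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    crc32)  echo ChecksumCRC32 ;;
    crc32c) echo ChecksumCRC32C ;;
    sha1)   echo ChecksumSHA1 ;;
    sha256) echo ChecksumSHA256 ;;
    *)      echo "unsupported algorithm: $1" >&2; return 1 ;;
  esac
}

# Example use (needs AWS credentials; bucket and key are placeholders):
# aws s3api head-object --bucket <bucket_name> --key <key_name> \
#   --checksum-mode Enabled --query "$(checksum_field sha256)" --output text
```

Keeping the mapping in one place lets the same algorithm variable drive both the put-object and the head-object call.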

rajivnarayan avatar Jul 26 '22 20:07 rajivnarayan

+1 to support for checksums when syncing.

jonathansampson avatar Nov 04 '22 14:11 jonathansampson

+1 on this feature.

saksham avatar Jan 24 '23 22:01 saksham

+1

genvidkyle avatar Feb 12 '23 23:02 genvidkyle

I've got a client migrating a small but critical dataset to S3, and they have strict requirements for data integrity validation. With checksum support missing from the S3 sync higher-level command, we expect an increased effort to meet the client's requirements. This is a significant gap as far as missing functionality goes.

jbutz avatar Mar 16 '23 13:03 jbutz

+1

ashepherd avatar Jun 09 '23 13:06 ashepherd

+1

MaksymSimchuk-prxt avatar Jun 20 '23 12:06 MaksymSimchuk-prxt

+1

khilnani avatar Jun 28 '23 04:06 khilnani

+1

animeshsg avatar Jun 30 '23 15:06 animeshsg

@rajivnarayan I stumbled upon this. The sha256 checksum value returned by AWS doesn't seem to be right. Additionally, according to the CLI doc (https://docs.aws.amazon.com/cli/latest/reference/s3api/put-object.html), the --checksum-algorithm parameter is only supported when using an SDK. Have you run into issues with the sha256 value being calculated incorrectly?

sarthakjain271095 avatar Jul 03 '23 19:07 sarthakjain271095

Works fine for me with the latest aws cli (2.12.16). Note that the checksum is base64 encoded, as detailed here: https://aws.amazon.com/getting-started/hands-on/amazon-s3-with-additional-checksums/?ref=docs_gateway/amazons3/checking-object-integrity.html

BUCKET=my-test-bucket
KEY=hello_checksum.txt
echo "Hello world!" > hello.txt

# Compute base64 encoded sha256
shasum -a 256 hello.txt|cut -f1 -d\ |xxd -r -p|base64
# C6kE6uh3O3DHUzPbTeLzrEWorU3bobJC8LPPwZk5Hdg=

# compute and save checksum on upload / copy
# algorithms supported: crc32 crc32c sha1 sha256
aws s3api put-object --body hello.txt --checksum-algorithm sha256 --bucket ${BUCKET} --key ${KEY}

# retrieve the checksum
aws s3api head-object --bucket ${BUCKET} --key ${KEY} --checksum-mode Enabled --query ChecksumSHA256 --output text
# C6kE6uh3O3DHUzPbTeLzrEWorU3bobJC8LPPwZk5Hdg=
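
The manual comparison above could be wrapped in a small script. This is only a sketch: `b64_sha256` and `verify_sha256` are hypothetical helper names, and it assumes either `openssl` or `sha256sum` plus `xxd` is installed. It is meant for objects uploaded in a single part, since for multipart uploads S3 may report a composite checksum rather than a digest of the whole object.

```shell
# Hypothetical helpers (not part of aws-cli).

# Print the base64-encoded SHA-256 digest of a local file, the same
# encoding that head-object reports in ChecksumSHA256.
b64_sha256() {
  if command -v openssl >/dev/null 2>&1; then
    openssl dgst -sha256 -binary "$1" | base64
  else
    # Same pipeline as in the comment above: hex digest -> raw bytes -> base64
    sha256sum "$1" | cut -d' ' -f1 | xxd -r -p | base64
  fi
}

# Compare the local digest with a value obtained from S3.
# Returns 0 on match, 1 on mismatch, 2 if the digest could not be computed.
verify_sha256() {
  local file=$1 expected=$2 actual
  actual=$(b64_sha256 "$file") || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK $file"
  else
    echo "MISMATCH $file: local=$actual remote=$expected" >&2
    return 1
  fi
}

# Example use (needs AWS credentials; BUCKET and KEY as defined above):
# verify_sha256 hello.txt "$(aws s3api head-object --bucket "${BUCKET}" \
#   --key "${KEY}" --checksum-mode Enabled --query ChecksumSHA256 --output text)"
```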

rajivnarayan avatar Jul 03 '23 21:07 rajivnarayan

> But it will take more time and discussion to think through the implementation.

@tim-finnigan could you perhaps elaborate on what the key problems are with adding checksum support to the s3 commands? As it is supported by the low-level s3api commands, I'd expect that support in the high-level commands is straightforward. Other libraries such as boto3 support S3-based checksum computation in their high-level API functions (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS).

dpeger avatar Aug 09 '23 13:08 dpeger

I believe that most use cases probably rely on the high-level s3 commands, s3 cp or s3 sync. Can we have more information on what needs to be thought through for the implementation?

Park-minkyu avatar Oct 30 '23 09:10 Park-minkyu

I support adding this to the high-level aws s3 sync command. However, the sync command should be temporarily disabled for as long as this remains unfixed. We have no idea when this "new" feature will arrive (the thread is more than a year old), and the "sync" command misleads users into believing that they have "synchronized" their files when that is not always the case. It may cause financial loss to a company if the "wrong" object is synchronized. I am forced to use a workaround for this aws s3 sync issue to make sure that files with a different md5sum but the same file size get uploaded (at the moment aws s3 sync skips them).
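
One possible shape for such a workaround is sketched below. It is not the commenter's actual script: `needs_upload` is a hypothetical helper that compares size first and then falls back to the base64 SHA-256 checksum. It assumes the object was originally uploaded with `--checksum-algorithm sha256`, and that `openssl` or `sha256sum` plus `xxd` is available locally.

```shell
# Hypothetical helper (not part of aws-cli): return 0 when the local file
# should be uploaded, 1 when size and SHA-256 both match the S3 copy.
needs_upload() {
  local file=$1 remote_size=$2 remote_sha256=$3
  local local_size local_sha256

  # Different size: the content certainly differs.
  local_size=$(wc -c < "$file" | tr -d ' ')
  [ "$local_size" = "$remote_size" ] || return 0

  # No stored checksum to compare against: upload to be safe.
  if [ -z "$remote_sha256" ] || [ "$remote_sha256" = "None" ]; then
    return 0
  fi

  # Same size: only the content digest can tell the files apart.
  if command -v openssl >/dev/null 2>&1; then
    local_sha256=$(openssl dgst -sha256 -binary "$file" | base64)
  else
    local_sha256=$(sha256sum "$file" | cut -d' ' -f1 | xxd -r -p | base64)
  fi
  [ "$local_sha256" != "$remote_sha256" ]
}

# Example use (needs AWS credentials; bucket and key are placeholders).
# remote_size and remote_sha256 would come from something like:
# aws s3api head-object --bucket <bucket_name> --key <key_name> \
#   --checksum-mode Enabled --query '[ContentLength, ChecksumSHA256]' --output text
```

Checking size first avoids hashing files that are already known to differ; the digest is only computed in the same-size case described in the comment above.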

YoongLoong avatar Dec 11 '23 03:12 YoongLoong

May I have an update on the aws s3 sync issue? It is currently causing a lot of inconvenience when syncing files from AWS S3.

YoongLoong avatar Feb 23 '24 03:02 YoongLoong