
Seems cli does not verify the checksum when downloading an object

Open buptwxd2 opened this issue 3 years ago • 13 comments

Confirm by changing [ ] to [x] below:

Issue is about usage on:

  • [ ] Service API : I want to do X using Y service, what should I do?
  • [ ] CLI : passing arguments or cli configurations.
  • [x] Other/Not sure.

Platform/OS/Hardware/Device What are you running the cli on?

Describe the question According to the "AWS S3 CLI FAQ" (https://docs.aws.amazon.com/cli/latest/topic/s3-faq.html#cli-aws-help-s3-faq), the AWS CLI will try to verify the checksum of downloads when possible. However, when I corrupted an object and then downloaded it with the AWS CLI, the download succeeded without detecting the mismatch.

Steps

  1. Prepare a local 4 MB file filled with zeros.

  2. Upload the file to S3 using the AWS CLI:

     aws s3api put-object --bucket xd-bk-1 --key test --body 4M_zero

     root@vm102 ~/x/test# aws s3api head-object --bucket xd-bk-1 --key test
     {
         "AcceptRanges": "bytes",
         "LastModified": "2022-02-11T07:14:36+00:00",
         "ContentLength": 4194304,
         "ETag": "\"b5cfa9d6c8febd618f91ac2843d50a1c\"",
         "ContentType": "binary/octet-stream",
         "Metadata": {},
         "StorageClass": "STANDARD"
     }

  3. Corrupt the object.

  4. Download the object:

     root@vm102 ~/x/test# aws s3api get-object --bucket xd-bk-1 --key test d_test
     {
         "AcceptRanges": "bytes",
         "LastModified": "2022-02-11T07:14:36+00:00",
         "ContentLength": 4194304,
         "ETag": "\"b5cfa9d6c8febd618f91ac2843d50a1c\"",
         "ContentType": "binary/octet-stream",
         "Metadata": {},
         "StorageClass": "STANDARD"
     }
     root@vm102 ~/x/test# md5sum d_test
     6c8b11cda139dbb04a83190975220d98  d_test

As a comparison, the s3cmd tool detected the mismatch, as shown below:

     root@vm102 ~/x/test [64]# s3cmd get s3://xd-bk-1/test dtest
     download: 's3://xd-bk-1/test' -> 'dtest' [1 of 1]
     4194304 of 4194304 100% in 0s 168.38 MB/s done
     WARNING: MD5 signatures do not match: computed=6c8b11cda139dbb04a83190975220d98, received=b5cfa9d6c8febd618f91ac2843d50a1c
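For reference, the comparison that s3cmd reports can be reproduced by hand. Below is a minimal Python sketch, assuming the object was uploaded in a single part so that its ETag is the hex MD5 of the content (this does not hold for multipart uploads or some server-side encryption modes). The function names are illustrative and not part of any AWS tool.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def etag_matches(path, etag):
    """Compare a downloaded file against an ETag taken from head-object.

    Assumption: a single-part upload, where the ETag is the content MD5.
    ETags containing "-" come from multipart uploads and are not a plain
    MD5 of the object, so this sketch refuses to judge them.
    """
    expected = etag.strip().strip('"').lower()
    if "-" in expected:
        raise ValueError("multipart ETag; not a plain MD5 of the object")
    return file_md5(path) == expected
```

With the values above, `etag_matches("d_test", '"b5cfa9d6c8febd618f91ac2843d50a1c"')` would return `False`, because md5sum reported 6c8b11cda139dbb04a83190975220d98 for the downloaded file.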

Logs/output Get full traceback and error logs by adding --debug to the command.

buptwxd2 avatar Feb 11 '22 08:02 buptwxd2

Hi @buptwxd2,

Thanks for your post. Can you provide more details as to how you corrupted the object in step 3?

kdaily avatar Feb 14 '22 18:02 kdaily

Hi @kdaily, I am using the open-source Ceph project for testing, which is compatible with AWS S3. That lets me overwrite the backend data internally and thereby corrupt the object.

Here I want to double-check whether the AWS CLI checks data integrity as claimed in the FAQ. As a comparison, the s3cmd tool does detect the ETag mismatch.

It would be great if aws cli could support this behavior.

buptwxd2 avatar Feb 15 '22 04:02 buptwxd2

Hi @kdaily, any update on this thread?

Thanks

buptwxd2 avatar Feb 18 '22 10:02 buptwxd2

@buptwxd2,

The documentation you referred to is for the high level aws s3 commands, not for the low level aws s3api commands. The aws s3api commands you are using are directly from the AWS S3 API, and no check of content based on MD5 is computed for a download. For uploads using PutObject via aws s3api put-object, an MD5 check is performed, as noted in the documentation.

If you are using aws s3 cp or aws s3 sync to transfer from S3 to local file storage, then an MD5 check is performed, except in the cases outlined in the documentation. These operations are the closest comparison to s3cmd. As an example of such an exception, if the object was uploaded via multipart upload, there is no MD5 for the entire object; MD5s are only checked on each part. If you want to be able to check the MD5 of the entire object, you would need to set it in the object metadata yourself.
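To make the multipart case concrete: a multipart ETag carries a "-<part count>" suffix, and its value is commonly observed to be the MD5 of the concatenated binary MD5 digests of the parts. That layout is not confirmed anywhere in this thread, and recomputing it only works if the part size matches the one used at upload time. The 8 MiB default below is an assumption based on the CLI's default multipart_chunksize, and the function names are hypothetical. A hedged Python sketch:

```python
import hashlib


def is_multipart_etag(etag):
    """True if the ETag carries a "-<parts>" suffix."""
    return "-" in etag.strip().strip('"')


def multipart_style_etag(path, part_size=8 * 1024 * 1024):
    """Recompute an ETag in the commonly observed multipart layout.

    Assumptions: every part except the last was exactly part_size bytes,
    and the ETag is md5(md5(part1) + md5(part2) + ...) followed by
    "-<number of parts>". Neither is guaranteed by this thread.
    """
    part_digests = []
    with open(path, "rb") as fh:
        for part in iter(lambda: fh.read(part_size), b""):
            part_digests.append(hashlib.md5(part).digest())
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

If the recomputed value differs from the ETag returned by head-object, either the data changed or the assumed part size is wrong, so a mismatch alone does not prove corruption.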

I hope this answers your questions!

kdaily avatar Feb 18 '22 20:02 kdaily

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

github-actions[bot] avatar Feb 18 '22 20:02 github-actions[bot]

Thanks @kdaily. So, based on your response, the high-level "aws s3 cp" command should check the MD5? I tried "aws s3 cp", but still no mismatch was detected.

buptwxd2 avatar Feb 19 '22 02:02 buptwxd2

Hi @buptwxd2,

Can you please provide debug logs (add --debug to your command) showing what happens in this case? Please redact any sensitive information. Thanks!

Edit: I'm also reviewing the documentation to confirm it's valid.

kdaily avatar Feb 23 '22 18:02 kdaily

Hi @kdaily

Please see the attached file for the detailed logs. I corrupted the object 4M and used "aws s3 cp" to download the object.

    root@sds5 ~/x/test# aws s3api head-object --bucket xd-bk-2 --key 4M
    {
        "AcceptRanges": "bytes",
        "LastModified": "2022-02-24T10:02:43+00:00",
        "ContentLength": 4194304,
        "ETag": "\"b5cfa9d6c8febd618f91ac2843d50a1c\"",
        "ContentType": "binary/octet-stream",
        "Metadata": {},
        "StorageClass": "STANDARD"
    }
    root@sds5 ~/x/test# aws s3 cp s3://xd-bk-2/4M d_4M
    download: s3://xd-bk-2/4M to ./d_4M
    root@sds5 ~/x/test# md5sum d_4M
    4ad9688f6ce9fe176dcfecf94f96e635  d_4M

log.txt

buptwxd2 avatar Feb 24 '22 13:02 buptwxd2

Hi @kdaily, any update on this thread?

Thanks.

buptwxd2 avatar Mar 01 '22 06:03 buptwxd2

Hi @buptwxd2,

Still looking into this. Thanks for your patience.

kdaily avatar Mar 05 '22 00:03 kdaily

Hi @buptwxd2,

Thanks for your patience. It seems that this functionality was not migrated when the AWS CLI started using the s3transfer implementation. I'm going to update the docs so that they are current, and we will explore what next steps there would be.

kdaily avatar Mar 08 '22 19:03 kdaily

Hi @kdaily,

Do we have a conclusion on this issue? Will verifying the checksum when downloading an object be supported?

Thanks a lot.

buptwxd2 avatar May 28 '22 07:05 buptwxd2

Yes, the AWS CLI doesn't do checksum validation for downloads.

I verified this with both the high-level and the low-level API of the AWS CLI with debug mode enabled, and I could not see checksum validation happening anywhere.

High-level API: aws s3 cp s3://bucket/object-key /path/to/file
Note: s3 is the high-level API of aws-cli.

Low-level API: aws s3api get-object --bucket bucket-name --key object-key /path/to/file
Note: s3api is the low-level API of aws-cli.

shaiksuhel1999 avatar Dec 13 '23 04:12 shaiksuhel1999