aws-cli
Seems cli does not verify the checksum when downloading an object
Confirm by changing [ ] to [x] below:
- [x] I've gone through the User Guide and the API reference
- [x] I've searched for previous similar issues and didn't find any solution
Issue is about usage on:
- [ ] Service API : I want to do X using Y service, what should I do?
- [ ] CLI : passing arguments or cli configurations.
- [x] Other/Not sure.
Platform/OS/Hardware/Device What are you running the cli on?
Describe the question
According to the "AWS S3 CLI FAQ" (https://docs.aws.amazon.com/cli/latest/topic/s3-faq.html#cli-aws-help-s3-faq), the AWS CLI tries to verify the checksum of downloads when possible. However, when I corrupted an object and then downloaded it with the AWS CLI, the download succeeded without the mismatch being detected.
Steps
- Prepare a local 4 MB file filled with zeros.
- Upload the file to S3 using the AWS CLI:

      aws s3api put-object --bucket xd-bk-1 --key test --body 4M_zero
      root@vm102 ~/x/test# aws s3api head-object --bucket xd-bk-1 --key test
      {
          "AcceptRanges": "bytes",
          "LastModified": "2022-02-11T07:14:36+00:00",
          "ContentLength": 4194304,
          "ETag": "\"b5cfa9d6c8febd618f91ac2843d50a1c\"",
          "ContentType": "binary/octet-stream",
          "Metadata": {},
          "StorageClass": "STANDARD"
      }

- Corrupt the object.
- Download the object:

      root@vm102 ~/x/test# aws s3api get-object --bucket xd-bk-1 --key test d_test
      {
          "AcceptRanges": "bytes",
          "LastModified": "2022-02-11T07:14:36+00:00",
          "ContentLength": 4194304,
          "ETag": "\"b5cfa9d6c8febd618f91ac2843d50a1c\"",
          "ContentType": "binary/octet-stream",
          "Metadata": {},
          "StorageClass": "STANDARD"
      }
      root@vm102 ~/x/test# md5sum d_test
      6c8b11cda139dbb04a83190975220d98  d_test
As a comparison, the s3cmd tool detected the mismatch, as shown here:

    root@vm102 ~/x/test [64]# s3cmd get s3://xd-bk-1/test dtest
    download: 's3://xd-bk-1/test' -> 'dtest' [1 of 1]
    4194304 of 4194304 100% in 0s 168.38 MB/s done
    WARNING: MD5 signatures do not match: computed=6c8b11cda139dbb04a83190975220d98, received=b5cfa9d6c8febd618f91ac2843d50a1c
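The comparison that s3cmd reports can also be done by hand after a download. The following is a minimal sketch, not something the AWS CLI does for you: it assumes the ETag is the plain MD5 of the object, as in the head-object output in the steps, and the helper names `file_md5` and `etag_matches` are made up for illustration.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def etag_matches(path, etag):
    """Compare a downloaded file against an ETag string.

    Returns True or False when the ETag looks like a plain MD5, and
    None when it does not (for example an ETag with a "-<parts>"
    suffix), because then a whole-file MD5 comparison is not meaningful.
    """
    # The ETag value is wrapped in double quotes; remove them first.
    cleaned = etag.strip().strip('"').lower()
    is_plain_md5 = len(cleaned) == 32 and all(
        c in "0123456789abcdef" for c in cleaned
    )
    if not is_plain_md5:
        return None
    return file_md5(path) == cleaned
```

Called as `etag_matches("d_test", etag)` with the ETag string taken from the get-object output, this would return False for the corrupted download described in the steps, since the md5sum of d_test differs from the ETag.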
Logs/output
Get full traceback and error logs by adding --debug to the command.
Hi @buptwxd2,
Thanks for your post. Can you provide more details as to how you corrupted the object in step 3?
Hi @kdaily, I am using the open-source Ceph project for testing, which is compatible with AWS S3. That lets me overwrite the backend data directly and so corrupt the object.
Here I want to double-check whether the AWS CLI checks data integrity as claimed in the FAQ. As a comparison, the s3cmd tool detects the ETag mismatch.
It would be great if aws cli could support this behavior.
Hi @kdaily, any update on this thread?
Thanks
@buptwxd2,
The documentation you referred to is for the high level aws s3 commands, not for the low level aws s3api commands. The aws s3api commands you are using are directly from the AWS S3 API, and no check of content based on MD5 is computed for a download. For uploads using PutObject via aws s3api put-object, an MD5 check is performed, as noted in the documentation.
If you are using aws s3 cp or aws s3 sync to transfer from S3 to local file storage, an MD5 check is performed except in the cases outlined in the documentation. These operations are the closest comparison to s3cmd. As an example of such a case, if the object was uploaded via multipart upload, there is no MD5 for the entire object; MD5s are only checked on each part. If you want to be able to check the MD5 of the entire object, you would need to set it in the object metadata yourself.
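One way to read the suggestion about setting the MD5 in the object metadata is sketched below. This is only an illustration under stated assumptions: the metadata key name `md5` is a hypothetical choice, the function names are made up, and `head_response` is assumed to be the parsed JSON of a head-object call, which carries a "Metadata" map as in the output shown earlier in this thread.

```python
import hashlib

# Hypothetical metadata key; any user-defined key would work the same way.
MD5_METADATA_KEY = "md5"


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def metadata_for_upload(path):
    """Build the user metadata map to attach when uploading the file."""
    return {MD5_METADATA_KEY: md5_hex(path)}


def verify_download(path, head_response):
    """Check a downloaded file against the MD5 stored in object metadata.

    Returns True or False when the metadata key is present, and None
    when the object carries no stored MD5 to compare against.
    """
    expected = head_response.get("Metadata", {}).get(MD5_METADATA_KEY)
    if expected is None:
        return None
    return md5_hex(path) == expected.lower()
```

The map returned by `metadata_for_upload` could be supplied at upload time through the `--metadata` option of `aws s3 cp` or `aws s3api put-object`. Because the value is computed over the whole file, it stays usable even when the upload is split into parts.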
I hope this answers your questions!
Thanks @kdaily. So, based on your response, the high-level "aws s3 cp" command should check the MD5? I tried "aws s3 cp", but the mismatch still went undetected.
Hi @buptwxd2,
Can you please provide debug logs (add --debug to your command) showing what happens in this case? Please redact any sensitive information. Thanks!
Edit: I'm also reviewing the documentation to confirm it's valid.
Hi @kdaily
Please see the attached file for the detailed logs. I corrupted the object 4M and used "aws s3 cp" to download the object.
    root@sds5 ~/x/test# aws s3api head-object --bucket xd-bk-2 --key 4M
    {
        "AcceptRanges": "bytes",
        "LastModified": "2022-02-24T10:02:43+00:00",
        "ContentLength": 4194304,
        "ETag": "\"b5cfa9d6c8febd618f91ac2843d50a1c\"",
        "ContentType": "binary/octet-stream",
        "Metadata": {},
        "StorageClass": "STANDARD"
    }
    root@sds5 ~/x/test# aws s3 cp s3://xd-bk-2/4M d_4M
    download: s3://xd-bk-2/4M to ./d_4M
    root@sds5 ~/x/test# md5sum d_4M
    4ad9688f6ce9fe176dcfecf94f96e635  d_4M

log.txt
Hi @kdaily , any update on this thread?
Thanks.
Hi @buptwxd2,
Still looking into this. Thanks for your patience.
Hi @buptwxd2,
Thanks for your patience. It seems that this functionality was not migrated when the AWS CLI started using the s3transfer implementation. I'm going to update the docs so that they are current, and we will explore what next steps there would be.
Hi @kdaily,
Do we have a conclusion on this issue? Will verifying the checksum when downloading an object be supported?
Thanks a lot.
Yes, the AWS CLI doesn't do checksum validation on download.
I verified this with both the high-level and the low-level API of the AWS CLI with debug mode enabled, and I could not see checksum validation happening anywhere.
High-level API: aws s3 cp /path/to/file s3://bucket/object-key (note: s3 is the high-level API of aws-cli)
Low-level API: aws s3api get-object --bucket bucket-name --key object-key /path/to/file (note: s3api is the low-level API of aws-cli)