Checksum `x-amz-checksum-crc32` seems to be added INSIDE my files
Describe the bug
When uploading a local file to my S3 storage, some kind of checksum seems to be added directly inside the contents of my file:
```
326d # added by boto3
# my file contents ...
0 # added by boto3
x-amz-checksum-crc32:6da4RA== # added by boto3
```
Regression Issue
- [x] Select this option if this issue appears to be a regression.
It does not happen with boto3==1.35.41
Expected Behavior
The uploaded file contents should not be modified.
Current Behavior
The uploaded file contents are modified.
Reproduction Steps
Create a dummy file locally:

```shell
echo toto > /tmp/toto.txt
```

Install boto3:

```shell
pip install boto3==1.36.16
```

Init the S3 client and upload the file:

```python
import boto3

s3_session = boto3.session.Session()
client = s3_session.client(
    service_name="s3",
    aws_access_key_id="my-access",
    aws_secret_access_key="my-secret",
    endpoint_url="https://my-url",
    region_name="my-region",
)
client.upload_file("/tmp/toto.txt", "my-bucket", "my-folder/toto.txt")

# Now read the uploaded file contents:
print(client.get_object(Bucket="my-bucket", Key="my-folder/toto.txt")["Body"].read())
# returns: b'5\r\ntoto\n\r\n0\r\nx-amz-checksum-crc32:+H0IvQ==\r\n\r\n'
```
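The returned body is raw `aws-chunked` framing (hex chunk size, CRLF, data, then a zero-size chunk followed by trailer headers) that the service stored verbatim instead of decoding. As an illustration only, a minimal sketch of stripping that framing to recover the payload; `decode_aws_chunked` is a hypothetical helper, not part of boto3:

```python
def decode_aws_chunked(raw: bytes) -> bytes:
    """Strip aws-chunked framing from a stored body.

    Assumes the framing shown in this issue: a hex chunk length,
    CRLF, the chunk data, CRLF, repeated until a zero-length chunk,
    after which trailer headers (the checksum) follow.
    """
    payload = b""
    pos = 0
    while True:
        crlf = raw.index(b"\r\n", pos)
        size = int(raw[pos:crlf], 16)   # chunk length in hex
        if size == 0:                   # zero chunk: trailers follow
            break
        start = crlf + 2
        payload += raw[start:start + size]
        pos = start + size + 2          # skip data and its trailing CRLF
    return payload

print(decode_aws_chunked(b"5\r\ntoto\n\r\n0\r\nx-amz-checksum-crc32:+H0IvQ==\r\n\r\n"))
# → b'toto\n'
```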
Possible Solution
No response
Additional Information/Context
No response
SDK version used
1.36.16
Environment details (OS name and version, etc.)
Linux Ubuntu 22
It seems to be resolved if I add this at the top of my code:

```python
import os

os.environ["AWS_REQUEST_CHECKSUM_CALCULATION"] = "when_required"
os.environ["AWS_RESPONSE_CHECKSUM_VALIDATION"] = "when_required"
```
Hi @jgaucher-cs, thanks for reaching out. This change was recently announced by the Python SDK team; see Announcement: S3 default integrity change:

In AWS SDK for Python v1.36.0, we released changes to the S3 client that adopt new default integrity protections. For more information on the default integrity behavior, please refer to the official [SDK documentation](https://docs.aws.amazon.com/sdkref/latest/guide/feature-dataintegrity.html).
The workaround you suggested has also been mentioned to bypass the default checksum.
Hope that clarifies your questions. Please feel free to reach out if this does not help.
Thanks
Hi @khushail, thank you for your response. The official SDK documentation says:
> Amazon S3 independently calculates a checksum on the server side and validates it against the provided value before durably storing the object and its checksum in the object's metadata.
In my case, the checksum is not stored in the object's metadata; it's stored inside the file contents, making the file corrupt and unreadable (e.g. if it's a Python script, it cannot be run anymore). Why is that?
I also have this problem. Have any ideas?
Hey @jgaucher-cs @ZeniT21,
I was not able to reproduce this issue when making requests to Amazon S3. When making the request to S3 using the example you provided, I receive the following body: `b'toto\n'`. Are you using a third-party S3-compatible service?
As mentioned above, you can prevent the default checksum calculation behavior using the when_required value for the AWS_REQUEST_CHECKSUM_CALCULATION environment variable or request_checksum_calculation config option as mentioned in the boto3 configuration guide.
Hey @jonathan343
> Are you using a third-party S3 compatible service?
Yes indeed, we are using an S3-compatible service from a Cloud provider other than AWS (Orange Flexible Engine and/or OVH). The issue might be that they are not fully compliant.
I used an older version and it worked fine.
Hi @jgaucher-cs, since you are using third-party services which might not be fully compatible and may not support aws-chunked requests, there is a workaround mentioned in the Announcement shared earlier, which is what you are already using here. So this should work as suggested.
Note that this is also an issue when using moto for testing (and, based on comments elsewhere, LocalStack as well): the checksums are added to the file contents. And while setting those environment variables to "when_required" 'fixes' the issue, it also prevents boto3 from loading the checksum in the response at all.
Hi there 👋 I've been trying to identify the problem for a few days and this issue just solved it:

- in my case, I've been trying to use `upload_fileobj`, `upload_file`, and `put_object` for images into storage
- I've had several problems recently, while I had none in projects from a few years back
- I started a fresh project recently where the boto3 version wasn't pinned
- using `>=1.36` gives me either octet-stream or corrupted data, and as mentioned above it adds a checksum in the file content. I inspected the bytes and found the checksum, which led me to this issue. I've done my best but I cannot "recover" the file from that (even adding an elaborate content type during uploading and exploring all the possible config options)
- I just added `<1.36` as a version constraint, and all my problems went away 😅

So this is definitely a problem starting from version 1.36, and I also had it in 1.37+. For now, I'll just downgrade, but if there is a long-term solution to prevent this issue, I'm all ears! In my opinion, the default shouldn't yield this behavior in 1.36 and higher. Let me know if I can help.
Hi! I agree, same as @frgfm. Code which had been working for years stopped working with the latest boto3 versions. Pinning to 1.35 solved the issue. Is anyone working on this? For such an important package, I would have expected a solution to be found sooner :'(.