Checksum `x-amz-checksum-crc32` seems to be added INSIDE my files

Open jgaucher-cs opened this issue 10 months ago • 11 comments

Describe the bug

When uploading a local file to my S3 storage, some kind of checksum seems to be added directly inside the contents of my file:

326d # added by boto3

# my file contents ...

0 # added by boto3
x-amz-checksum-crc32:6da4RA== # added by boto3

Regression Issue

  • [x] Select this option if this issue appears to be a regression.

It does not happen with boto3==1.35.41

Expected Behavior

The uploaded file contents should not be modified.

Current Behavior

The uploaded file contents are modified.

Reproduction Steps

Create a dummy file locally:

echo toto > /tmp/toto.txt

Install boto3:

pip install boto3==1.36.16

Init s3 client and upload file:

import boto3
s3_session = boto3.session.Session()
client = s3_session.client(
    service_name="s3",
    aws_access_key_id="my-access",
    aws_secret_access_key="my-secret",
    endpoint_url="https://my-url",
    region_name="my-region",
)
client.upload_file("/tmp/toto.txt", "my-bucket", "my-folder/toto.txt")

# Now read uploaded file contents:
print(client.get_object(Bucket="my-bucket", Key="my-folder/toto.txt")["Body"].read())
# returns: b'5\r\ntoto\n\r\n0\r\nx-amz-checksum-crc32:+H0IvQ==\r\n\r\n'

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.36.16

Environment details (OS name and version, etc.)

Linux Ubuntu 22

jgaucher-cs • Feb 10 '25 17:02

It seems to be resolved if I add this at the top of my code:

import os
# Opt out of the default checksums; set these before the boto3 client is created
os.environ["AWS_REQUEST_CHECKSUM_CALCULATION"] = "when_required"
os.environ["AWS_RESPONSE_CHECKSUM_VALIDATION"] = "when_required"

jgaucher-cs • Feb 10 '25 17:02

Hi @jgaucher-cs, thanks for reaching out. This change was recently announced by the Python SDK team in Announcement: S3 default integrity change:

In AWS SDK for Python v1.36.0, we released changes to the S3 client that adopts new default integrity protections. For more information on default integrity behavior, please refer to the official [SDK documentation](https://docs.aws.amazon.com/sdkref/latest/guide/feature-dataintegrity.html)

The workaround you found is also mentioned there as a way to bypass the default checksum behavior.

Hope that clarifies your questions. Please feel free to reach out if this does not help.

Thanks

khushail • Feb 10 '25 19:02

Hi @khushail, thank you for your response. The official SDK documentation says:

Amazon S3 independently calculates a checksum on the server side and validates it against the provided value before durably storing the object and its checksum in the object's metadata.

In my case, the checksum is not stored in the object's metadata; it's stored inside the file contents, making the file corrupt and unreadable (e.g. if it's a Python script, it can no longer be run). Why is that?

jgaucher-cs • Feb 11 '25 07:02

I also have this problem. Any ideas?

ZeniT21 • Feb 11 '25 14:02

Hey @jgaucher-cs @ZeniT21,

I was not able to reproduce this issue when making requests to Amazon S3. When making the request to S3 using the example you provided, I receive the following body: b'toto\n'. Are you using a third-party S3 compatible service?

As mentioned above, you can prevent the default checksum calculation behavior using the when_required value for the AWS_REQUEST_CHECKSUM_CALCULATION environment variable or request_checksum_calculation config option as mentioned in the boto3 configuration guide.

jonathan343 • Feb 11 '25 15:02

Hey @jonathan343

Are you using a third-party S3 compatible service?

Yes, indeed, we are using an S3-compatible service from a cloud provider other than AWS (Orange Flexible Engine and/or OVH). The issue might be that they are not fully compliant.

nleconte-csgroup • Feb 11 '25 15:02

I used an older version and it worked fine.

ZeniT21 • Feb 11 '25 18:02

Hi @jgaucher-cs, since you are using third-party services that might not be fully compatible and may not support aws-chunked requests, the workaround mentioned in the announcement shared earlier (which is what you are already using) should work as suggested.
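For objects that were already stored with the extra bytes, the corruption in the report matches the aws-chunked framing: a hex chunk length, CRLF, the data, CRLF, a terminating `0` chunk, then trailer headers such as `x-amz-checksum-crc32` and a final blank line. The sketch below strips that framing using only the standard library. It assumes the stored body is exactly one such stream with unsigned chunks (any `;chunk-signature=...` extension is dropped, not verified) and that the CRC32 trailer is the base64 of the big-endian 4-byte checksum; `decode_aws_chunked` and `crc32_matches` are hypothetical names, not boto3 APIs.

```python
import base64
import zlib


def decode_aws_chunked(body: bytes) -> tuple[bytes, dict[str, str]]:
    """Strip aws-chunked framing; return (payload, trailers). Raises ValueError if malformed."""
    payload = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)  # end of the "<hex-size>[;ext]" line
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:  # terminating chunk; trailers follow
            break
        payload += body[pos:pos + size]
        pos += size
        if body[pos:pos + 2] != b"\r\n":
            raise ValueError("malformed chunk: missing CRLF after data")
        pos += 2
    trailers = {}
    while True:
        line_end = body.index(b"\r\n", pos)
        line = body[pos:line_end]
        pos = line_end + 2
        if not line:  # blank line ends the trailer section
            break
        name, _, value = line.partition(b":")
        trailers[name.decode("ascii").strip()] = value.decode("ascii").strip()
    return bytes(payload), trailers


def crc32_matches(payload: bytes, trailers: dict[str, str]) -> bool:
    """Compare the payload with an x-amz-checksum-crc32 trailer, if one is present."""
    expected = trailers.get("x-amz-checksum-crc32")
    if expected is None:
        return False
    actual = base64.b64encode(zlib.crc32(payload).to_bytes(4, "big")).decode("ascii")
    return actual == expected
```

For example, feeding it the body from the original report, `b'5\r\ntoto\n\r\n0\r\nx-amz-checksum-crc32:+H0IvQ==\r\n\r\n'`, returns `b'toto\n'` and `{'x-amz-checksum-crc32': '+H0IvQ=='}`. Re-uploading the recovered bytes with the `when_required` setting avoids storing the framing again.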

khushail • Feb 11 '25 21:02

Note that this is also an issue when using moto for testing (and, based on comments elsewhere, I think LocalStack as well): the checksums are added to the file contents. And while setting those environment variables to "when_required" 'fixes' the issue, it also prevents boto from loading the checksum in the response at all.

KLarrabee-Arcadia • Apr 03 '25 14:04

Hi there 👋 I've been trying to identify the problem for a few days and this issue just solved it:

  • in my case, I've been trying to use upload_fileobj, upload_files, and put_object for images into storage
  • I've had several problems recently, while I had none in projects from a few years back
  • I started a fresh one recently where the boto3 version wasn't pinned
  • Using >=1.36 gives me either octet_stream or corrupted data, and, as mentioned above, it adds a checksum to the file content. I inspected the bytes and found the checksum, which led me here. I've done my best, but I cannot "recover" the file from that (even when adding an elaborate content type during upload and exploring all the possible config options)
  • I just added <1.36 as a version constraint, all my problems just went away 😅

So this is definitely a problem starting from version 1.36, and I also had it in 1.37+. For now, I'll just downgrade, but if there is a long-term solution to prevent this issue, I'm all ears! In my opinion, the default in 1.36 and higher shouldn't yield this behavior. Let me know if I can help.

frgfm • Apr 14 '25 10:04

Hi! I agree with @frgfm. Code that had been working for years stopped working with the latest boto3 versions. Pinning to 1.35 solved the issue. Is anyone working on it? For such an important package, I would have expected a solution sooner :'(.

tepelbaum • May 22 '25 12:05