aws-sdk-cpp
The SDK doesn't use gzip compression
Describe the bug
Despite the claims in the documentation, the AWS SDK doesn't perform gzip compression under any circumstances.
Expected Behavior
So, it is expected that the SDK will compress request data according to the client configuration. However, I also want to point out that this configuration lacks a list of allowed content types for compression, since we don't want to compress files like jpg or mp4.
Current Behavior
Currently, if you try to upload a file with TransferManager or the S3 client, no compression happens. You can also try to compress the file manually and then upload it with TransferManager or the S3 client, but that will not work properly either. First of all, TransferManager has no option for providing Content-Encoding, so it obviously will not work. Second, the S3 client will not work properly either, because the signer rewrites Content-Encoding unless you change the checksum algorithm to NOT_SET and use MD5 instead.
Reproduction Steps
Just use the default config for TransferManager and try to upload a big JSON file (bigger than 10 KB!), then check GetBytesTransferred() for that operation: it will be equal to (or a bit bigger than) the size of the file, so no compression is applied.
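A minimal reproduction sketch along those lines (bucket and file names are placeholders, credentials are assumed to come from the default chain, and error handling is omitted):

```cpp
#include <aws/core/Aws.h>
#include <aws/core/utils/threading/PooledThreadExecutor.h>
#include <aws/s3/S3Client.h>
#include <aws/transfer/TransferManager.h>
#include <iostream>

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::Client::ClientConfiguration config; // default config; compression is nominally "on"
        auto s3Client = Aws::MakeShared<Aws::S3::S3Client>("repro", config);

        auto executor = Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>("repro", 4);
        Aws::Transfer::TransferManagerConfiguration tmConfig(executor.get());
        tmConfig.s3Client = s3Client;
        auto transferManager = Aws::Transfer::TransferManager::Create(tmConfig);

        // "big.json" / "my-bucket" are placeholders for a >10 KB JSON file and your bucket.
        auto handle = transferManager->UploadFile(
            "big.json", "my-bucket", "big.json", "application/json",
            Aws::Map<Aws::String, Aws::String>());
        handle->WaitUntilFinished();

        // Equal to (or slightly above) the file size on disk => no compression happened.
        std::cout << "bytes transferred: " << handle->GetBytesTransferred() << "\n";
    }
    Aws::ShutdownAPI(options);
    return 0;
}
```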
Possible Solution
So, first of all, we should enable compression: the referenced line always returns false.
But even if we change it explicitly to GZIP, it still will not work, because after compression the request keeps the previous Content-Length and doesn't set the decoded content length that the backend service requires for decompression. So let's add it (and remove Content-Length; the proper value will be set later by AddContentBodyToRequest).
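A sketch of the header fix described above. The header name follows the aws-chunked convention, and `originalSize` is a stand-in for the pre-compression body size:

```cpp
// After gzipping the body: advertise the uncompressed size so the service
// can decode, and drop the stale Content-Length; the correct compressed
// length will be set later by AddContentBodyToRequest.
httpRequest->SetHeaderValue("x-amz-decoded-content-length",
                            Aws::String(std::to_string(originalSize).c_str()));
httpRequest->DeleteHeader("content-length");
```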
But even after that it will not work, because the checksum calculation still uses the old body, not the compressed one. So let's change that too and use the actual body (from httpRequest).
But it still will not work, because when we calculate the hash we expect the stream to be in a good state, which may not be the case, especially after some reading and moving the cursor back and forth. So we should explicitly clear the stream before starting to read.
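Roughly, the fix amounts to resetting the stream's state flags and position before hashing. A small sketch (`HashBody` is a hypothetical helper, not an SDK function):

```cpp
#include <aws/core/http/HttpRequest.h>
#include <aws/core/utils/HashingUtils.h>

// Hash the request body defensively: a previous full read leaves
// eofbit/failbit set, and seeking a failed stream is a no-op, so the
// state flags must be cleared before rewinding and reading.
Aws::Utils::ByteBuffer HashBody(Aws::Http::HttpRequest& httpRequest)
{
    Aws::IOStream& body = *httpRequest.GetContentBody();
    body.clear();               // reset eof/fail flags from earlier reads
    body.seekg(0, body.beg);    // rewind to the beginning
    auto digest = Aws::Utils::HashingUtils::CalculateSHA256(body);
    body.clear();               // leave the stream readable for the send
    body.seekg(0, body.beg);
    return digest;
}
```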
And only now, if you explicitly set the checksum algorithm to NOT_SET and enable MD5 computation, will compression finally work.
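Putting the workaround together: pre-compress the body yourself, force the checksum algorithm to NOT_SET so the signer doesn't rewrite Content-Encoding, and supply a Content-MD5 instead. A sketch with a hypothetical helper (`UploadGzipped`); `gzippedStream` is assumed to already hold the gzipped payload:

```cpp
#include <aws/core/utils/HashingUtils.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/PutObjectRequest.h>

Aws::S3::Model::PutObjectOutcome UploadGzipped(
    const Aws::S3::S3Client& s3,
    const std::shared_ptr<Aws::IOStream>& gzippedStream)
{
    Aws::S3::Model::PutObjectRequest req;
    req.SetBucket("my-bucket");                 // placeholder
    req.SetKey("big.json");                     // placeholder
    req.SetContentType("application/json");
    req.SetContentEncoding("gzip");             // we compressed the body ourselves
    req.SetChecksumAlgorithm(Aws::S3::Model::ChecksumAlgorithm::NOT_SET);
    req.SetContentMD5(Aws::Utils::HashingUtils::Base64Encode(
        Aws::Utils::HashingUtils::CalculateMD5(*gzippedStream)));
    gzippedStream->clear();
    gzippedStream->seekg(0, gzippedStream->beg); // rewind after hashing
    req.SetBody(gzippedStream);
    return s3.PutObject(req);
}
```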
Additional Information/Context
No response
AWS CPP SDK version used
1.11.398
Compiler and Version used
gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
Operating System and version
Ubuntu 18.04.6 LTS
S3 does not support payload compression on the service side, so it would not be possible to do it through the SDK; i.e., if you upload as GZIP, S3 will keep it as GZIP in storage. AFAIK, CloudWatch is the only service the SDK currently supports compression for.
With that said, overriding the Content-Encoding when aws-chunked is used does seem like a bug; it should append instead.
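One plausible shape of the append behavior (header names as string literals for clarity; per the SigV4 streaming docs, aws-chunked goes first when multiple encodings are present):

```cpp
// When adding aws-chunked framing, keep any Content-Encoding the caller
// already set (e.g. "gzip") and append to it instead of replacing it.
Aws::String encoding = httpRequest->HasHeader("content-encoding")
    ? "aws-chunked," + httpRequest->GetHeaderValue("content-encoding")
    : Aws::String("aws-chunked");
httpRequest->SetHeaderValue("content-encoding", encoding);
```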
> S3 does not support payload compression on the service side, so it would not be possible to do it through the SDK; i.e., if you upload as GZIP, S3 will keep it as GZIP in storage.
Hm, are you sure? We actually upload gzip files that way, and we get unpacked files from S3.
Sorry, yeah, you are correct: S3 doesn't decompress the data. It just handles it in a nice way, so from the user's perspective the difference is not visible, but yes, the data is stored compressed.
@andrejlevkovitch did you have any other questions about compression and this SDK?
> did you have any other questions about compression and this SDK?
Nope, except for the problems related to the SDK that I already described:
- rewriting of Content-Encoding
- gzip compression doesn't work correctly when requested (Content-Length is not updated after compression, and the checksum is calculated incorrectly after compression)
- a small bug in hash calculation (the stream needs to be cleared before it is used)
- the compression params are never used, despite being present in the API. Those params are also not flexible: they can only be set for all file types at once. IMHO, it would be better to set them per Content-Type, so you don't compress video but do compress JSON (see the sketch after this list)
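A hypothetical illustration of that last point. Nothing like this exists in the SDK today; it is just the shape a per-Content-Type gate could take, whether applied by the caller or by a future config option:

```cpp
#include <aws/core/utils/memory/stl/AWSSet.h>
#include <aws/core/utils/memory/stl/AWSString.h>

// Hypothetical: decide compressibility from the Content-Type instead of
// one global setting. Already-compressed media (jpg, mp4) gains nothing
// from gzip, while text formats compress well.
static bool ShouldCompress(const Aws::String& contentType)
{
    static const Aws::Set<Aws::String> kCompressible = {
        "application/json", "application/xml", "text/plain", "text/csv"
    };
    return kCompressible.find(contentType) != kCompressible.end();
}
```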
Are you still expecting the SDK/S3 to compress your data? Your most recent questions seem to indicate that you think the SDK should be doing some type of compression, but it is not. None of the compression params will be used, as @DmitriyMusatkin mentioned.
@jmklix
- there is still the problem of Content-Encoding being rewritten, which seems like a bug (@DmitriyMusatkin also mentioned that)
- there is a RequestCompressionConfig in ClientConfiguration that is enabled by default and never used, which is confusing to say the least (see the snippet below)
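For reference, the knobs in question, as I read them in the 1.11.x headers (field names may differ slightly; they are only honored by services that model request compression, such as CloudWatch, and have no effect for S3):

```cpp
#include <aws/core/client/ClientConfiguration.h>

// Present and enabled by default on every client configuration, but only
// honored for services that model request compression, not for S3.
Aws::Client::ClientConfiguration config;
config.requestCompressionConfig.useRequestCompression =
    Aws::Client::UseRequestCompression::ENABLE;
config.requestCompressionConfig.requestMinCompressionSizeBytes = 10240; // 10 KB floor
```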
> there is still the problem of Content-Encoding being rewritten, which seems like a bug (@DmitriyMusatkin also mentioned that)
You are absolutely right; I put out a PR for the fix.
The fix should be merged now. Let me know if that fixes it for you; it should.
As for the second part:
> For services that support it, request content will be compressed. On by default if dependency available
is the description of the compression CMake arg. What can be updated to help clear up potential confusion? We call out "For services that support it", which is more or less what is called out in our public docs. What specifically in that documentation could be updated to avoid future confusion?
Well, it is mostly about the usage of the configuration: the config for the S3 client contains a field that is a priori not supported, and that alone is confusing, because I would expect that if a field is present, then it is possible to use it. Anyway, I'm not sure what to suggest here.
This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.
> because I would expect that if a field is present, then it is possible to use it.
Very valid, and something that we could likely introduce in a future version. In the initial implementation it likely should have been "if the service does not accept it, don't code-generate it". In its current state we can't remove it, because we guarantee (or at least try our hardest) not to break compilation for anyone by removing things.
In a future breaking change, though, making its presence depend on service support is a good idea.