aws-sdk-cpp
The SDK doesn't use gzip compression
Describe the bug
Despite the claims in the documentation, the AWS SDK doesn't perform gzip compression under any circumstances.
Expected Behavior
So, it is expected that the SDK will compress request data according to the client configuration. However, I also want to point out that this configuration lacks a list of allowed content types for compression, since we don't want to compress files like jpg or mp4.
Current Behavior
Currently, if you try to upload a file with TransferManager or the S3 client, no compression happens. You can also try to compress the file manually and then upload it with TransferManager or the S3 client, but that will not work properly either. First of all, TransferManager has no option for providing Content-Encoding, so it obviously will not work. Second, the S3 client will not work properly either, because the signer rewrites Content-Encoding unless you change the checksum algorithm to NOT_SET and use MD5 instead.
Reproduction Steps
Just use the default config for TransferManager and try to upload a big JSON file (bigger than 10 KB!), then check GetBytesTransferred() for that operation: it will be equal to (or a bit bigger than) the size of the file, so no compression is applied.
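A minimal reproduction sketch along those lines (bucket and file names are placeholders, credentials are assumed to come from the default chain, and error handling is omitted):

```cpp
#include <aws/core/Aws.h>
#include <aws/core/utils/threading/PooledThreadExecutor.h>
#include <aws/s3/S3Client.h>
#include <aws/transfer/TransferManager.h>
#include <iostream>

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::Client::ClientConfiguration config; // default config; compression is nominally "on"
        auto s3Client = Aws::MakeShared<Aws::S3::S3Client>("repro", config);

        auto executor = Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>("repro", 4);
        Aws::Transfer::TransferManagerConfiguration tmConfig(executor.get());
        tmConfig.s3Client = s3Client;
        auto transferManager = Aws::Transfer::TransferManager::Create(tmConfig);

        // "big.json" / "my-bucket" are placeholders for a >10 KB JSON file and your bucket.
        auto handle = transferManager->UploadFile(
            "big.json", "my-bucket", "big.json", "application/json",
            Aws::Map<Aws::String, Aws::String>());
        handle->WaitUntilFinished();

        // Equal to (or slightly above) the file size on disk => no compression happened.
        std::cout << "bytes transferred: " << handle->GetBytesTransferred() << "\n";
    }
    Aws::ShutdownAPI(options);
    return 0;
}
```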
Possible Solution
So, first of all, we should enable compression: the referenced line always returns false.
But even if we change it explicitly to GZIP, it still will not work, because after compression the request keeps the previous Content-Length and doesn't set the decoded content length that the backend service requires for decompression. So let's add it (and remove Content-Length; the proper value will be set later by AddContentBodyToRequest).
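A sketch of the header fix described above. The header name follows the aws-chunked convention, and `originalSize` is a stand-in for the pre-compression body size:

```cpp
// After gzipping the body: advertise the uncompressed size so the service
// can decode, and drop the stale Content-Length; the correct compressed
// length will be set later by AddContentBodyToRequest.
httpRequest->SetHeaderValue("x-amz-decoded-content-length",
                            Aws::String(std::to_string(originalSize).c_str()));
httpRequest->DeleteHeader("content-length");
```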
But even after that it will not work, because the checksum calculation still uses the old body, not the compressed one. So let's change that too and use the actual body (from httpRequest).
But it still will not work, because when we calculate the hash we expect the stream to be in a good state, which may not be the case, especially after some reading and moving the cursor back and forth. So we should explicitly clear the stream before starting to read.
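Roughly, the fix amounts to resetting the stream's state flags and position before hashing. A small sketch (`HashBody` is a hypothetical helper, not an SDK function):

```cpp
#include <aws/core/http/HttpRequest.h>
#include <aws/core/utils/HashingUtils.h>

// Hash the request body defensively: a previous full read leaves
// eofbit/failbit set, and seeking a failed stream is a no-op, so the
// state flags must be cleared before rewinding and reading.
Aws::Utils::ByteBuffer HashBody(Aws::Http::HttpRequest& httpRequest)
{
    Aws::IOStream& body = *httpRequest.GetContentBody();
    body.clear();               // reset eof/fail flags from earlier reads
    body.seekg(0, body.beg);    // rewind to the beginning
    auto digest = Aws::Utils::HashingUtils::CalculateSHA256(body);
    body.clear();               // leave the stream readable for the send
    body.seekg(0, body.beg);
    return digest;
}
```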
And only now, if you explicitly set the checksum algorithm to NOT_SET and enable MD5 computation, will compression finally work.
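Putting the workaround together: pre-compress the body yourself, force the checksum algorithm to NOT_SET so the signer doesn't rewrite Content-Encoding, and supply a Content-MD5 instead. A sketch with a hypothetical helper (`UploadGzipped`); `gzippedStream` is assumed to already hold the gzipped payload:

```cpp
#include <aws/core/utils/HashingUtils.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/PutObjectRequest.h>

Aws::S3::Model::PutObjectOutcome UploadGzipped(
    const Aws::S3::S3Client& s3,
    const std::shared_ptr<Aws::IOStream>& gzippedStream)
{
    Aws::S3::Model::PutObjectRequest req;
    req.SetBucket("my-bucket");                 // placeholder
    req.SetKey("big.json");                     // placeholder
    req.SetContentType("application/json");
    req.SetContentEncoding("gzip");             // we compressed the body ourselves
    req.SetChecksumAlgorithm(Aws::S3::Model::ChecksumAlgorithm::NOT_SET);
    req.SetContentMD5(Aws::Utils::HashingUtils::Base64Encode(
        Aws::Utils::HashingUtils::CalculateMD5(*gzippedStream)));
    gzippedStream->clear();
    gzippedStream->seekg(0, gzippedStream->beg); // rewind after hashing
    req.SetBody(gzippedStream);
    return s3.PutObject(req);
}
```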
Additional Information/Context
No response
AWS CPP SDK version used
1.11.398
Compiler and Version used
gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
Operating System and version
Ubuntu 18.04.6 LTS
S3 does not support payload compression on the service side, so it would not be possible to do it through the SDK; i.e., if you upload as GZIP, S3 will keep it as GZIP in storage. AFAIK, CloudWatch is the only service the SDK currently supports compression for.
With that said, overriding the Content-Encoding when aws-chunked is used does seem like a bug; it should append instead.
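One plausible shape of the append behavior (header names as string literals for clarity; per the SigV4 streaming docs, aws-chunked goes first when multiple encodings are present):

```cpp
// When adding aws-chunked framing, keep any Content-Encoding the caller
// already set (e.g. "gzip") and append to it instead of replacing it.
Aws::String encoding = httpRequest->HasHeader("content-encoding")
    ? "aws-chunked," + httpRequest->GetHeaderValue("content-encoding")
    : Aws::String("aws-chunked");
httpRequest->SetHeaderValue("content-encoding", encoding);
```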
> S3 does not support payload compression on the service side, so it would not be possible to do it through the SDK; i.e., if you upload as GZIP, S3 will keep it as GZIP in storage.
Hm, are you sure? We actually upload gzip files that way, and we get unpacked files from S3.
Sorry, yeah, you are correct: S3 doesn't decompress the data. It just handles it in a nice way, so from the user's perspective the difference is not visible, but yes, the data is stored compressed.
@andrejlevkovitch did you have any other questions about compression and this SDK?
> did you have any other questions about compression and this SDK?
Nope, except for the problems related to the SDK that I already described:
- rewriting of Content-Encoding
- gzip compression doesn't work correctly when requested (Content-Length is not updated after compression, and the checksum is calculated incorrectly after compression)
- a small bug in hash calculation (the stream needs to be cleared before it is used)
- the compression params are never used, despite being present in the API. Those params are also not flexible: they can only be set for all file types at once. IMHO, it would be better to set them per Content-Type, so you don't compress video but do compress JSON (see the sketch after this list)
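A hypothetical illustration of that last point. Nothing like this exists in the SDK today; it is just the shape a per-Content-Type gate could take, whether applied by the caller or by a future config option:

```cpp
#include <aws/core/utils/memory/stl/AWSSet.h>
#include <aws/core/utils/memory/stl/AWSString.h>

// Hypothetical: decide compressibility from the Content-Type instead of
// one global setting. Already-compressed media (jpg, mp4) gains nothing
// from gzip, while text formats compress well.
static bool ShouldCompress(const Aws::String& contentType)
{
    static const Aws::Set<Aws::String> kCompressible = {
        "application/json", "application/xml", "text/plain", "text/csv"
    };
    return kCompressible.find(contentType) != kCompressible.end();
}
```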
Are you still expecting the SDK/S3 to compress your data? Your most recent questions seem to indicate that you think the SDK should be doing some type of compression, but it is not. None of the compression params will be used, as @DmitriyMusatkin mentioned.
@jmklix
- there is still the problem of Content-Encoding being rewritten, which seems like a bug (@DmitriyMusatkin also mentioned that)
- there is a RequestCompressionConfig in ClientConfiguration that is enabled by default and never used, which is confusing to say the least (see the snippet below)
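For reference, the knobs in question, as I read them in the 1.11.x headers (field names may differ slightly; they are only honored by services that model request compression, such as CloudWatch, and have no effect for S3):

```cpp
#include <aws/core/client/ClientConfiguration.h>

// Present and enabled by default on every client configuration, but only
// honored for services that model request compression, not for S3.
Aws::Client::ClientConfiguration config;
config.requestCompressionConfig.useRequestCompression =
    Aws::Client::UseRequestCompression::ENABLE;
config.requestCompressionConfig.requestMinCompressionSizeBytes = 10240; // 10 KB floor
```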
> there is still the problem of Content-Encoding being rewritten, which seems like a bug (@DmitriyMusatkin also mentioned that)
You are absolutely right; I put out a PR for the fix.
The fix should be merged now. Let me know if that fixes it for you; it should.
As for the second part:
> For services that support it, request content will be compressed. On by default if dependency available
is the description of the compression CMake arg. What can be updated to help clear up potential confusion? We call out "For services that support it", which is more or less what is called out in our public docs. What specifically in that documentation could be updated to avoid future confusion?
Well, it is mostly about the usage of the configuration: the config for the S3 client contains a field that is a priori not supported, and that alone is confusing, because I would expect that if a field is present, then it is possible to use it. Anyway, I'm not sure what to suggest here.
This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.
> because I would expect that if a field is present, then it is possible to use it.
Very valid, and something that we could likely introduce in a future version. In the initial implementation it likely should have been "if the service does not accept it, don't code-generate it". In its current state we can't remove it, because we guarantee (or at least try our hardest) not to break compilation for anyone by removing things.
In a future breaking change, though, making its presence depend on service support is a good idea.