Allow setting `no_gzip` / `gzip` for Request
Background
GCS has a feature called Decompressive Transcoding, which can decompress gzip-encoded files based on specific conditions.
An object is eligible for it when both of the following hold:
- The file is gzip-compressed when stored in Cloud Storage.
- The object's metadata includes `Content-Encoding: gzip`.
To avoid this, users have to rely on one of the mechanisms GCS documents:
There are two ways to prevent decompressive transcoding from occurring for an object that is otherwise eligible:
- If the request for the object includes an `Accept-Encoding: gzip` header, the object is served as-is in that specific request, along with a `Content-Encoding: gzip` response header.
- If the `Cache-Control` metadata field for the object is set to `no-transform`, the object is served as a compressed object in all subsequent requests, regardless of any `Accept-Encoding` request headers.
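The two rules above can be written as a small decision function. This is a minimal, std-only sketch of the documented behavior for an object that is already eligible for transcoding; `serves_compressed` is a hypothetical name, and the header parsing is simplified (it ignores quality values such as `;q=0`).

```rust
/// Simplified model of the documented GCS rules for an object stored
/// gzip-compressed with `Content-Encoding: gzip` metadata.
/// Returns `true` if the stored (still compressed) bytes are served as-is,
/// `false` if GCS transcodes and serves the object decompressed.
fn serves_compressed(accept_encoding: Option<&str>, cache_control_no_transform: bool) -> bool {
    // `Cache-Control: no-transform` on the object disables transcoding for every request.
    if cache_control_no_transform {
        return true;
    }
    // Otherwise the request must list `gzip` in Accept-Encoding.
    // Parameters such as `;q=0.5` are dropped, not interpreted, in this sketch.
    match accept_encoding {
        Some(value) => value
            .split(',')
            .map(|token| token.split(';').next().unwrap_or("").trim())
            .any(|coding| coding.eq_ignore_ascii_case("gzip")),
        None => false,
    }
}

fn main() {
    // A request advertising gzip gets the stored bytes.
    assert!(serves_compressed(Some("gzip"), false));
    // `identity` or no header at all leads to transcoding.
    assert!(!serves_compressed(Some("identity"), false));
    assert!(!serves_compressed(None, false));
    // `no-transform` metadata wins regardless of the request header.
    assert!(serves_compressed(Some("identity"), true));
}
```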
Unfortunately, this feature can conflict with reqwest's automatic gzip handling. The resulting behavior matrix is described in https://github.com/apache/opendal/issues/5070.
The only workaround so far is to call `no_gzip` at the client level, but this can cause https://github.com/apache/opendal/issues/5897, because GCS also returns gzip-encoded responses from other APIs.
Proposal
My current idea is to allow setting `no_gzip` / `gzip` on a `Request` directly, so that users such as OpenDAL can control whether reqwest's automatic gzip handling is disabled for a given request.
I have checked the related logic, and it seems easy to add without introducing big changes:
https://github.com/seanmonstar/reqwest/blob/03d1635347cbfe979bd5a7f4ba7ad2cdc73ef68c/src/async_impl/client.rs#L2910-L2917
We could add `Accepts` to `Request` and merge it with the client's settings before constructing the `Response`.
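A minimal, std-only sketch of that merge, assuming the per-request setting is an `Option<bool>` where `None` inherits the client's value. The `Accepts` struct below is a hypothetical stand-in (reqwest's real type is private and tracks more codings), and `merge` / `decompresses` are illustrative names, not reqwest API.

```rust
/// Hypothetical stand-in for reqwest's internal `Accepts`; only gzip is modeled.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Accepts {
    gzip: bool,
}

impl Accepts {
    /// Merge a hypothetical per-request override onto the client-level setting.
    /// `None` means the request inherits whatever the client was built with.
    fn merge(self, request_gzip: Option<bool>) -> Accepts {
        Accepts {
            gzip: request_gzip.unwrap_or(self.gzip),
        }
    }

    /// Whether a response with the given `Content-Encoding` would be
    /// decompressed automatically under these settings.
    fn decompresses(&self, content_encoding: Option<&str>) -> bool {
        self.gzip && content_encoding.map_or(false, |c| c.eq_ignore_ascii_case("gzip"))
    }
}

fn main() {
    // Client built with auto gzip enabled.
    let client = Accepts { gzip: true };

    // Other API calls inherit the client setting, so gzip-encoded
    // responses are still decoded transparently.
    let api_call = client.merge(None);
    assert!(api_call.decompresses(Some("gzip")));

    // An object download opts out for this request only, so a gzip body
    // is passed through untouched.
    let object_read = client.merge(Some(false));
    assert!(!object_read.decompresses(Some("gzip")));

    // The client-level setting itself is unchanged.
    assert_eq!(client, Accepts { gzip: true });
}
```

The design choice here is that the client stays the single source of defaults, while a request can only narrow or widen that default for itself, which would avoid the client-wide `no_gzip` side effect.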
What do you think?
If you manually set `.header("accept-encoding", "identity")` on that specific request, does it all work?
Hi, this doesn't work as expected. In this case, GCS performs Decompressive Transcoding, which results in two problematic behaviors. One of them is "too much data": the request returns more data than the size of its object.
Really? That seems weird, indeed. I read the doc you linked; it says no transcoding will happen if you send `accept-encoding: gzip`, which reqwest sends if the `gzip` feature is enabled. However, if reqwest sees an existing `accept-encoding` header, it won't set one. Sending `accept-encoding: identity` is a normal thing that many servers understand to mean "don't encode the content".
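The "don't override an existing header" rule can be modeled in a few lines. This is a simplified, std-only sketch, not reqwest's code: `apply_default_accept_encoding` is a hypothetical name, header names are assumed to be lowercase keys in a plain `HashMap`, and any other conditions reqwest may check are left out.

```rust
use std::collections::HashMap;

/// Simplified model: with auto gzip enabled, add `accept-encoding: gzip`
/// only if the request does not already carry that header.
fn apply_default_accept_encoding(headers: &mut HashMap<String, String>, gzip_enabled: bool) {
    if gzip_enabled {
        // `or_insert_with` leaves an existing user-supplied value untouched.
        headers
            .entry("accept-encoding".to_string())
            .or_insert_with(|| "gzip".to_string());
    }
}

fn main() {
    // No explicit header: the default gzip value is added.
    let mut plain = HashMap::new();
    apply_default_accept_encoding(&mut plain, true);
    assert_eq!(plain.get("accept-encoding").map(String::as_str), Some("gzip"));

    // Explicit `identity`: kept as-is, so the server is asked not to encode.
    let mut explicit = HashMap::new();
    explicit.insert("accept-encoding".to_string(), "identity".to_string());
    apply_default_accept_encoding(&mut explicit, true);
    assert_eq!(explicit.get("accept-encoding").map(String::as_str), Some("identity"));
}
```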
Hi, GCS behaves as follows for objects with `content-encoding: gzip`:
- If the request has `accept-encoding: gzip`, or the object has `cache-control: no-transform`, the content is sent as-is (i.e., gzip-compressed).
- If not (for example, the request has only `accept-encoding: identity`), GCS transcodes the content and serves it decompressed.
Maybe this can be solved by skipping the `accept-encoding` header at the request level? This would be similar to the attohttpc HTTP client, which allows skipping decompression when building the request:
https://github.com/sbstp/attohttpc/blob/8500cda02d5075736143c10434e2ca52190a07e3/src/request/builder.rs#L382
If possible, I can make a PR for this
Hi @seanmonstar, what do you think? I'm open to either option.
I wrote up #2641 that relates to this.
Somewhat related: does reqwest set `accept-encoding` automatically based on the feature flags enabled at compile time, or do we need to set it manually when creating a request?
Yes, if the header doesn't already exist: https://docs.rs/reqwest/latest/reqwest/struct.ClientBuilder.html#method.gzip