`bytes` capture doesn't provide access to raw gzipped body
What is the current bug behavior?
Here's what the docs say regarding bytes capture:
Capture the entire body (as a raw bytestream) from the received HTTP response
When the response body is compressed, I expected to receive the raw bytestream (the gzipped content). Instead, the bytes capture always stores the decompressed data (see the examples below).
Steps to reproduce
# ------------------------------------------------------------------------------
# '--compressed' flag enabled
# ------------------------------------------------------------------------------
GET https://httpbingo.org/gzip
[Options]
compressed: true
HTTP 200
[Captures]
gzipped_content_length1: header "Content-Length" toInt
gzipped_bytes_count1: bytes count
[Asserts]
variable "gzipped_content_length1" == {{gzipped_bytes_count1}}
# ------------------------------------------------------------------------------
# Turn '--compressed' off; explicitly set 'Accept-Encoding: gzip'
# ------------------------------------------------------------------------------
GET https://httpbingo.org/gzip
Accept-Encoding: gzip
[Options]
compressed: false
HTTP 200
[Captures]
gzipped_content_length2: header "Content-Length" toInt
gzipped_bytes_count2: bytes count
[Asserts]
variable "gzipped_content_length2" == {{gzipped_bytes_count2}}
And here's the output:
$ hurl --test --continue-on-error --verbose .
...
* ------------------------------------------------------------------------------
* Executing entry 1
...
* Response: (received 306 bytes in 501 ms)
*
< HTTP/2 200
< access-control-allow-credentials: true
< access-control-allow-origin: *
< content-encoding: gzip
< content-length: 306
...
* Captures:
* gzipped_content_length1: 306
* gzipped_bytes_count1: 612
*
error: Assert failure
--> ./gzip.hurl:16:0
|
| GET https://httpbingo.org/gzip
| ...
16 | variable "gzipped_content_length1" == {{gzipped_bytes_count1}}
| actual: integer <306>
| expected: integer <612>
|
* ------------------------------------------------------------------------------
* Executing entry 2
...
* Response: (received 294 bytes in 128 ms)
*
< HTTP/2 200
< access-control-allow-credentials: true
< access-control-allow-origin: *
< content-encoding: gzip
< content-length: 294
...
* Captures:
* gzipped_content_length2: 294
* gzipped_bytes_count2: 599
*
error: Assert failure
--> ./gzip.hurl:34:0
|
| GET https://httpbingo.org/gzip
| ...
34 | variable "gzipped_content_length2" == {{gzipped_bytes_count2}}
| actual: integer <294>
| expected: integer <599>
|
My actual use case is comparing response sizes between a request sent with the Accept-Encoding: gzip header and one sent without it, to make sure responses get compressed server-side. Content-Length may sound like a viable alternative, but it's not a reliable option: it's sometimes missing, which is compliant with the RFC.
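A minimal Python sketch of that size check, outside Hurl. It assumes you already have the raw (undecoded) bodies of both responses, which is exactly what Hurl's bytes query does not currently expose. The function name compression_saves_bytes is hypothetical.

```python
import gzip


def compression_saves_bytes(identity_body: bytes, gzip_body: bytes) -> bool:
    """Return True if the gzip-encoded body is smaller than the identity one.

    Both arguments must be raw response bodies as received on the wire:
    identity_body from a request without Accept-Encoding, gzip_body from a
    request with Accept-Encoding: gzip. Decoded bodies would defeat the check.
    """
    # Fail loudly if gzip_body is not actually gzip data (e.g. it was already
    # decoded by the client); gzip.decompress raises an error in that case.
    gzip.decompress(gzip_body)
    return len(gzip_body) < len(identity_body)


if __name__ == "__main__":
    payload = b'{"message": "hello"}\n' * 50
    print(compression_saves_bytes(payload, gzip.compress(payload)))  # True
```

Note that for very small bodies gzip can be larger than the original (the gzip header and trailer alone take 18 bytes), so a server may legitimately skip compression for them.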
What is the expected correct behavior?
The referenced endpoint always returns gzipped data, so the Accept-Encoding request header makes no difference:
$ curl https://httpbingo.org/gzip -s -o >(file -)
/dev/stdin: gzip compressed data
I'd expect the bytes count and Content-Length header values to be equal, regardless of compression.
The verbose output even says "(received X bytes in Y ms)", but X doesn't match the value stored by bytes count.
It'd be great to have access to truly "raw" data (the gzipped body in this case).
Execution context
$ hurl --version
hurl 6.1.1 (x86_64-apple-darwin24.0) libcurl/8.7.1 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.12 nghttp2/1.64.0
Features (libcurl): alt-svc AsynchDNS HSTS HTTP2 IPv6 Largefile libz NTLM SPNEGO SSL UnixSockets
Features (built-in): brotli
Possible fixes
If it's doable, a new capture (say, raw_bytes) might be the right direction, since it would preserve backwards compatibility.
Hi @jwadolowski, I agree it's a bit unintuitive, but the docs for asserts (https://hurl.dev/docs/asserting-response.html) say:
Body responses can be encoded by server (see Content-Encoding HTTP header) but asserts in Hurl files are not affected by this content compression. All body asserts (body, bytes, sha256 etc...) work after content decoding.
And https://hurl.dev/docs/asserting-response.html#bytes-assert says:
Like body assert, bytes assert works after content encoding decompression (so the predicates values are not affected by Content-Encoding response header value).
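To make the two sizes concrete, here is a minimal Python sketch (not Hurl code) that computes both for a gzip body: the wire size, which Content-Length describes, and the decoded size, which is what a post-decoding query such as bytes sees. The helper name body_sizes is hypothetical, and only gzip and identity encodings are handled.

```python
import gzip
from typing import Optional, Tuple


def body_sizes(raw_body: bytes, content_encoding: Optional[str]) -> Tuple[int, int]:
    """Return (wire_size, decoded_size) for an HTTP response body.

    wire_size counts the bytes as received (what Content-Length describes);
    decoded_size counts the bytes after content decoding (what a query that
    works after decoding sees). Only gzip is handled in this sketch.
    """
    if content_encoding == "gzip":
        decoded = gzip.decompress(raw_body)
    elif content_encoding in (None, "identity"):
        decoded = raw_body
    else:
        raise ValueError(f"unsupported Content-Encoding: {content_encoding}")
    return len(raw_body), len(decoded)


if __name__ == "__main__":
    payload = b'{"gzipped": true, "headers": {}}\n' * 20
    wire, decoded = body_sizes(gzip.compress(payload), "gzip")
    print(wire < decoded, decoded == len(payload))  # True True
```

With a compressed response, the two numbers differ by design, which is why comparing a header "Content-Length" capture against bytes count fails in the examples above.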
Where you're completely right is that we haven't explained or emphasized this for captures, as we have for asserts.
I'm going to update the docs on captures to be aligned with asserts (i.e. bytes works after content decoding).
I don't want to change the behavior of bytes, since it's documented and relied upon (at least in asserts). But access to the raw bytes is totally legitimate, so we could imagine a query named raw or rawbytes: it would be the only query that doesn't work after content decoding, giving you full access to the raw HTTP response bytes.
@fabricereix @lepapareil OK with this?