[FEATURE] Disable automatic response decompression
Describe your feature request
As a template writer, I would like to disable the automatic response decompression on a per-template basis.
Describe the use case of the feature
Currently, all content that is gzip or otherwise compressed is automatically decompressed before it is passed to the template matchers. As far as I can tell, this happens regardless of the presence of a relevant Content-Encoding: gzip header. This makes it impossible to do raw byte or regex extraction on the gzip header itself.
This feature would be used to write a Nuclei template version for the version extractor for Citrix Netscaler, as done in https://github.com/fox-it/citrix-netscaler-triage/blob/main/scan-citrix-netscaler-version.py.
I can think of two solutions:
- An additional template config option to disable all automatic response body decompression
- Disable the decompression when the request was made with the HTTP request header Accept-Encoding: identity.
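For context on why the raw bytes matter: the gzip MTIME field lives in bytes 4–7 of the member header (little-endian Unix timestamp, per RFC 1952), and it is exactly this metadata that is destroyed by automatic decompression. A minimal Go sketch of reading it straight from raw response bytes (function name is illustrative):

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// rawGzipMTime reads the MTIME field (bytes 4-7, little-endian Unix
// timestamp per RFC 1952) directly from a raw gzip stream. This only
// works while the response body still contains the compressed bytes.
func rawGzipMTime(raw []byte) (uint32, error) {
	if len(raw) < 8 || raw[0] != 0x1f || raw[1] != 0x8b {
		return 0, errors.New("not a gzip stream")
	}
	return binary.LittleEndian.Uint32(raw[4:8]), nil
}

func main() {
	// Gzip member header with MTIME = 1543395386 (0x5BFE583A).
	hdr := []byte{0x1f, 0x8b, 0x08, 0x00, 0x3a, 0x58, 0xfe, 0x5b}
	ts, _ := rawGzipMTime(hdr)
	fmt.Println(ts) // → 1543395386
}
```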
Describe alternatives you've considered
- Write a network based template
Additional context
No response
@darses Hi! I'm interested in this issue. Would it be okay if I take over this issue and start working on it?
Go ahead, I am not actively working on it. The request stems from the way I want to implement a version check, which is done by looking at the Last-Modified date of a specific file inside a gzipped file. Since Nuclei automatically decompresses the gzip content, the Last-Modified date information is lost.
Hi @darses,
I just wanted to confirm my current understanding and get your thoughts on the right direction moving forward.
As I understand it, the issue is that when a server returns gzip-compressed content, Nuclei automatically decompresses it before exposing it to the template matchers. This behavior prevents access to gzip-level metadata like the mtime field, which is necessary for version detection (e.g., in the Citrix Netscaler use case).
One proposed workaround is to use the Accept-Encoding: identity request header. However, this simply prevents the server from compressing the response in the first place — meaning no gzip stream is returned at all. So this method is unsuitable when we want the raw gzip response intact.
Given this limitation, I explored adding a template-level option like disable-compression: true, which would instruct the Nuclei engine to not automatically decompress even if the server responds with gzip content. However, based on feedback from the PD team (via Discord), it seems that introducing a new field may not be preferred.
Writing a network-based template is also a significant effort. What would be a good next step?
cc: @ehsandeep
Thanks again!
Without new Nuclei CLI flags or template fields, I think the best option is to disable automatic decompression based on the HTTP Accept-Encoding and/or Content-Encoding headers.
The tricky part is that servers may not respond to the Accept-Encoding header, so you may need to rely on the request header to determine the intent of the template writer. Essentially you check if the request contains Accept-Encoding: identity and if that is the case, disable automatic decompression regardless of any Content-Encoding response header.
Alternatively implement this on both Accept-Encoding: identity (request) and Content-Encoding: identity (response). (Inclusive or).
Not sure if that is an acceptable solution for the PD team, nor if the request headers are available when deciding on automatic decompression (or not).
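The combined rule described above could be sketched like this (hypothetical helper, not actual Nuclei internals — it assumes the request headers are available at the point where decompression is decided):

```go
package main

import (
	"fmt"
	"strings"
)

// shouldDecompress sketches the proposed rule: skip automatic body
// decompression when the template's request carried
// "Accept-Encoding: identity" OR the response declares
// "Content-Encoding: identity" (inclusive or). Otherwise decompress
// only when the response actually declares a content encoding.
func shouldDecompress(reqAcceptEncoding, respContentEncoding string) bool {
	if strings.EqualFold(strings.TrimSpace(reqAcceptEncoding), "identity") {
		return false // template writer asked for raw bytes
	}
	enc := strings.ToLower(strings.TrimSpace(respContentEncoding))
	if enc == "identity" {
		return false // server says the body is not encoded
	}
	return enc != ""
}

func main() {
	fmt.Println(shouldDecompress("identity", "gzip")) // → false (raw bytes kept)
	fmt.Println(shouldDecompress("gzip", "gzip"))     // → true (decompressed)
	fmt.Println(shouldDecompress("", ""))             // → false (nothing to decode)
}
```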
Edit: I do not think the concern about decompressed content is justified, at least for Netscaler. The server is serving a static .gz file, not compressing content on the fly. The response with encoding identity is still a gzipped response.
Thanks for the discussion.
My tests confirm that the raw, gzipped response is already available in the DSL body variable. This is great news—it means the core decompression logic probably doesn't need to be changed.
The remaining challenge is that the DSL can't parse this binary data to extract the mtime.
Proposal: A New Helper Function
So, instead of adding new fields or flags, I'd like to propose a simple gzip_mtime(body) helper function for the DSL.
This would allow for a clean template, for example:

```yaml
# ...
http:
  - method: GET
    path:
      - "/path/to/rdx_en.js.gz"

    matchers:
      - type: dsl
        dsl:
          - "gzip_mtime(body) == 1543395386" # Example timestamp
```
This approach seems non-disruptive, explicit, and directly solves the problem by providing the right tool to analyze the data that's already there.
What do you think of this direction?
@khs-alt thanks for exploring this further. Using a DSL helper for this would be ideal, as it doesn’t require a template-level change.
Great! I'll start working on implementing the gzip_mtime(body) helper function
Hi @darses, I've developed a DSL helper function. I hope it works according to your intention. If there is any problem, feel free to tell me. Thank you.
Thanks! I will check it out sometime this week and let you know.
Unfortunately, this still does not work for me with a locally compiled DSL version that includes gzip_mtime.
My template in short:
```yaml
http:
  - method: GET
    path:
      - "{{BaseURL}}/vpn/js/rdx/core/lang/rdx_en.json.gz"

    matchers:
      - type: dsl
        dsl:
          - status_code==200 && contains(body, "{}")

    extractors:
      - type: dsl
        name: test2
        dsl:
          - to_string(gzip_mtime(body))
```
There are no DSL errors, which do appear when I use gzip_mtime2(body), so I am running the master branch of dsl that includes this new helper function. Still, the (extracted) output is empty. When I add && to_string(gzip_mtime(body)) > 0 or && gzip_mtime(body) > 0 to the matcher, I no longer get a match.
EDIT: I think the issue is that body does not contain the raw bytes from the server, but is already decompressed. When I input the raw bytes as a hex string and use to_string(gzip_mtime(hex_decode("bytes"))), extraction works fine.
EDIT 2: I tried to recreate the provided test-content.txt.gz example case. I tested with python -m http.server. In the HTTP response I notice Content-Type: application/gzip without a Content-Encoding response header. In my real-world example both Content-Type: application/json; charset=utf-8 and Content-Encoding: gzip are set. In the Nuclei debug output I notice that the Python example does return raw bytes in the response body, while in my real-world example only the unzipped text is returned. It seems to me that the automatic decompression happens based on either Content-Type or Content-Encoding, and that the provided example does not correctly reproduce the gzipped response that I am working with.