[WIP] HTTP Compression Support
First things first, as the title suggests, this is very much a 'Work In Progress'.
The code is in a mixture of Unit and Kernel coding style, full of debugging, subject to change and it's all in one big patch.
I'm posting this in its current form to show current progress and the overall approach taken.
This has gone through several iterations up to this point:
V1
Initial version which worked similarly to @alejandro-colomar's version by doing the compression from nxt_http_request_send().
This has some issues:
1. We can't set the Content-Length, it's too late at this point, so compressed responses are sent chunked.
2. Creates a new buffer (via malloc(3)) to store the compressed data, then copies it over to the response buffer (overwriting what was there).
3. We have an issue if we are compressing data which then takes up more space than we have in the response buffer; this would likely hit for small data sizes where the compressed data plus metadata would be larger than the original data. This can be handled, as @alejandro-colomar did, by allocating a new buffer and swapping it with the current one.
4. Slightly convoluted logic to determine if we are on the last buffer, so the compressor can know to properly finish up.
This approach did have the advantage of also handling application responses.
V2
In this version the compression is done in nxt_http_static_buf_completion().
This has the advantage that we can allocate large enough output buffers in nxt_http_static_body_handler() to allow for the maximum compressed size of the data. It is also easier to tell if we are on the last buffer.
- This still has issue (1) above.
- This uses a static buffer of NXT_HTTP_STATIC_BUF_SIZE (128KiB) as a temporary buffer to read in chunks of the file to be compressed.
V3
This version uses mmap(2) to map the file to be compressed into memory; we can then read from this mapping directly without the need to read the file into a temporary buffer.
- This still has issue (1) above.
V4 (this version)
Like V3 above, we mmap(2) the file to be compressed. However, we do this directly in nxt_http_static_send_ready(), where we read in the file to be compressed.
We also mmap(2) a temporary output file (created via mkstemp(3)), initially sized to the maximum compressed size of the file.
We then compress the input mmap'd file into the output mmap'd file, negating the need for any intermediate buffering and saving extraneous copies and stack space (or the overhead of heap allocations) for said buffer.
Finally, we ftruncate(2) this new output file to its actual size.
This file is what is then used in nxt_http_static_body_handler() to do the output buffer allocations, so we don't need to make any modifications to that function.
This also allows us to correctly set the Content-Length header.
Other than V1, these approaches don't handle application responses, which will need to be handled separately.
What works
This supports deflate, gzip, zstd & brotli compression. This is all opt-in, e.g.
$ ./configure ... --zlib --zstd --brotli
...
checking for getgrouplist() ... found
checking for zlib ... found
+ zlib version: 1.3.1.zlib-ng
checking for zstd ... found
+ zstd version: 1.5.6
checking for brotli ... found
+ brotli version: 1.1.0
checking for PCRE2 library ... found
...
TLS support: ............... NO
zlib support: .............. YES
zstd support: .............. YES
brotli support: ............ YES
Regex support: ............. YES
...
The compressors themselves (src/nxt_{zlib,zstd,brotli}.c) are nicely isolated from the Unit core and are just the bare minimum required to do the actual compression.
Configuration may look like:
{
"listeners": {
"[::1]:8080": {
"pass": "routes"
}
},
"settings": {
"http": {
"static": {
"mime_types": {
"text/x-c": [
".c",
".h"
]
}
},
"compression": {
"types": [
"text/*"
],
"compressors": [
{
"encoding": "gzip",
"level": 3,
"min_length": 2048
},
{
"encoding": "deflate",
"min_length": 1024
},
{
"encoding": "zstd",
"min_length": 2048
},
{
"encoding": "br",
"min_length": 256
}
]
}
}
},
"routes": [
{
"match": {
"uri": "*"
},
"action": {
"share": "/srv/unit-share$uri"
}
}
]
}
This adds a new settings.http.compression config option. Under here we can define the MIME types we want to compress and which compressors we want to use. For each compressor we can set the compression level and the minimum length of content to compress.
This could be extended for example to allow for per-compressor mime-type overrides.
Todo
As mentioned above this currently only handles compressing static share content. Compressing application responses needs to be handled separately.
While this tries to handle more complex Accept-Encoding headers, e.g. gzip;q=1.0, identity;q=0.5, *;q=0, this no doubt requires a little more work.