[WIP] HTTP Compression Support
First things first, as the title suggests, this is very much a 'Work In Progress'.
The code is in a mixture of Unit and Kernel coding style, full of debugging, subject to change and it's all in one big patch.
I'm posting this in its current form to show current progress and the overall approach taken.
This has gone through several iterations up to this point:
V1
Initial version which worked similarly to @alejandro-colomar's version by doing the compression from nxt_http_request_send().
This has some issues:
1. We can't set the Content-Length, it's too late at this point, so compressed responses are sent chunked.
2. Creates a new buffer (via malloc(3)) to store the compressed data, then copies it over to the response buffer (overwriting what was there).
3. We have an issue if we are compressing data which then takes up more space than we have in the response buffer; this would likely hit for small data sizes where the compressed data plus metadata would be larger than the original data. This can be handled, as @alejandro-colomar did, by allocating a new buffer and swapping it with the current one.
4. Slightly convoluted logic to determine if we are on the last buffer, so the compressor can know to properly finish up.
This approach did have the advantage of also handling application responses.
V2
In this version the compression is done in nxt_http_static_buf_completion().
This has the advantage that we can allocate large enough output buffers in nxt_http_static_body_handler() to allow for the maximum compressed size of the data. It is also easier to tell if we are on the last buffer.
- This still has issue (1) above.
- This uses a static buffer of NXT_HTTP_STATIC_BUF_SIZE (128KiB) as a temporary buffer to read in chunks of the file to be compressed.
V3
This version uses mmap(2) to map the file to be compressed into memory; we can then read from this mapping directly without the need to read the file into a temporary buffer.
- This still has issue (1) above.
V4 (this version)
Like V3 above, we mmap(2) the file to be compressed. However, we do this directly in nxt_http_static_send_ready(), where we read in the file to be compressed.
We also mmap(2) a temporary output file (created via mkstemp(3)), initially sized to the maximum compressed size of the file.
We then compress the input mmap'd file into the output mmap'd file, negating the need for any intermediate buffering and saving extraneous copies and stack space (or the overhead of heap allocations) for said buffer.
Finally, we ftruncate(2) this new output file to its actual size.
This file is what is then used in nxt_http_static_body_handler() to do the output buffer allocations, so we don't need to make any modifications to that function.
This also allows us to correctly set the Content-Length header.
Other than V1, these approaches don't handle application responses, which will need to be handled separately.
What works
This supports deflate, gzip, zstd & brotli compression. This is all opt-in, e.g.
$ ./configure ... --zlib --zstd --brotli
...
checking for getgrouplist() ... found
checking for zlib ... found
+ zlib version: 1.3.1.zlib-ng
checking for zstd ... found
+ zstd version: 1.5.6
checking for brotli ... found
+ brotli version: 1.1.0
checking for PCRE2 library ... found
...
TLS support: ............... NO
zlib support: .............. YES
zstd support: .............. YES
brotli support: ............ YES
Regex support: ............. YES
...
The compressors themselves (src/nxt_{zlib,zstd,brotli}.c) are nicely isolated from the Unit core and are just the bare minimum required to do the actual compression.
Configuration may look like:
{
"listeners": {
"[::1]:8080": {
"pass": "routes"
}
},
"settings": {
"http": {
"static": {
"mime_types": {
"text/x-c": [
".c",
".h"
]
}
},
"compression": {
"types": [
"text/*"
],
"compressors": [
{
"encoding": "gzip",
"level": 3,
"min_length": 2048
},
{
"encoding": "deflate",
"min_length": 1024
},
{
"encoding": "zstd",
"min_length": 2048
},
{
"encoding": "br",
"min_length": 256
}
]
}
}
},
"routes": [
{
"match": {
"uri": "*"
},
"action": {
"share": "/srv/unit-share$uri"
}
}
]
}
This adds a new settings.http.compression config option. Under here we can define the MIME types we want to compress and which compressors we want to use. For each compressor we can set the compression level and the minimum length of content to compress.
This could be extended for example to allow for per-compressor mime-type overrides.
Todo
As mentioned above this currently only handles compressing static share content. Compressing application responses needs to be handled separately.
While this tries to handle more complex Accept-Encoding headers, e.g. gzip;q=1.0, identity;q=0.5, *;q=0, this no doubt requires a little more work.