swift-nio-http2 icon indicating copy to clipboard operation
swift-nio-http2 copied to clipboard

Use const array for unrolled Huffman table

Open Lukasa opened this issue 2 years ago • 3 comments

Motivation

We've spent a long time with an unrolled Huffman-table to optimize the Huffman decoding of HPACK headers. This works well, but because of some limitations in Swift it's been impossible to embed this structure in the binary directly: we've always had to alloc-and-copy on startup. Additionally, due to type checker performance limitations, we've been Base64 decoding it on startup, which isn't great either.

The introduction of const allows us to embed the knowledge of the fact that this structure is compile-time constant in to the binary, improving performance.

Modifications

Revert to the old layout, constify, and remove anything that blocks that.

Result

Hopefully better performance and less dirty memory.

This is a bit of a WIP. In particular, we have a few issues to resolve. The biggest one is that this still drastically slows down compiles due to the type checker overhead. The type checking here takes 25s on my machine to check this one statement, which is simply not practical for something shipped as source code.

Lukasa avatar Apr 25 '22 15:04 Lukasa

@swift-nio-bot test perf please

Lukasa avatar Apr 25 '22 16:04 Lukasa

performance report

build id: 15

timestamp: Wed Sep 29 12:55:11 UTC 2021

results

nameminmaxmeanstd
1_conn_10k_reqs 1.6213979721069336 1.6442910432815552 1.632042407989502 0.006699844032143084
encode_100k_header_blocks_indexable 0.14471304416656494 0.14747297763824463 0.14572598934173583 0.0008046368602636831
encode_100k_header_blocks_nonindexable 0.3090909719467163 0.3160330057144165 0.3103432893753052 0.00211925033167702
encode_100k_header_blocks_neverIndexed 0.31516897678375244 0.31856203079223633 0.31606420278549197 0.0011066502001272586
decode_100k_header_blocks_indexable 0.11055600643157959 0.11164200305938721 0.11097080707550049 0.0003699525350451268
decode_100k_header_blocks_nonindexable 0.13858795166015625 0.13942408561706543 0.13902969360351564 0.0002693567257094209
decode_100k_header_blocks_neverIndexed 0.13593900203704834 0.13771402835845947 0.13646239042282104 0.00048347271333613114
hpackheaders_canonical_form 0.031477928161621094 0.032000064849853516 0.03155630826950073 0.00015699760597511416
hpackheaders_canonical_form_trimming_whitespace 0.04422605037689209 0.044754981994628906 0.04440640211105347 0.00023535089672761531
hpackheaders_canonical_form_trimming_whitespace_short_strings 0.04088902473449707 0.04139399528503418 0.04100780487060547 0.00019896661103882608
hpackheaders_canonical_form_trimming_whitespace_long_strings 0.04970097541809082 0.05019199848175049 0.04985220432281494 0.00021980457652302524
huffman_encode_basic 0.7676070928573608 0.7695299386978149 0.7681055784225463 0.0005364138489963227
huffman_encode_complex 0.5194940567016602 0.5202770233154297 0.5198963046073913 0.0002579057159458666
huffman_decode_basic 0.056532979011535645 0.05703103542327881 0.056725692749023435 0.0002025397769603034
huffman_decode_complex 0.04661297798156738 0.04776501655578613 0.046978187561035153 0.0004299907412052386
server_only_10k_requests_1_concurrent 0.3618440628051758 0.36552298069000244 0.3626688003540039 0.001039615633965984
server_only_10k_requests_100_concurrent 0.2846330404281616 0.28559303283691406 0.2850819110870361 0.00035980189089376255
stream_teardown_10k_requests_100_concurrent 0.15219998359680176 0.15268898010253906 0.152490496635437 0.00019064078063663374

comparison

swift-server-bot avatar Apr 25 '22 16:04 swift-server-bot

Huh, no comparison there. Weird. I don't expect substantial performance improvements regardless: this only cleans up the dirty memory situation and our benchmark doesn't really test memory pressure.

Lukasa avatar Apr 25 '22 16:04 Lukasa