swift-nio-http2
Use const array for unrolled Huffman table
Motivation
We've long used an unrolled Huffman table to optimize the Huffman decoding of HPACK headers. This works well, but because of some limitations in Swift it has been impossible to embed this structure directly in the binary: we've always had to allocate and copy it on startup. Additionally, due to type-checker performance limitations, we've been Base64-decoding it on startup, which isn't great either.
The introduction of const allows us to embed the fact that this structure is a compile-time constant into the binary, improving performance.
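For context, the startup pattern being replaced looks roughly like the following. All names and data here are illustrative, not the actual swift-nio-http2 identifiers, and the real table is of course far larger than four bytes: the old approach decodes a Base64 blob into a heap array at startup, while the new approach writes the table as a plain literal that, with the pitched const feature, the compiler could place in the binary's read-only data.

```swift
import Foundation

// Old approach (illustrative): the unrolled Huffman table ships as a
// Base64 string and is decoded into a heap-allocated array at startup.
// Every process pays an alloc-and-copy, and the resulting pages are dirty.
let encodedTable = "AAECAw=="  // stands in for the real, much larger blob
let decodedTable: [UInt8] = Array(Data(base64Encoded: encodedTable)!)

// New approach (illustrative): write the table as a plain array literal.
// With `const`, the compiler could emit this directly into read-only
// memory instead of constructing it at runtime.
let literalTable: [UInt8] = [0, 1, 2, 3]

assert(decodedTable == literalTable)
```

The trade-off the PR body goes on to describe is that a literal this large is expensive for the type checker, which is why compile time becomes the main open issue.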
Modifications
Revert to the old layout, constify, and remove anything that blocks that.
Result
Hopefully better performance and less dirty memory.
This is a bit of a WIP. In particular, we have a few issues to resolve. The biggest one is that this still drastically slows down compiles due to type-checker overhead: type checking this one statement takes 25s on my machine, which is simply not practical for something shipped as source code.
@swift-nio-bot test perf please
performance report
build id: 15
timestamp: Wed Sep 29 12:55:11 UTC 2021
results
| name | min | max | mean | std |
| --- | --- | --- | --- | --- |
| 1_conn_10k_reqs | 1.6213979721069336 | 1.6442910432815552 | 1.632042407989502 | 0.006699844032143084 |
| encode_100k_header_blocks_indexable | 0.14471304416656494 | 0.14747297763824463 | 0.14572598934173583 | 0.0008046368602636831 |
| encode_100k_header_blocks_nonindexable | 0.3090909719467163 | 0.3160330057144165 | 0.3103432893753052 | 0.00211925033167702 |
| encode_100k_header_blocks_neverIndexed | 0.31516897678375244 | 0.31856203079223633 | 0.31606420278549197 | 0.0011066502001272586 |
| decode_100k_header_blocks_indexable | 0.11055600643157959 | 0.11164200305938721 | 0.11097080707550049 | 0.0003699525350451268 |
| decode_100k_header_blocks_nonindexable | 0.13858795166015625 | 0.13942408561706543 | 0.13902969360351564 | 0.0002693567257094209 |
| decode_100k_header_blocks_neverIndexed | 0.13593900203704834 | 0.13771402835845947 | 0.13646239042282104 | 0.00048347271333613114 |
| hpackheaders_canonical_form | 0.031477928161621094 | 0.032000064849853516 | 0.03155630826950073 | 0.00015699760597511416 |
| hpackheaders_canonical_form_trimming_whitespace | 0.04422605037689209 | 0.044754981994628906 | 0.04440640211105347 | 0.00023535089672761531 |
| hpackheaders_canonical_form_trimming_whitespace_short_strings | 0.04088902473449707 | 0.04139399528503418 | 0.04100780487060547 | 0.00019896661103882608 |
| hpackheaders_canonical_form_trimming_whitespace_long_strings | 0.04970097541809082 | 0.05019199848175049 | 0.04985220432281494 | 0.00021980457652302524 |
| huffman_encode_basic | 0.7676070928573608 | 0.7695299386978149 | 0.7681055784225463 | 0.0005364138489963227 |
| huffman_encode_complex | 0.5194940567016602 | 0.5202770233154297 | 0.5198963046073913 | 0.0002579057159458666 |
| huffman_decode_basic | 0.056532979011535645 | 0.05703103542327881 | 0.056725692749023435 | 0.0002025397769603034 |
| huffman_decode_complex | 0.04661297798156738 | 0.04776501655578613 | 0.046978187561035153 | 0.0004299907412052386 |
| server_only_10k_requests_1_concurrent | 0.3618440628051758 | 0.36552298069000244 | 0.3626688003540039 | 0.001039615633965984 |
| server_only_10k_requests_100_concurrent | 0.2846330404281616 | 0.28559303283691406 | 0.2850819110870361 | 0.00035980189089376255 |
| stream_teardown_10k_requests_100_concurrent | 0.15219998359680176 | 0.15268898010253906 | 0.152490496635437 | 0.00019064078063663374 |
comparison
Huh, no comparison there. Weird. I don't expect substantial performance improvements regardless: this change only cleans up the dirty-memory situation, and our benchmarks don't really test memory pressure.