Header name to lower case conversion and validation performance optimization
This PR includes two commits:
- Improves integer conversion performance in `HeaderValue`: it eliminates an extra heap allocation by using `itoa`'s stack-allocated buffer instead.
- Adds `WordRegister` for efficient word-sized byte operations: this introduces chunked processing for lowercase conversion and header name validation. It uses several tricks to reduce instruction count and enable batch processing, validating an entire chunk with just 3 assembly instructions instead of processing byte by byte with a branch in every loop iteration.
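For readers unfamiliar with word-at-a-time processing, here is a minimal sketch of the general idea. It is not the `WordRegister` code from this PR: the names `lower_word` and `lower_ascii_in_place` are hypothetical, the word size is fixed at 64 bits for simplicity, and it only rejects non-ASCII bytes rather than performing full header name validation.

```rust
/// 0x01 repeated in every byte lane of a 64-bit word.
const ONES: u64 = 0x0101_0101_0101_0101;
/// 0x80 repeated in every byte lane: the per-lane high-bit mask.
const HIGH: u64 = 0x8080_8080_8080_8080;

/// Lowercases eight ASCII bytes packed into one word (hypothetical helper).
/// Returns `None` if any lane holds a non-ASCII byte (high bit set).
fn lower_word(w: u64) -> Option<u64> {
    if w & HIGH != 0 {
        return None;
    }
    // Every lane is below 0x80 here, so these adds cannot carry between lanes.
    // After the add, a lane's high bit is set iff the byte is >= b'A'.
    let ge_a = w + ONES * (0x80 - b'A' as u64);
    // After the add, a lane's high bit is set iff the byte is > b'Z'.
    let gt_z = w + ONES * (0x80 - (b'Z' as u64 + 1));
    // High bit set only in lanes that hold an uppercase ASCII letter.
    let upper = ge_a & !gt_z & HIGH;
    // 0x80 >> 2 == 0x20, the bit that separates 'A' from 'a'.
    Some(w | (upper >> 2))
}

/// Lowercases `buf` in place, eight bytes at a time (hypothetical helper).
/// Returns `false` as soon as a non-ASCII byte is found; chunks before
/// that point have already been lowercased.
fn lower_ascii_in_place(buf: &mut [u8]) -> bool {
    let mut chunks = buf.chunks_exact_mut(8);
    for chunk in &mut chunks {
        let mut lanes = [0u8; 8];
        lanes.copy_from_slice(chunk);
        match lower_word(u64::from_ne_bytes(lanes)) {
            Some(lowered) => chunk.copy_from_slice(&lowered.to_ne_bytes()),
            None => return false,
        }
    }
    // Fewer than eight bytes remain: fall back to byte-at-a-time handling.
    for b in chunks.into_remainder() {
        if !b.is_ascii() {
            return false;
        }
        b.make_ascii_lowercase();
    }
    true
}
```

Calling `lower_ascii_in_place` on a buffer holding `Content-Security-Policy` handles the first 16 bytes as two words and the last 7 bytes in the scalar tail. Names shorter than one word, such as `Host`, never reach the word path at all.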
Please review the unsafe parts again, and it would be great to test this on a big-endian CPU as well if one is available.
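Until someone can run this on big-endian hardware, byte order can be partly exercised on any host by loading the same bytes with both `from_le_bytes` and `from_be_bytes`. The sketch below is an assumption about how such a check could look, not code from this PR; all function names are hypothetical. It illustrates that lane-local operations give the same bytes in either order, while anything that computes a byte position from the word must pick the scan direction per byte order. It does not cover the unsafe pointer reads themselves, which still need review and real hardware.

```rust
const ONES: u64 = 0x0101_0101_0101_0101;
const HIGH: u64 = 0x8080_8080_8080_8080;

/// Lane-local lowercase: each byte lane is handled independently,
/// and non-ASCII lanes are left unchanged (hypothetical helper).
fn lower_word(w: u64) -> u64 {
    let low7 = w & !HIGH; // clear high bits so the adds cannot carry between lanes
    let ge_a = low7 + ONES * (0x80 - b'A' as u64);
    let gt_z = low7 + ONES * (0x80 - (b'Z' as u64 + 1));
    // Uppercase lanes only, and only where the original byte was ASCII.
    let upper = ge_a & !gt_z & !w & HIGH;
    w | (upper >> 2)
}

/// Same bytes, loaded and stored as little-endian.
fn lower_via_le(bytes: [u8; 8]) -> [u8; 8] {
    lower_word(u64::from_le_bytes(bytes)).to_le_bytes()
}

/// Same bytes, loaded and stored as big-endian.
fn lower_via_be(bytes: [u8; 8]) -> [u8; 8] {
    lower_word(u64::from_be_bytes(bytes)).to_be_bytes()
}

/// Position of the first non-ASCII byte after a little-endian load:
/// byte 0 sits in the lowest lane, so scan from the low end.
fn first_non_ascii_le(bytes: [u8; 8]) -> Option<usize> {
    let mask = u64::from_le_bytes(bytes) & HIGH;
    if mask == 0 {
        None
    } else {
        Some(mask.trailing_zeros() as usize / 8)
    }
}

/// Position of the first non-ASCII byte after a big-endian load:
/// byte 0 sits in the highest lane, so scan from the high end.
fn first_non_ascii_be(bytes: [u8; 8]) -> Option<usize> {
    let mask = u64::from_be_bytes(bytes) & HIGH;
    if mask == 0 {
        None
    } else {
        Some(mask.leading_zeros() as usize / 8)
    }
}
```

If the word routines only use lane-local operations and always pair a native-order load with a native-order store, the first pair of functions suggests the result should not depend on byte order. Position-finding code like the second pair is where a big-endian bug would most likely hide.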
I wrote a benchmark for these changes: https://github.com/fereidani/headernamebench
It’s debatable whether this change actually benefits 32-bit systems; if needed, we can disable it there by checking the pointer-size constant, which skips compiling the optimization for those targets.
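One way the gating could look, sketched with `#[cfg(target_pointer_width = "64")]`. The function names are hypothetical and the word path here only checks for non-ASCII bytes; it stands in for the real optimized routine.

```rust
/// Portable byte-at-a-time check, compiled on every target.
fn all_ascii_scalar(buf: &[u8]) -> bool {
    buf.iter().all(|b| b.is_ascii())
}

/// Word-at-a-time check, compiled only when pointers are 64 bits wide.
#[cfg(target_pointer_width = "64")]
fn all_ascii(buf: &[u8]) -> bool {
    // 0x80 in every byte lane: any set bit means a non-ASCII byte.
    const HIGH: u64 = 0x8080_8080_8080_8080;
    let chunks = buf.chunks_exact(8);
    let tail = chunks.remainder();
    for chunk in chunks {
        let mut lanes = [0u8; 8];
        lanes.copy_from_slice(chunk);
        if u64::from_ne_bytes(lanes) & HIGH != 0 {
            return false;
        }
    }
    // Fewer than eight bytes remain: reuse the scalar path.
    all_ascii_scalar(tail)
}

/// On other targets the word path is never compiled;
/// callers get the scalar check under the same name.
#[cfg(not(target_pointer_width = "64"))]
fn all_ascii(buf: &[u8]) -> bool {
    all_ascii_scalar(buf)
}
```

Because the selection happens at compile time, 32-bit builds carry no extra code or runtime branch for the optimization.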
Here are my results for this benchmark. Run time drops by roughly 50% on typical workloads (about 47% on the valid case and about 56% on the invalid case), with only a negligible slowdown of about 3% for very small headers (like Host), where the optimization does not apply:
header_to_lower_vs_optimized/header_to_lower_valid
time: [1.3416 µs 1.3446 µs 1.3483 µs]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
header_to_lower_vs_optimized/header_to_lower_optimized_valid
time: [717.29 ns 718.05 ns 718.81 ns]
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) low severe
5 (5.00%) low mild
1 (1.00%) high mild
5 (5.00%) high severe
header_to_lower_vs_optimized/header_to_lower_invalid
time: [575.18 ns 579.05 ns 584.14 ns]
header_to_lower_vs_optimized/header_to_lower_optimized_invalid
time: [254.98 ns 255.65 ns 256.38 ns]
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
header_to_lower_vs_optimized/header_to_lower_host
time: [28.722 ns 28.789 ns 28.856 ns]
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
6 (6.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe
header_to_lower_vs_optimized/header_to_lower_optimized_host
time: [29.522 ns 29.600 ns 29.672 ns]