envoy
envoy copied to clipboard
http: optimize header map implementation for improved performance
Description:
This PR introduces an optimized implementation of the HeaderMap interface that provides better performance for common header operations. It uses a combination of vector and hash map for header storage and lookup. This implementation shows significant performance improvements:
- 1.7x faster header insertions
- 3.3x faster header lookups
- Up to 3.6x faster header iteration
This optimization will benefit all Envoy deployments that handle significant HTTP traffic, particularly those with large numbers of headers or high request rates. The improvements in lookup and iteration performance will reduce CPU usage in header processing code paths.
Benchmark
Benchmarks are performed on both MacOS and Linux using the checked in benchmarking suite.
bazel run -c opt //test/benchmark:header_map_benchmark_test
Benchmark Results
Linux
Run on (32 X 3499.53 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 1280 KiB (x16)
L3 Unified 55296 KiB (x1)
Load Average: 10.41, 17.06, 16.77
------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------------
BM_HeaderMapImplInsert/10/10 793 ns 793 ns 881801 items_per_second=12.6078M/s
BM_HeaderMapImplInsert/50/20 4327 ns 4327 ns 162107 items_per_second=11.556M/s
BM_HeaderMapImplInsert/100/50 9559 ns 9559 ns 73359 items_per_second=10.4614M/s
BM_HeaderMapOptimizedInsert/10/10 678 ns 678 ns 1037087 items_per_second=14.7545M/s
BM_HeaderMapOptimizedInsert/50/20 3528 ns 3528 ns 194380 items_per_second=14.1736M/s
BM_HeaderMapOptimizedInsert/100/50 8098 ns 8098 ns 85731 items_per_second=12.3495M/s
BM_HeaderMapImplLookup/10/10 355 ns 355 ns 1973543 items_per_second=28.194M/s
BM_HeaderMapImplLookup/50/20 1909 ns 1909 ns 366985 items_per_second=26.1928M/s
BM_HeaderMapImplLookup/100/50 3914 ns 3914 ns 178833 items_per_second=25.5496M/s
BM_HeaderMapOptimizedLookup/10/10 136 ns 136 ns 5100593 items_per_second=73.6141M/s
BM_HeaderMapOptimizedLookup/50/20 719 ns 719 ns 939043 items_per_second=69.5108M/s
BM_HeaderMapOptimizedLookup/100/50 1423 ns 1423 ns 494690 items_per_second=70.2956M/s
BM_HeaderMapImplIteration/10/10 24.2 ns 24.2 ns 29162575 items_per_second=413.807M/s
BM_HeaderMapImplIteration/50/20 87.6 ns 87.6 ns 7993415 items_per_second=570.628M/s
BM_HeaderMapImplIteration/100/50 160 ns 160 ns 4395914 items_per_second=623.138M/s
BM_HeaderMapOptimizedIteration/10/10 5.76 ns 5.76 ns 122007983 items_per_second=1.73746G/s
BM_HeaderMapOptimizedIteration/50/20 28.3 ns 28.3 ns 24778009 items_per_second=1.76795G/s
BM_HeaderMapOptimizedIteration/100/50 56.2 ns 56.2 ns 12446980 items_per_second=1.77809G/s
NOTE: Lower is better.
MacOS
Run on (12 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x12)
Load Average: 9.71, 5.99, 5.14
-------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------
BM_HeaderMapImplInsert/10/10 912 ns 911 ns 814939 items_per_second=10.972M/s
BM_HeaderMapImplInsert/50/20 4134 ns 4131 ns 170277 items_per_second=12.1026M/s
BM_HeaderMapImplInsert/100/50 8022 ns 8022 ns 86023 items_per_second=12.4663M/s
BM_HeaderMapOptimizedInsert/10/10 773 ns 772 ns 871058 items_per_second=12.9542M/s
BM_HeaderMapOptimizedInsert/50/20 3897 ns 3885 ns 179753 items_per_second=12.8703M/s
BM_HeaderMapOptimizedInsert/100/50 7903 ns 7875 ns 88386 items_per_second=12.6982M/s
BM_HeaderMapImplLookup/10/10 255 ns 254 ns 2740616 items_per_second=39.3018M/s
BM_HeaderMapImplLookup/50/20 1330 ns 1330 ns 508880 items_per_second=37.5988M/s
BM_HeaderMapImplLookup/100/50 2814 ns 2804 ns 258566 items_per_second=35.6634M/s
BM_HeaderMapOptimizedLookup/10/10 59.9 ns 59.8 ns 11774601 items_per_second=167.204M/s
BM_HeaderMapOptimizedLookup/50/20 335 ns 335 ns 2113450 items_per_second=149.346M/s
BM_HeaderMapOptimizedLookup/100/50 627 ns 627 ns 1115805 items_per_second=159.581M/s
BM_HeaderMapImplIteration/10/10 15.2 ns 15.2 ns 45153425 items_per_second=656.05M/s
BM_HeaderMapImplIteration/50/20 69.3 ns 69.3 ns 10096785 items_per_second=721.657M/s
BM_HeaderMapImplIteration/100/50 134 ns 131 ns 5432463 items_per_second=763.193M/s
BM_HeaderMapOptimizedIteration/10/10 3.63 ns 3.63 ns 189270013 items_per_second=2.75376G/s
BM_HeaderMapOptimizedIteration/50/20 15.8 ns 15.8 ns 44432596 items_per_second=3.17173G/s
BM_HeaderMapOptimizedIteration/100/50 31.2 ns 31.2 ns 22541597 items_per_second=3.20592G/s
BM_HeaderMapOptimizedCoreHeaders/100 231985 ns 231927 ns 2950 items_per_second=4.31169M/s
BM_HeaderMapOptimizedCoreHeaders/1000 2309916 ns 2307211 ns 299 items_per_second=4.33424M/s
BM_HeaderMapOptimizedCoreHeaders/10000 23187837 ns 23132600 ns 30 items_per_second=4.3229M/s
BM_HeaderMapImplRequestProcessing 1531 ns 1530 ns 450793
BM_HeaderMapOptimizedRequestProcessing 1205 ns 1201 ns 586039
NOTE: Lower is better.
Commit Message: http: optimize header map implementation for improved performance Risk Level: Low Testing: Unit tests and benchmarks added Docs Changes: N/A Release Notes: Added Platform Specific: No
As a reminder, PRs marked as draft will not be automatically assigned reviewers, or be handled by maintainer-oncall triage.
Please mark your PR as ready when you want it to be reviewed!
I'd like to see a performance comparison that mimics real-world access patterns. Specifically, compare lots of use of core headers like :path and :authority and the other headers that envoy must access for routing/processing. In the current implementation there are two classes of headers, and all of the core-protocol headers are in the "fast" path. So I'd like to see comparison of that.
Also, I'd like to see results in a full Envoy end-to-end test to see how much of a difference this change makes in overall performance.
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!