envoy http: optimize header map implementation for improved performance

Description:

This PR introduces an optimized implementation of the HeaderMap interface that provides better performance for common header operations. It uses a combination of vector and hash map for header storage and lookup. This implementation shows significant performance improvements:

1.7x faster header insertions
3.3x faster header lookups
Up to 3.6x faster header iteration

This optimization will benefit all Envoy deployments that handle significant HTTP traffic, particularly those with large numbers of headers or high request rates. The improvements in lookup and iteration performance will reduce CPU usage in header processing code paths.

Benchmark

Benchmarks are performed on both MacOS and Linux using the checked in benchmarking suite.

bazel run -c opt //test/benchmark:header_map_benchmark_test

Benchmark Results

Linux

Run on (32 X 3499.53 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 1280 KiB (x16)
  L3 Unified 55296 KiB (x1)
Load Average: 10.41, 17.06, 16.77
------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------
BM_HeaderMapImplInsert/10/10                 793 ns          793 ns       881801 items_per_second=12.6078M/s
BM_HeaderMapImplInsert/50/20                4327 ns         4327 ns       162107 items_per_second=11.556M/s
BM_HeaderMapImplInsert/100/50               9559 ns         9559 ns        73359 items_per_second=10.4614M/s
BM_HeaderMapOptimizedInsert/10/10            678 ns          678 ns      1037087 items_per_second=14.7545M/s
BM_HeaderMapOptimizedInsert/50/20           3528 ns         3528 ns       194380 items_per_second=14.1736M/s
BM_HeaderMapOptimizedInsert/100/50          8098 ns         8098 ns        85731 items_per_second=12.3495M/s
BM_HeaderMapImplLookup/10/10                 355 ns          355 ns      1973543 items_per_second=28.194M/s
BM_HeaderMapImplLookup/50/20                1909 ns         1909 ns       366985 items_per_second=26.1928M/s
BM_HeaderMapImplLookup/100/50               3914 ns         3914 ns       178833 items_per_second=25.5496M/s
BM_HeaderMapOptimizedLookup/10/10            136 ns          136 ns      5100593 items_per_second=73.6141M/s
BM_HeaderMapOptimizedLookup/50/20            719 ns          719 ns       939043 items_per_second=69.5108M/s
BM_HeaderMapOptimizedLookup/100/50          1423 ns         1423 ns       494690 items_per_second=70.2956M/s
BM_HeaderMapImplIteration/10/10             24.2 ns         24.2 ns     29162575 items_per_second=413.807M/s
BM_HeaderMapImplIteration/50/20             87.6 ns         87.6 ns      7993415 items_per_second=570.628M/s
BM_HeaderMapImplIteration/100/50             160 ns          160 ns      4395914 items_per_second=623.138M/s
BM_HeaderMapOptimizedIteration/10/10        5.76 ns         5.76 ns    122007983 items_per_second=1.73746G/s
BM_HeaderMapOptimizedIteration/50/20        28.3 ns         28.3 ns     24778009 items_per_second=1.76795G/s
BM_HeaderMapOptimizedIteration/100/50       56.2 ns         56.2 ns     12446980 items_per_second=1.77809G/s

NOTE: Lower is better.

Benchmark Graph Request

MacOS

Run on (12 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x12)
Load Average: 9.71, 5.99, 5.14
-------------------------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------
BM_HeaderMapImplInsert/10/10                  912 ns          911 ns       814939 items_per_second=10.972M/s
BM_HeaderMapImplInsert/50/20                 4134 ns         4131 ns       170277 items_per_second=12.1026M/s
BM_HeaderMapImplInsert/100/50                8022 ns         8022 ns        86023 items_per_second=12.4663M/s
BM_HeaderMapOptimizedInsert/10/10             773 ns          772 ns       871058 items_per_second=12.9542M/s
BM_HeaderMapOptimizedInsert/50/20            3897 ns         3885 ns       179753 items_per_second=12.8703M/s
BM_HeaderMapOptimizedInsert/100/50           7903 ns         7875 ns        88386 items_per_second=12.6982M/s
BM_HeaderMapImplLookup/10/10                  255 ns          254 ns      2740616 items_per_second=39.3018M/s
BM_HeaderMapImplLookup/50/20                 1330 ns         1330 ns       508880 items_per_second=37.5988M/s
BM_HeaderMapImplLookup/100/50                2814 ns         2804 ns       258566 items_per_second=35.6634M/s
BM_HeaderMapOptimizedLookup/10/10            59.9 ns         59.8 ns     11774601 items_per_second=167.204M/s
BM_HeaderMapOptimizedLookup/50/20             335 ns          335 ns      2113450 items_per_second=149.346M/s
BM_HeaderMapOptimizedLookup/100/50            627 ns          627 ns      1115805 items_per_second=159.581M/s
BM_HeaderMapImplIteration/10/10              15.2 ns         15.2 ns     45153425 items_per_second=656.05M/s
BM_HeaderMapImplIteration/50/20              69.3 ns         69.3 ns     10096785 items_per_second=721.657M/s
BM_HeaderMapImplIteration/100/50              134 ns          131 ns      5432463 items_per_second=763.193M/s
BM_HeaderMapOptimizedIteration/10/10         3.63 ns         3.63 ns    189270013 items_per_second=2.75376G/s
BM_HeaderMapOptimizedIteration/50/20         15.8 ns         15.8 ns     44432596 items_per_second=3.17173G/s
BM_HeaderMapOptimizedIteration/100/50        31.2 ns         31.2 ns     22541597 items_per_second=3.20592G/s
BM_HeaderMapOptimizedCoreHeaders/100       231985 ns       231927 ns         2950 items_per_second=4.31169M/s
BM_HeaderMapOptimizedCoreHeaders/1000     2309916 ns      2307211 ns          299 items_per_second=4.33424M/s
BM_HeaderMapOptimizedCoreHeaders/10000   23187837 ns     23132600 ns           30 items_per_second=4.3229M/s
BM_HeaderMapImplRequestProcessing            1531 ns         1530 ns       450793
BM_HeaderMapOptimizedRequestProcessing       1205 ns         1201 ns       586039

NOTE: Lower is better.

Benchmark Graph Request (1)

Commit Message: http: optimize header map implementation for improved performance Risk Level: Low Testing: Unit tests and benchmarks added Docs Changes: N/A Release Notes: Added Platform Specific: No

Apr 28 '25 16:04 agrawroh

As a reminder, PRs marked as draft will not be automatically assigned reviewers, or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/39252 was opened by agrawroh.

see: more, trace.

Apr 28 '25 16:04 repokitteh-read-only[bot]

I'd like to see a performance comparison that mimics real-world access patterns. Specifically, compare lots of use of core headers like :path and :authority and the other headers that envoy must access for routing/processing. In the current implementation there are two classes of headers, and all of the core-protocol headers are in the "fast" path. So I'd like to see comparison of that.

Also, I'd like to see results in a full Envoy end-to-end test to see how much of a difference this change makes in overall performance.

Apr 28 '25 23:04 ggreenway

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

Jun 14 '25 12:06 github-actions[bot]

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

Jun 21 '25 12:06 github-actions[bot]

envoy envoy copied to clipboard

http: optimize header map implementation for improved performance

Description:

Benchmark

Benchmark Results

Linux

MacOS

envoy
envoy copied to clipboard