H3: Make sure we remove streams when needed.

Open brbzull0 opened this issue 1 year ago • 0 comments

Streams were not deleted when they should have been. The stream list kept growing because items were never removed from it, and finding a stream became slow because every lookup had to walk a very large list. This fixes the logic that checks whether a stream is ready to be removed; if it is, the stream is now correctly removed from the list.
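To illustrate the failure mode described above, here is a minimal sketch and not the actual ATS code. `Stream`, `StreamManager`, `finished`, and `reap()` are hypothetical names made up for this example; the real classes are `QUICStream` and `QUICStreamManager`, whose internals are not shown in this description.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>

// Hypothetical stand-in for a QUIC stream; not the ATS QUICStream class.
struct Stream {
  uint64_t id;
  bool finished = false; // assumption: set once the stream is ready to be removed
};

// Hypothetical manager that owns the stream list.
class StreamManager {
public:
  void add(uint64_t id) { streams_.push_back(Stream{id}); }

  // Linear walk over the list: the cost grows with the number of entries,
  // so a list that is never pruned makes every lookup slower.
  Stream *find(uint64_t id) {
    for (auto &s : streams_) {
      if (s.id == id) {
        return &s;
      }
    }
    return nullptr;
  }

  // Remove every stream that is ready to go, keeping the list short.
  void reap() {
    streams_.remove_if([](const Stream &s) { return s.finished; });
  }

  std::size_t size() const { return streams_.size(); }

private:
  std::list<Stream> streams_;
};
```

If `reap()` is never called, or its readiness check never returns true, every finished stream stays in the list, and both memory use and lookup time grow with the total number of requests served.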

The main issue was the comparison in is_readable. Besides this, I made another change so that find_stream is not called twice when creating a stream (this could probably go in a separate PR).
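The second change, avoiding a double lookup on stream creation, can be sketched roughly as follows. This is an assumption about the general shape of the pattern only: `StreamEntry`, `StreamTable`, `find_or_create()`, and the `lookups` counter are hypothetical names for illustration and do not reflect the actual `QUICStreamManager` code.

```cpp
#include <cstdint>
#include <list>

// Hypothetical stream record; not the ATS QUICStream class.
struct StreamEntry {
  uint64_t id;
};

// Hypothetical container illustrating a single-lookup find-or-create.
class StreamTable {
public:
  // Linear search; counts calls so the number of list walks is visible.
  StreamEntry *find_stream(uint64_t id) {
    ++lookups;
    for (auto &s : streams_) {
      if (s.id == id) {
        return &s;
      }
    }
    return nullptr;
  }

  // Walk the list once and reuse the result, rather than searching once to
  // check for existence and a second time to fetch the stream.
  StreamEntry &find_or_create(uint64_t id) {
    if (StreamEntry *existing = find_stream(id)) {
      return *existing;
    }
    streams_.push_back(StreamEntry{id});
    return streams_.back(); // std::list keeps references valid on insert
  }

  int lookups = 0;

private:
  std::list<StreamEntry> streams_;
};
```

With a short list the saving is small, but when the list is long each avoided walk matters, which is why this change fits alongside the removal fix.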

Notes

Tests were run on a small NUC with:

Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz 32G mem 4 Cores 8T

Built in Release mode.

So, with the patch:

h2load -n 400000 -c 20 --alpn-list=h3 https://fedora:4443/cache/123000

The numbers from h2load are poor because of the hardware used, but that's not the main point.

finished in 72.81s, 5493.49 req/s, 645.26MB/s
requests: 400000 total, 400000 started, 400000 done, 400000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 400000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 45.88GB (49266000000) total, 59.89MB (62800000) headers (space savings 35.39%), 45.82GB (49200000000) data
UDP datagram: 1297880 sent, 2561723 received
                     min         max         mean         sd        +/- sd
time for request:     1.04ms     32.30ms      3.47ms       866us    72.19%
time for connect:    17.34ms     37.90ms     30.31ms      7.39ms    65.00%
time to 1st byte:    39.80ms     44.56ms     42.17ms      1.34ms    65.00%
req/s           :     274.68      300.91      288.35       10.60    40.00%

Samples: 1M of event 'cycles:P', Event count (approx.): 970950082837
Overhead  Symbol
  10.24%  [.] __memmove_avx_unaligned_erms
   5.14%  [.] _aesni_ctr32_ghash_6x
   5.09%  [.] freelist_new(_InkFreeList*)
   3.22%  [.] <alloc::string::String as core::fmt::Write>::write_str
   2.61%  [.] core::fmt::write
   2.55%  [.] _int_malloc
   1.91%  [.] <std::io::buffered::bufwriter::BufWriter<W> as std::io::Write>::write_all
   1.81%  [.] freelist_free(_InkFreeList*, void*)
   1.48%  [.] serde_json::ser::format_escaped_str
   1.36%  [.] quiche::Connection::send_single
   1.33%  [k] syscall_return_via_sysret
   1.31%  [k] rep_movs_alternative
   1.18%  [.] IOBufferBlock::clear()
   1.10%  [.] (anonymous namespace)::build_iovec_block_chain(unsigned int, long, Ptr<IOBufferBlock>&, iovec*) [clone .constprop.0]
   1.03%  [.] _int_free
   1.03%  [k] entry_SYSCALL_64_after_hwframe
   1.03%  [.] realloc
   1.01%  [.] core::fmt::Formatter::pad_integral
   0.97%  [.] <core::time::Duration as core::fmt::Debug>::fmt::fmt_decimal
   0.94%  [.] core::fmt::num::imp::<impl core::fmt::Display for usize>::fmt

QUICStream::id() and QUICStreamManager::find_stream:

Samples: 1M of event 'cycles:P', Event count (approx.): 970950082837
Overhead  Symbol
   0.05%  [.] QUICStream::id() const
   0.04%  [.] QUICStreamManager::find_stream(unsigned long)

Without the patch:

I could not use the same 400k requests as in the previous test because the process ran out of memory. Even with 200k it started swapping, but at least we can see that the two functions mentioned above show up in the top list.

finished in 52.08s, 3840.39 req/s, 451.09MB/s
requests: 200000 total, 200000 started, 200000 done, 200000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 200000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 22.94GB (24633000000) total, 29.95MB (31400000) headers (space savings 35.39%), 22.91GB (24600000000) data
UDP datagram: 795553 sent, 1271448 received
                     min         max         mean         sd        +/- sd
time for request:     1.51ms    706.84ms      5.00ms      6.24ms    99.68%
time for connect:    16.42ms     38.93ms     28.44ms      7.43ms    60.00%
time to 1st byte:    31.85ms     43.99ms     37.77ms      4.06ms    60.00%
req/s           :     192.05      215.89      200.38        9.53    80.00%

Samples: 1M of event 'cycles:P', Event count (approx.): 791082557670
Overhead  Symbol
>> 23.45%  [.] QUICStream::id() const <<<<<<<< HERE
   7.26%  [.] __memmove_avx_unaligned_erms
   3.61%  [.] freelist_new(_InkFreeList*)
   3.09%  [.] _aesni_ctr32_ghash_6x
   1.91%  [.] <alloc::string::String as core::fmt::Write>::write_str
   1.88%  [k] __irqentry_text_end
   1.64%  [.] _int_malloc
   1.61%  [.] core::fmt::write
   1.39%  [.] freelist_free(_InkFreeList*, void*)
   1.24%  [k] error_entry
   1.20%  [.] <std::io::buffered::bufwriter::BufWriter<W> as std::io::Write>::write_all
   0.96%  [k] syscall_return_via_sysret
>> 0.95%  [.] QUICStreamManager::find_stream(unsigned long) <<<<<<< HERE
   0.93%  [.] serde_json::ser::format_escaped_str
   0.91%  [.] IOBufferBlock::clear()
   0.87%  [.] (anonymous namespace)::build_iovec_block_chain(unsigned int, long, Ptr<IOBufferBlock>&, iovec*) [clone .constprop.0]
   0.85%  [.] realloc

Fixes: https://github.com/apache/trafficserver/issues/11446

brbzull0 • Jun 28 '24 11:06