H3: Make sure we remove streams when needed.
Streams were not deleted when they should have been. The stream list kept growing because items were never removed from it. This produced a very large list and made lookups slow, since finding a stream means walking the whole list. This PR fixes the logic that checks whether a stream is ready to be removed; when it is, the stream is now correctly removed from the list.
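A minimal sketch of why a stream list that is never pruned hurts, assuming a plain std::list as a stand-in for whatever list QUICStreamManager actually uses. Stream, read_done/write_done, is_ready_to_remove() and reap_finished() are hypothetical names for illustration, and the real readiness check touched by this patch (is_readable) is not reproduced here. find_stream walks the list linearly, so every finished stream left in the list makes each later lookup slower.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>

// Hypothetical stand-in for QUICStream; the real class carries much more state.
struct Stream {
  uint64_t id;
  bool read_done  = false; // assumed flag: all inbound data consumed
  bool write_done = false; // assumed flag: all outbound data acknowledged

  // Assumed readiness check: the stream can go once both directions are finished.
  bool is_ready_to_remove() const { return read_done && write_done; }
};

class StreamList {
public:
  Stream &add(uint64_t id) { return streams_.emplace_back(Stream{id}); }

  // Linear walk: the cost grows with every stream still kept in the list.
  Stream *find_stream(uint64_t id) {
    auto it = std::find_if(streams_.begin(), streams_.end(),
                           [id](const Stream &s) { return s.id == id; });
    return it == streams_.end() ? nullptr : &*it;
  }

  // Drop every stream that reports it is finished so the list stays short.
  // Returns how many streams were removed.
  std::size_t reap_finished() {
    std::size_t before = streams_.size();
    streams_.remove_if([](const Stream &s) { return s.is_ready_to_remove(); });
    return before - streams_.size();
  }

  std::size_t size() const { return streams_.size(); }

private:
  std::list<Stream> streams_;
};
```

If reap_finished() (or its real equivalent) never removes anything, size() only grows, which matches the memory growth and the cost of the lookups seen in the unpatched profile.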
The main issue was the comparison in is_readable. Besides that, I made another change to avoid calling find_stream twice when creating a stream (this could probably go in a separate PR).
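The second change can be sketched as follows, again with hypothetical types: StreamManager, find_or_create_stream() and the lookups counter are illustrative names, not the actual ATS API, and find_stream's signature here is assumed. The idea is to keep the pointer returned by the first find_stream call and reuse it, instead of walking the list a second time after deciding whether a new stream is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <list>

// Hypothetical minimal stream type for illustration only.
struct Stream {
  uint64_t id;
};

class StreamManager {
public:
  // Counts linear walks over the list, purely to make the saving visible.
  int lookups = 0;

  Stream *find_stream(uint64_t id) {
    ++lookups;
    auto it = std::find_if(streams_.begin(), streams_.end(),
                           [id](const Stream &s) { return s.id == id; });
    return it == streams_.end() ? nullptr : &*it;
  }

  // Single lookup: reuse the result of find_stream rather than searching
  // again after deciding whether the stream has to be created.
  Stream *find_or_create_stream(uint64_t id) {
    if (Stream *existing = find_stream(id)) {
      return existing;
    }
    streams_.push_back(Stream{id});
    return &streams_.back();
  }

private:
  std::list<Stream> streams_;
};
```

With a long list each avoided find_stream call saves a full walk, so the saving scales with the list size that the first fix keeps under control.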
Notes
Tests were run on a small NUC with:
Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz, 32 GB RAM, 4 cores / 8 threads
Built in Release mode.
So, with the patch:
h2load -n 400000 -c 20 --alpn-list=h3 https://fedora:4443/cache/123000
The absolute h2load numbers are quite poor, but given the hardware used, that's not the main point.
finished in 72.81s, 5493.49 req/s, 645.26MB/s
requests: 400000 total, 400000 started, 400000 done, 400000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 400000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 45.88GB (49266000000) total, 59.89MB (62800000) headers (space savings 35.39%), 45.82GB (49200000000) data
UDP datagram: 1297880 sent, 2561723 received
                          min         max        mean          sd      +/- sd
time for request:      1.04ms     32.30ms      3.47ms       866us      72.19%
time for connect:     17.34ms     37.90ms     30.31ms      7.39ms      65.00%
time to 1st byte:     39.80ms     44.56ms     42.17ms      1.34ms      65.00%
req/s           :      274.68      300.91      288.35       10.60      40.00%
Samples: 1M of event 'cycles:P', Event count (approx.): 970950082837
Overhead Symbol
10.24% [.] __memmove_avx_unaligned_erms
5.14% [.] _aesni_ctr32_ghash_6x
5.09% [.] freelist_new(_InkFreeList*)
3.22% [.] <alloc::string::String as core::fmt::Write>::write_str
2.61% [.] core::fmt::write
2.55% [.] _int_malloc
1.91% [.] <std::io::buffered::bufwriter::BufWriter<W> as std::io::Write>::write_all
1.81% [.] freelist_free(_InkFreeList*, void*)
1.48% [.] serde_json::ser::format_escaped_str
1.36% [.] quiche::Connection::send_single
1.33% [k] syscall_return_via_sysret
1.31% [k] rep_movs_alternative
1.18% [.] IOBufferBlock::clear()
1.10% [.] (anonymous namespace)::build_iovec_block_chain(unsigned int, long, Ptr<IOBufferBlock>&, iovec*) [clone .constprop.0]
1.03% [.] _int_free
1.03% [k] entry_SYSCALL_64_after_hwframe
1.03% [.] realloc
1.01% [.] core::fmt::Formatter::pad_integral
0.97% [.] <core::time::Duration as core::fmt::Debug>::fmt::fmt_decimal
0.94% [.] core::fmt::num::imp::<impl core::fmt::Display for usize>::fmt
QUICStream::id() and QUICStreamManager::find_stream in the same profile:
Overhead Symbol
0.05% [.] QUICStream::id() const
0.04% [.] QUICStreamManager::find_stream(unsigned long)
Without the patch:
I could not run the same 400k requests as in the previous test because the process ran out of memory. Even with 200k requests it started swapping, but this run is enough to show that the two functions mentioned above are near the top of the profile.
finished in 52.08s, 3840.39 req/s, 451.09MB/s
requests: 200000 total, 200000 started, 200000 done, 200000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 200000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 22.94GB (24633000000) total, 29.95MB (31400000) headers (space savings 35.39%), 22.91GB (24600000000) data
UDP datagram: 795553 sent, 1271448 received
                          min         max        mean          sd      +/- sd
time for request:      1.51ms    706.84ms      5.00ms      6.24ms      99.68%
time for connect:     16.42ms     38.93ms     28.44ms      7.43ms      60.00%
time to 1st byte:     31.85ms     43.99ms     37.77ms      4.06ms      60.00%
req/s           :      192.05      215.89      200.38        9.53      80.00%
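A rough comparison of the two h2load runs, taking the numbers above at face value. The patched run served 400k requests while the unpatched run only completed 200k, so the comparison is indicative rather than like-for-like:

```latex
\frac{5493.49\ \text{req/s}}{3840.39\ \text{req/s}} \approx 1.43,
\qquad
\frac{645.26\ \text{MB/s}}{451.09\ \text{MB/s}} \approx 1.43,
\qquad
1 - \frac{3.47\ \text{ms}}{5.00\ \text{ms}} \approx 0.31
```

That is roughly 43% more throughput and a mean time per request about 31% lower with the patch, while the worst-case request time drops from 706.84 ms to 32.30 ms.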
Samples: 1M of event 'cycles:P', Event count (approx.): 791082557670
Overhead Symbol
>> 23.45% [.] QUICStream::id() const <<<<<<<< HERE
7.26% [.] __memmove_avx_unaligned_erms
3.61% [.] freelist_new(_InkFreeList*)
3.09% [.] _aesni_ctr32_ghash_6x
1.91% [.] <alloc::string::String as core::fmt::Write>::write_str
1.88% [k] __irqentry_text_end
1.64% [.] _int_malloc
1.61% [.] core::fmt::write
1.39% [.] freelist_free(_InkFreeList*, void*)
1.24% [k] error_entry
1.20% [.] <std::io::buffered::bufwriter::BufWriter<W> as std::io::Write>::write_all
0.96% [k] syscall_return_via_sysret
>> 0.95% [.] QUICStreamManager::find_stream(unsigned long) <<<<<<< HERE
0.93% [.] serde_json::ser::format_escaped_str
0.91% [.] IOBufferBlock::clear()
0.87% [.] (anonymous namespace)::build_iovec_block_chain(unsigned int, long, Ptr<IOBufferBlock>&, iovec*) [clone .constprop.0]
0.85% [.] realloc
Fixes: https://github.com/apache/trafficserver/issues/11446