redpanda
redpanda copied to clipboard
bytes: replacement bytes implementation for libc++18
libc++ >= v18 have deprecated (to be removed in v19) char_traits<T> for T other than char (and some other types, like wchar). our bytes implementation is uses T=uint8_t and because seastar::sstring interoperates with std::string/std::string_view, we encounter the deprecation.
this PR introduces a new bytes implementation that wraps a seastar::sstring
Backports Required
- [x] none - not a bug fix
- [ ] none - this is a backport
- [ ] none - issue does not exist in previous branches
- [ ] none - papercut/not impactful enough to backport
- [ ] v24.2.x
- [ ] v24.1.x
- [ ] v23.3.x
Release Notes
- none
~I will probably switch this over to use abseil inlinedvector after examining it i think we can achieve a similar uninitialized allocation optimization.~
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53648#01919687-a202-4ac3-8e9b-332abfcecaef
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54467#0191ecb4-4b10-4abb-95f9-108199563778
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54467#0191edb6-6c7f-4622-ae75-732df0996cf2
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54499#0191f739-8cf9-417c-8997-287c31d17996
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54905#01921b57-4157-4d81-9700-0e4b9b9330bb
@travisdowns @StephanDollberg @rockwotj in the interest of avoiding the UB concern entirely (but like travis mentioned, perhaps its UB that we are ok with), i added a new commit that implements bytes in terms of absl::inlined_vector, and created a benchmark for a few cases:
- initialized_later
- zero initialization
- append
bytes.*: abseil version
sstring.*: sstring version
as expected, initialized_later is faster with sstring because abseil interface doesn't offer the option to skip initialization (in this benchmark it is zero initialization).
roughly, up to 128K sizes, we'd pay ~microsecond of overhead. presumably we could also go hunting around the code base for usages of bytes() that are in a hot path and change how the bytes type is used if they show up in a profile?
14: test iterations median mad min max allocs tasks inst
14: bytes.initialized_later_0 459500000 1.373ns 0.000ns 1.373ns 1.375ns 0.000 0.000 17.3
14: sstring.initialized_later_0 503104000 1.189ns 0.000ns 1.187ns 1.190ns 0.000 0.000 10.3
14: bytes.initialized_later_10 290141000 2.604ns 0.001ns 2.603ns 2.604ns 0.000 0.000 36.3
14: sstring.initialized_later_10 489875000 1.206ns 0.002ns 1.205ns 1.208ns 0.000 0.000 10.3
14: bytes.initialized_later_100 106797000 8.209ns 0.000ns 8.208ns 8.210ns 1.000 0.000 143.3
14: sstring.initialized_later_100 118622000 7.262ns 0.002ns 7.257ns 7.266ns 1.000 0.000 123.3
14: bytes.initialized_later_1000 74471000 9.398ns 0.013ns 9.385ns 9.428ns 1.000 0.000 172.3
14: sstring.initialized_later_1000 89729000 7.098ns 0.001ns 7.097ns 7.100ns 1.000 0.000 123.3
14: bytes.initialized_later_10000 10988000 60.754ns 0.007ns 60.747ns 60.777ns 1.000 0.000 424.3
14: sstring.initialized_later_10000 26059000 7.074ns 0.002ns 7.072ns 7.077ns 1.000 0.000 123.3
14: bytes.initialized_later_100000 1095000 615.564ns 0.075ns 615.475ns 618.526ns 1.000 0.000 3389.3
14: sstring.initialized_later_100000 3005000 28.151ns 0.082ns 28.070ns 28.300ns 1.000 0.000 630.3
14: bytes.initialized_zero_0 463474000 1.361ns 0.000ns 1.361ns 1.362ns 0.000 0.000 17.3
14: sstring.initialized_zero_0 170729000 5.054ns 0.000ns 5.053ns 5.056ns 0.000 0.000 29.3
14: bytes.initialized_zero_10 306899000 2.421ns 0.000ns 2.420ns 2.421ns 0.000 0.000 36.3
14: sstring.initialized_zero_10 169572000 5.056ns 0.001ns 5.055ns 5.059ns 0.000 0.000 29.3
14: bytes.initialized_zero_100 102871000 8.575ns 0.000ns 8.572ns 8.576ns 1.000 0.000 143.3
14: sstring.initialized_zero_100 104785000 8.371ns 0.004ns 8.364ns 8.375ns 1.000 0.000 138.3
14: bytes.initialized_zero_1000 74622000 9.385ns 0.001ns 9.383ns 9.387ns 1.000 0.000 172.3
14: sstring.initialized_zero_1000 74918000 9.383ns 0.001ns 9.374ns 9.388ns 1.000 0.000 167.3
14: bytes.initialized_zero_10000 10982000 60.763ns 0.003ns 60.756ns 60.772ns 1.000 0.000 424.3
14: sstring.initialized_zero_10000 10971000 60.759ns 0.002ns 60.748ns 60.766ns 1.000 0.000 419.3
14: bytes.initialized_zero_100000 1095000 615.436ns 0.028ns 615.382ns 615.481ns 1.000 0.000 3389.3
14: sstring.initialized_zero_100000 1095000 616.402ns 0.088ns 616.314ns 616.564ns 1.000 0.000 3383.3
14: bytes.append_0 276549000 2.812ns 0.000ns 2.812ns 2.814ns 0.000 0.000 57.3
14: sstring.append_0 83581000 11.155ns 0.000ns 11.155ns 11.166ns 0.000 0.000 87.3
14: bytes.append_10 94539000 9.884ns 0.001ns 9.879ns 9.885ns 0.000 0.000 127.3
14: sstring.append_10 71173000 13.316ns 0.001ns 13.310ns 13.316ns 0.000 0.000 121.3
14: bytes.append_100 48183000 19.572ns 0.007ns 19.564ns 19.585ns 2.000 0.000 410.3
14: sstring.append_100 49985000 18.837ns 0.002ns 18.830ns 18.842ns 2.000 0.000 336.3
14: bytes.append_1000 18448000 51.085ns 0.064ns 50.851ns 51.148ns 2.000 0.000 724.3
14: sstring.append_1000 21044000 43.598ns 0.051ns 43.546ns 43.729ns 2.000 0.000 492.3
14: bytes.append_10000 2714000 337.452ns 0.048ns 337.404ns 337.749ns 2.000 0.000 3898.3
14: sstring.append_10000 3532000 253.486ns 0.004ns 253.457ns 253.490ns 2.000 0.000 1868.3
14: bytes.append_100000 261000 3.533us 0.165ns 3.533us 3.533us 2.000 0.000 33147.3
14: sstring.append_100000 316000 2.874us 0.108ns 2.873us 2.874us 2.000 0.000 13213.3
14: Test Exit code 0
1/1 Test #14: bytes_bench_rpbench .............. Passed 216.32 sec
As a side note, it looks like std::string now (as of C++23) has the ability to be created but uninitialized with resize_and_overwrite. It's been around in libc++ for a while.
As a side note, it looks like
std::stringnow (as of C++23) has the ability to be created but uninitialized with resize_and_overwrite. It's been around in libc++ for a while.
yeh. that interface was rejected for std::vector and i think also std::inplace_vector for some reason.
Yeah bypassing default constructors and such can be tricky. It's easier to explain with raw bytes.
/microbench
Instruction and alloc count diffs from other microbenches:
Performance changes detected in 9 tests
storage_rpbench_reducer_bench.compaction_key_reducer_test: inst -> +2.39%
heartbeat_bench_rpbench_fixture.test_old_hb_reply: inst -> +11.18%
heartbeat_bench_rpbench_fixture.test_old_hb_reply: allocs -> +0.00%
heartbeat_bench_rpbench_fixture.test_old_hb_request: inst -> +11.10%
crypto_bench_rpbench_openssl_perf_test.md5_1k: inst -> +0.14%
crypto_bench_rpbench_openssl_perf_test.sha256_1k: inst -> +0.04%
crypto_bench_rpbench_openssl_perf_test.sha512_1k: inst -> +0.06%
crypto_bench_fips_rpbench_openssl_perf_test.md5_1k: inst -> +0.14%
crypto_bench_fips_rpbench_openssl_perf_test.sha256_1k: inst -> +0.04%
crypto_bench_fips_rpbench_openssl_perf_test.sha512_1k: inst -> +0.06%
Instruction and alloc count diffs from other microbenches:
that seems unsurprising with abseil's inlined vector. what's the threshold for concern?
There is no general policy, will always have to go on a case by case basis.
This looks fine to me as well given the only major change is in the old heartbeats which shouldn't have any major usage anymore.
@rockwotj wrote:
As a side note, it looks like
std::stringnow (as of C++23) has the ability to be created but uninitialized with resize_and_overwrite. It's been around in libc++ for a while.
Well it doesn't really allow you to do the "created but uninitialized" thing, I don't think:
If any of the following conditions is satisfied, the behavior is undefined:
...
- Any character in range [p, p + r) has an indeterminate value.
So it's saying it's UB to (for example), pass in an op which simply returns count but does not write the corresponding chars, which is how you'd emulate allocate-but-not-init.
Still, even if you adhere to this rule this interface can replace some of the reasons you want uninit storage in the first place.
Of course, breaking this rule seems quite unlikely to be punished in practice.
Overall the code LGTM, just one thing that the new struct is 8 bytes bigger and IDK if that's OK.
yeh, also noticed this. i don't think it matters, and i'm not sure where the existing bytes_inline_size comes from. it was probably an educated guess.
tbh i'd like to teach iobuf SSO and get rid of bytes type entirely.
thanks for the review @rockwotj. i have a merge conflict, and a few things to cleanup. should be able to get this posted again next week.
I agree that the 8 bytes increase is probably OK.
⚠️ The sha of the head commit of this PR conflicts with #23422. Mergify cannot evaluate rules on this PR. ⚠️