rabbitmq-server
Use `persistent_term` for the feature flag registry
This is an experiment with using `persistent_term` to hold the feature flag registry. `rabbit_ff_registry` is currently implemented as a module that is dynamically regenerated with `erl_syntax`, `compile:forms/2` and code reloading. The module approach has the advantage of maximizing read speed, but it has a few drawbacks:
- We need workarounds to convince the dialyzer of the real typespecs of `rabbit_ff_registry` functions.
- We need to be careful not to deadlock around the code server.
- Regenerating the registry can be relatively slow: around 35-40ms for me locally per regeneration. With this patch I see a total savings of ~800ms during a regular single-node boot (`bazel run broker`) and a total savings of ~200ms with `rabbitmqctl enable_feature_flag khepri_db`, for example.
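For illustration, here is a minimal sketch of a `persistent_term`-backed read path. The key shape, the map layout and the module/function names are assumptions made for this example, not necessarily what this branch actually uses:

```erlang
%% Minimal sketch of a persistent_term-backed registry (assumed layout).
-module(ff_registry_pt_sketch).
-export([inject/1, is_enabled/1]).

%% Assumed key; the real branch may store the registry differently.
-define(PT_KEY, {rabbit_ff_registry, states}).

%% Store the whole registry once, e.g. at boot or when a flag is enabled.
%% Updating an existing persistent_term key is relatively expensive (all
%% processes get scanned), but it replaces a full module regeneration and
%% code reload.
inject(FeatureStates) when is_map(FeatureStates) ->
    persistent_term:put(?PT_KEY, FeatureStates).

%% Read path: one persistent_term:get/2 plus a map lookup, with no code
%% server involvement.
is_enabled(FeatureName) ->
    maps:get(FeatureName, persistent_term:get(?PT_KEY, #{}), false).
```

Compared with the generated-module approach, the trade-off is a slightly slower read (a term fetch plus a map lookup instead of a direct function clause match) in exchange for a much cheaper update path and no dependence on the code server.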
The question is whether `persistent_term` will be fast enough. Running the usual `one_fast` perf-test flag setup, I didn't see a difference between this branch and main, but we'll want to run this through the full performance test suite before considering it.
This branch also includes a fix for a concurrency issue in `rabbit_feature_flags:inject_test_feature_flags/2` which can cause the `feature_flags_SUITE:registry_concurrent_reloads/1` case to misbehave.
very creative!
I took some microbenchmarks of the two approaches, generating a module (albeit an Elixir module in the test) versus reading from a `:persistent_term`, for the `rabbit_ff_registry:is_enabled/1` use case:
(the benchmark can be run with `mv bench.exs.txt bench.exs` and `elixir bench.exs`)
Benchee results:

```
Operating System: Linux
CPU Information: AMD Ryzen Threadripper 3970X 32-Core Processor
Number of Available Cores: 64
Available memory: 125.64 GB
Elixir 1.16.2
Erlang 26.2.4
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 10 s
memory time: 2 s
reduction time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 30 s

Benchmarking dynamic module ...
Benchmarking persistent_term ...
Calculating statistics...
Formatting results...

Name                      ips        average  deviation         median         99th %
dynamic module        30.31 M       32.99 ns  ±7714.91%          30 ns          41 ns
persistent_term       17.67 M       56.58 ns  ±4861.17%          51 ns          70 ns

Comparison:
dynamic module        30.31 M
persistent_term       17.67 M - 1.71x slower +23.59 ns

Memory usage statistics:

Name             Memory usage
dynamic module            0 B
persistent_term           0 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

Reduction count statistics:

Name            Reduction count
dynamic module                2
persistent_term               2 - 1.00x reduction count +0

**All measurements for reduction count were the same**
```
Reading from `persistent_term` is ~1.7x slower than `is_enabled/1` on a generated module holding the current set of feature flags. This is better than I expected: `persistent_term` is certainly slower, but not by a very wide margin, since both operations are so cheap.
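For anyone who wants a rough Erlang-side sanity check of the same comparison, here is a sketch using `timer:tc` rather than Benchee. It is not the `bench.exs` referenced above: the module is written statically instead of generated, and the module name, key and iteration count are arbitrary choices for the example.

```erlang
%% Rough sketch: compare a clause-based module lookup against a
%% persistent_term + map lookup. Numbers are only ballpark because the
%% timing loop adds fun-call overhead on top of each read.
-module(ff_read_compare).
-export([run/0]).

-define(N, 1000000).

%% Stand-in for the generated registry: flag state baked into clauses.
is_enabled_module(khepri_db) -> true;
is_enabled_module(_Other)    -> false.

%% persistent_term variant: one global term holding a map of flag states.
is_enabled_pt(FeatureName) ->
    maps:get(FeatureName, persistent_term:get(ff_states, #{}), false).

run() ->
    persistent_term:put(ff_states, #{khepri_db => true}),
    {ModUs, ok} = timer:tc(fun() -> loop(fun is_enabled_module/1, ?N) end),
    {PtUs, ok}  = timer:tc(fun() -> loop(fun is_enabled_pt/1, ?N) end),
    io:format("module:          ~.2f ns/call~n"
              "persistent_term: ~.2f ns/call~n",
              [ModUs * 1000 / ?N, PtUs * 1000 / ?N]).

loop(_Fun, 0) -> ok;
loop(Fun, N)  -> _ = Fun(khepri_db), loop(Fun, N - 1).
```

The absolute numbers will differ from Benchee's, but the relative ordering of the two read paths should come out similarly.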
I also ran the performance tests on this branch vs. main: https://grafana.lionhead.rabbitmq.com/d/1OSIKB4Vk/rabbitmq-performance-tests?orgId=1&from=1713189778195&to=1713195192973&shareView=link
At a glance the two branches are hard to tell apart. `persistent_term` may be slower, but we might not read feature flags often enough for it to be noticeable. @mkuratczyk could I bug you to take a look and weigh in on these results when you get a chance?
Yeah, I can't think of code paths where this could slow down normal operations. Small differences between environments are unfortunately always there in these tests (there are multiple reasons for this, including intentional randomization in a few places, both in perf-test and on the broker side). I'm fairly sure the observed differences are due to such entropy, not this PR.
On an M1 machine with 10 cores and a fast SSD, this shaves some 9% off of node-reported startup time 👏 on top of the improvements in #10989. This should nicely benefit all future CI runs where clusters are formed and nodes are restarted many times in total.
Force-push was just a rebase