Performance regression in 0.18 (at least compared to 0.12)
Adding this in case other people run into it. I'm still trying to figure out the root cause, but it seems like the configured pool size affects runtime latency on the newest version.
On 0.12.0, we were running with 300 connections to each Scylla node underneath. On 0.18.1, I tried it with 2 connections per node as well as 300 connections per node in pool_size.
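For reference, here is a minimal sketch of the kind of Benchee harness that produces output like the runs below. This is not the exact benchmark code from this report: it assumes Xandra 0.18's Xandra.Cluster API, the node address is a placeholder, and the queries are lifted from the scenario names in the results.

{:ok, cluster} =
  Xandra.Cluster.start_link(
    nodes: ["scylla-node:9042"], # placeholder address
    pool_size: 300 # pool size under test
  )

Benchee.run(
  %{
    "lines specific" => fn ->
      Xandra.Cluster.execute!(cluster, "SELECT * FROM l WHERE id = '123'", [])
    end,
    "lines single" => fn ->
      Xandra.Cluster.execute!(cluster, "SELECT * FROM l LIMIT 1", [])
    end,
    "user single" => fn ->
      Xandra.Cluster.execute!(cluster, "SELECT * FROM u LIMIT 1", [])
    end
  },
  warmup: 2,
  time: 5
)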
0.12.0, 300 connections:
Operating System: Linux
CPU Information: Intel(R) Xeon(R) CPU @ 2.80GHz
Number of Available Cores: 4
Available memory: 7.77 GB
Elixir 1.16.1
Erlang 26
JIT enabled: true
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 28 s
Benchmarking lines by game sharded 100 - 0.12.0 - 300 ...
Benchmarking lines single - 0.12.0 - 300 ...
Benchmarking lines specific - 0.12.0 - 300 ...
Benchmarking user single - 0.12.0 - 300 ...
Calculating statistics...
Formatting results...
Name ips average deviation median 99th %
SELECT * FROM L WHERE id='123' - 0.12.0 - 300 2.45 K 0.41 ms ±38.85% 0.40 ms 0.70 ms
SELECT * FROM L LIMIT 1 - 0.12.0 - 300 1.66 K 0.60 ms ±36.74% 0.56 ms 1.34 ms
SELECT * FROM U LIMIT 1 - 0.12.0 - 300 0.67 K 1.49 ms ±27.70% 1.41 ms 2.43 ms
SELECT * FROM materialized_view LIMIT 1000 - 0.12.0 - 300 0.36 K 2.77 ms ±28.29% 2.68 ms 4.69 ms
Comparison:
SELECT * FROM L WHERE id='123' - 0.12.0 - 300 2.45 K
SELECT * FROM L LIMIT 1 - 0.12.0 - 300 1.66 K - 1.48x slower +0.194 ms
SELECT * FROM U LIMIT 1 - 0.12.0 - 300 0.67 K - 3.66x slower +1.08 ms
SELECT * FROM materialized_view LIMIT 1000 - 0.12.0 - 300 0.36 K - 6.78x slower +2.36 ms
0.18.1, 300 connections:
Name ips average deviation median 99th %
SELECT * FROM L WHERE id='123' - 0.18.1 - 300 1.18 K 0.85 ms ±35.47% 0.80 ms 1.63 ms
SELECT * FROM L LIMIT 1 - 0.18.1 - 300 1.01 K 0.99 ms ±32.83% 0.95 ms 1.73 ms
SELECT * FROM U LIMIT 1 - 0.18.1 - 300 0.53 K 1.90 ms ±29.20% 1.80 ms 4.61 ms
SELECT * FROM materialized_view LIMIT 1000 - 0.18.1 - 300 0.30 K 3.35 ms ±16.94% 3.27 ms 5.68 ms
Comparison:
SELECT * FROM L WHERE id='123' - 0.18.1 - 300 1.18 K
SELECT * FROM L LIMIT 1 - 0.18.1 - 300 1.01 K - 1.17x slower +0.147 ms
SELECT * FROM U LIMIT 1 - 0.18.1 - 300 0.53 K - 2.25x slower +1.05 ms
SELECT * FROM materialized_view LIMIT 1000 - 0.18.1 - 300 0.30 K - 3.97x slower +2.51 ms
0.18.1, 5 connections:
Name ips average deviation median 99th %
SELECT * FROM L WHERE id='123' - 0.18.1 - 5 1.39 K 0.72 ms ±47.16% 0.68 ms 1.34 ms
SELECT * FROM L LIMIT 1 - 0.18.1 - 5 1.17 K 0.85 ms ±43.76% 0.81 ms 1.41 ms
SELECT * FROM U LIMIT 1 - 0.18.1 - 5 0.62 K 1.60 ms ±29.08% 1.54 ms 2.42 ms
SELECT * FROM materialized_view LIMIT 1000 - 0.18.1 - 5 0.35 K 2.82 ms ±16.92% 2.79 ms 4.01 ms
Comparison:
SELECT * FROM L WHERE id='123' - 0.18.1 - 5 1.39 K
SELECT * FROM L LIMIT 1 - 0.18.1 - 5 1.17 K - 1.19x slower +0.135 ms
SELECT * FROM U LIMIT 1 - 0.18.1 - 5 0.62 K - 2.22x slower +0.88 ms
SELECT * FROM materialized_view LIMIT 1000 - 0.18.1 - 5 0.35 K - 3.92x slower +2.10 ms
I also reordered and re-ran the benchmark multiple times, in random order, to make sure it wasn't a caching issue.
Configuration:
[
  nodes: [<NODES>],
  autodiscovery: false,
  pool_size: 300 # or 2 in the case of tests.
]
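For completeness, a sketch of how a keyword list like the one above is typically handed to Xandra.Cluster on 0.18 under a supervisor. The registered name and node addresses are illustrative assumptions, not taken from this report, and <NODES> above is left as a placeholder.

children = [
  {Xandra.Cluster,
   name: MyApp.Scylla, # hypothetical registered name
   nodes: ["10.0.0.1:9042"], # placeholder for <NODES>
   pool_size: 300}
]

Supervisor.start_link(children, strategy: :one_for_one)

# Benchmark queries then go through the registered cluster:
Xandra.Cluster.execute!(MyApp.Scylla, "SELECT * FROM l LIMIT 1", [])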