Performance regression in 0.18 (at least compared to 0.12)
Adding this in case other people run into it. I'm still trying to figure out the root cause, but it seems like the configured pool size affects runtime latency on the newest version.
On 0.12.0, we were running with 300 connections to each Scylla node underneath. On 0.18.1, I tried it with 2 connections per node as well as 300 connections per node in pool_size.
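For reference, here is a minimal sketch of the kind of Benchee harness that produces output like the runs below. This is not the exact benchmark code from this report: it assumes Xandra 0.18's Xandra.Cluster API, the node address is a placeholder, and the queries are lifted from the scenario names in the results.

{:ok, cluster} =
  Xandra.Cluster.start_link(
    nodes: ["scylla-node:9042"], # placeholder address
    pool_size: 300 # pool size under test
  )

Benchee.run(
  %{
    "lines specific" => fn ->
      Xandra.Cluster.execute!(cluster, "SELECT * FROM l WHERE id = '123'", [])
    end,
    "lines single" => fn ->
      Xandra.Cluster.execute!(cluster, "SELECT * FROM l LIMIT 1", [])
    end,
    "user single" => fn ->
      Xandra.Cluster.execute!(cluster, "SELECT * FROM u LIMIT 1", [])
    end
  },
  warmup: 2,
  time: 5
)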
0.12.0, 300 connections:
Operating System: Linux
CPU Information: Intel(R) Xeon(R) CPU @ 2.80GHz
Number of Available Cores: 4
Available memory: 7.77 GB
Elixir 1.16.1
Erlang 26
JIT enabled: true
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 28 s
Benchmarking lines by game sharded 100 - 0.12.0 - 300 ...
Benchmarking lines single - 0.12.0 - 300 ...
Benchmarking lines specific - 0.12.0 - 300 ...
Benchmarking user single - 0.12.0 - 300 ...
Calculating statistics...
Formatting results...
Name ips average deviation median 99th %
SELECT * FROM L WHERE id='123' - 0.12.0 - 300 2.45 K 0.41 ms ±38.85% 0.40 ms 0.70 ms
SELECT * FROM L LIMIT 1 - 0.12.0 - 300 1.66 K 0.60 ms ±36.74% 0.56 ms 1.34 ms
SELECT * FROM U LIMIT 1 - 0.12.0 - 300 0.67 K 1.49 ms ±27.70% 1.41 ms 2.43 ms
SELECT * FROM materialized_view LIMIT 1000 - 0.12.0 - 300 0.36 K 2.77 ms ±28.29% 2.68 ms 4.69 ms
Comparison:
SELECT * FROM L WHERE id='123' - 0.12.0 - 300 2.45 K
SELECT * FROM L LIMIT 1 - 0.12.0 - 300 1.66 K - 1.48x slower +0.194 ms
SELECT * FROM U LIMIT 1 - 0.12.0 - 300 0.67 K - 3.66x slower +1.08 ms
SELECT * FROM materialized_view LIMIT 1000 - 0.12.0 - 300 0.36 K - 6.78x slower +2.36 ms
0.18.1, 300 connections:
Name ips average deviation median 99th %
SELECT * FROM L WHERE id='123' - 0.18.1 - 300 1.18 K 0.85 ms ±35.47% 0.80 ms 1.63 ms
SELECT * FROM L LIMIT 1 - 0.18.1 - 300 1.01 K 0.99 ms ±32.83% 0.95 ms 1.73 ms
SELECT * FROM U LIMIT 1 - 0.18.1 - 300 0.53 K 1.90 ms ±29.20% 1.80 ms 4.61 ms
SELECT * FROM materialized_view LIMIT 1000 - 0.18.1 - 300 0.30 K 3.35 ms ±16.94% 3.27 ms 5.68 ms
Comparison:
SELECT * FROM L WHERE id='123' - 0.18.1 - 300 1.18 K
SELECT * FROM L LIMIT 1 - 0.18.1 - 300 1.01 K - 1.17x slower +0.147 ms
SELECT * FROM U LIMIT 1 - 0.18.1 - 300 0.53 K - 2.25x slower +1.05 ms
SELECT * FROM materialized_view LIMIT 1000 - 0.18.1 - 300 0.30 K - 3.97x slower +2.51 ms
0.18.1, 5 connections:
Name ips average deviation median 99th %
SELECT * FROM L WHERE id='123' - 0.18.1 - 5 1.39 K 0.72 ms ±47.16% 0.68 ms 1.34 ms
SELECT * FROM L LIMIT 1 - 0.18.1 - 5 1.17 K 0.85 ms ±43.76% 0.81 ms 1.41 ms
SELECT * FROM U LIMIT 1 - 0.18.1 - 5 0.62 K 1.60 ms ±29.08% 1.54 ms 2.42 ms
SELECT * FROM materialized_view LIMIT 1000 - 0.18.1 - 5 0.35 K 2.82 ms ±16.92% 2.79 ms 4.01 ms
Comparison:
SELECT * FROM L WHERE id='123' - 0.18.1 - 5 1.39 K
SELECT * FROM L LIMIT 1 - 0.18.1 - 5 1.17 K - 1.19x slower +0.135 ms
SELECT * FROM U LIMIT 1 - 0.18.1 - 5 0.62 K - 2.22x slower +0.88 ms
SELECT * FROM materialized_view LIMIT 1000 - 0.18.1 - 5 0.35 K - 3.92x slower +2.10 ms
I also reordered and re-ran the benchmark multiple times, in random order, to make sure it wasn't a caching issue.
Configuration:
[
  nodes: [<NODES>],
  autodiscovery: false,
  pool_size: 300 # or 2 in the case of tests.
]
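For completeness, a sketch of how a keyword list like the one above is typically handed to Xandra.Cluster on 0.18 under a supervisor. The registered name and node addresses are illustrative assumptions, not taken from this report, and <NODES> above is left as a placeholder.

children = [
  {Xandra.Cluster,
   name: MyApp.Scylla, # hypothetical registered name
   nodes: ["10.0.0.1:9042"], # placeholder for <NODES>
   pool_size: 300}
]

Supervisor.start_link(children, strategy: :one_for_one)

# Benchmark queries then go through the registered cluster:
Xandra.Cluster.execute!(MyApp.Scylla, "SELECT * FROM l LIMIT 1", [])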