How to tune Dragonfly
We are currently investigating the performance differences between Dragonfly and Redis. We found that when pipeline=1, Dragonfly's performance is significantly better than Redis's. However, when pipelining is enabled, Redis's performance improves significantly, while Dragonfly's deteriorates rapidly. Is there something I am doing wrong?
Version: 1.35.1
- Start dragonfly
dragonfly --proactor_threads=80
- Before enabling pipelining (pipeline=1)
memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=30 --requests=200000000 --test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=1
ALL STATS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 588340.28 --- --- 0.69435 0.15100 16.63900 40.44700 321110.27
Gets 1764723.46 1764723.46 0.00 0.69175 0.15100 16.76700 40.44700 954551.58
Waits 0.00 --- --- --- --- --- --- ---
Totals 2353063.74 1764723.46 0.00 0.69240 0.15100 16.76700 40.44700 1275661.85
- After enabling pipelining (pipeline=30)
memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=30 --requests=200000000 --test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30
ALL STATS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 665314.36 --- --- 18.90981 15.10300 76.79900 146.43100 363121.92
Gets 1995765.14 1995765.14 0.00 18.12827 14.46300 74.23900 141.31100 1079523.58
Waits 0.00 --- --- --- --- --- --- ---
Totals 2661079.50 1995765.14 0.00 18.32367 14.59100 74.75100 143.35900 1442645.50
Hi, are you running both memtier and Dragonfly on the same machine?
Of course not, otherwise Redis's performance wouldn't have improved. I will provide more detailed test data: one machine for the server, one machine for the client.
Version:
- Redis: 7.4.5
- Dragonfly: 1.35.1
Environment:
- Package: dragonfly.x86_64.rpm
- OS: OpenEuler 24.03
- Kernel: x86_64 Linux 6.6.0
Redis Cluster with 64 shards
dragonfly --proactor_threads=64
Redis (pipeline=1):
memtier_benchmark -h x.x.x.x -p xxxx --cluster-mode --ratio=1:3 --hide-histogram --threads=24 --clients=1 --requests=200000000 --test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=1
Dragonfly (pipeline=1):
memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=30 --requests=200000000 --test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=1
Redis (pipeline=30):
memtier_benchmark -h x.x.x.x -p xxxx --cluster-mode --ratio=1:3 --hide-histogram --threads=24 --clients=1 --requests=200000000 --test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30
Dragonfly (pipeline=30):
memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=30 --requests=200000000 --test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30
I cannot say for sure why this happens for you. I observe several things that look strange to me:
- P99 and P99.9 latencies are very high for Dragonfly even with pipeline=1.
- With pipeline=30 the QPS stays almost the same, but latency goes up significantly.
htop screenshots from both the client (memtier) machine and the server machine would provide more input.
Also, I am curious what results you get with Redis Cluster running with pipeline=30.
If I had to guess, I would suggest reducing the number of clients to 3 when benchmarking Dragonfly with pipeline=30:
memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=3 --requests=200000000 --test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30
My guess is that your server is bottlenecked on networking interrupts (htop's per-CPU utilization screen will help you see it), and this is why your P99 and P99.9 are so high.
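As a quick illustration (not from the thread itself), you could check this on the server with something like the commands below; eth0 is only a placeholder for your actual NIC name:
mpstat -P ALL 1                              # per-CPU utilization; watch the %irq and %soft columns
watch -n1 'grep -i eth0 /proc/interrupts'    # which CPUs are servicing the NIC queue interrupts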
Finally (and this is more advanced), when we benchmark Dragonfly we usually do some networking tuning: pin networking IRQs to different CPUs, disable irqbalance, and run Dragonfly with the --conn_use_incoming_cpu flag, which moves each connection to the networking CPU that handles that socket.
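A minimal sketch of that tuning, run as root; the IRQ numbers 120/121 are hypothetical, look up the real ones for your NIC in /proc/interrupts:
systemctl stop irqbalance                                  # keep the daemon from rebalancing IRQs
echo 2 > /proc/irq/120/smp_affinity_list                   # pin the first NIC queue IRQ to CPU 2
echo 3 > /proc/irq/121/smp_affinity_list                   # pin the next queue IRQ to CPU 3, and so on
dragonfly --proactor_threads=64 --conn_use_incoming_cpu    # keep each connection on the CPU that receives its packets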
Thank you for your guidance. After pinning networking IRQs to different CPUs, tail latency improved, but throughput still didn't increase significantly. Pipelining provided too little improvement for Dragonfly; conversely, Redis saw a significant boost.
Dragonfly (pipeline=1):
memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=30 --requests=200000000 --test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=1
ALL STATS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 805595.33 --- --- 0.51133 0.51100 1.14300 2.14300 439685.78
Gets 2416710.96 2325021.98 91688.98 0.50974 0.50300 1.14300 2.11100 1261103.10
Waits 0.00 --- --- --- --- --- --- ---
Totals 3222306.28 2325021.98 91688.98 0.51014 0.50300 1.14300 2.12700 1700788.88
Dragonfly (pipeline=30):
memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=3 --requests=200000000 --test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30
ALL STATS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 1057104.52 --- --- 1.19004 1.08700 3.13500 5.11900 576956.90
Gets 3171310.43 3171297.94 12.48 1.16118 1.05500 3.08700 5.05500 1704796.92
Waits 0.00 --- --- --- --- --- --- ---
Totals 4228414.95 3171297.94 12.48 1.16839 1.06300 3.10300 5.08700 2281753.82
Redis (pipeline=1):
memtier_benchmark -h x.x.x.x -p xxxx --cluster-mode --ratio=1:3 --hide-histogram --threads=24 --clients=1 --requests=200000000 --test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=1
ALL STATS
======================================================================================================================================================
Type Ops/sec Hits/sec Misses/sec MOVED/sec ASK/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
------------------------------------------------------------------------------------------------------------------------------------------------------
Sets 718706.97 --- --- 0.00 0.00 0.40164 0.39100 0.88700 1.16700 392263.03
Gets 2156140.32 2156140.32 0.00 0.00 0.00 0.40052 0.39100 0.88700 1.16700 1166271.71
Waits 0.00 --- --- --- --- --- --- --- --- ---
Totals 2874847.29 2156140.32 0.00 0.00 0.00 0.40080 0.39100 0.88700 1.16700 1558534.74
Redis (pipeline=30):
memtier_benchmark -h x.x.x.x -p xxxx --cluster-mode --ratio=1:3 --hide-histogram --threads=24 --clients=1 --requests=200000000 --test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30
ALL STATS
======================================================================================================================================================
Type Ops/sec Hits/sec Misses/sec MOVED/sec ASK/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
------------------------------------------------------------------------------------------------------------------------------------------------------
Sets 3390153.43 --- --- 0.00 0.00 2.61658 2.52700 4.79900 6.11100 1850311.60
Gets 10170460.28 10161982.50 8477.78 0.00 0.00 2.73053 2.54300 5.56700 6.23900 5497010.71
Waits 0.00 --- --- --- --- --- --- --- --- ---
Totals 13560613.70 10161982.50 8477.78 0.00 0.00 2.70204 2.54300 5.50300 6.20700 7347322.30
I also found that many files are generated in the Dragonfly directory:
dump-2025-12-xxT15:xx:xx-0063.dfs
dump-2025-12-xxT15:xx:xx-summary.dfs
I'm not sure what they are, maybe snapshots, or whether it would be better to turn them off for the benchmark?
It's because your ports are open to the internet and hacking bots constantly scan 6379 and run FLUSHDB and SAVE. I suggest running all the benchmarks on port 6380, or closing 6379 to the internet. You can remove these files.
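For example (a sketch, the port choice is arbitrary), start Dragonfly on a non-default port and point memtier at it:
dragonfly --proactor_threads=64 --port=6380
memtier_benchmark -h x.x.x.x -p 6380 --ratio=1:3 --hide-histogram --threads=55 --clients=30 --test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30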
Haha, our machines don't connect to the internet by default. I tried 6380 and several other ports; the result is the same.
Ah, the files are created on Dragonfly shutdown by default. You can pass --dbfilename= to omit the backups.
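For example, a start command using the flag mentioned above (just a sketch) would skip writing the dump/summary files on shutdown:
dragonfly --proactor_threads=64 --dbfilename=    # empty value disables the shutdown snapshot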
Yes, the results look more reasonable now. I see a 3x difference in QPS in pipeline mode, and this is probably correct: we have not optimized pipelining, while Redis/Valkey have recently made significant optimisations in that area. We will work on that next quarter.
Having said that, it's not apples to apples: from a single Dragonfly process you can access the entire keyspace of your cluster (i.e. MSET/MGET and transactions will work across the entire keyspace), while with Redis Cluster you have hard separation by slots. If for your use case you can run and manage dozens of Redis processes on a single server and it works well for you (no hot keys, no management complexity), then you probably do not need Dragonfly, as even with the optimisations implemented we won't go higher than 13M QPS with pipelining.
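A small illustration of that difference (hypothetical hosts and keys): a multi-key command whose keys hash to different slots is rejected by Redis Cluster but works against a single Dragonfly process:
redis-cli -c -h redis-cluster-node -p 7000 MGET user:1 user:2    # (error) CROSSSLOT Keys in request don't hash to the same slot
redis-cli -h dragonfly-host MGET user:1 user:2                   # both values come back from the single shared keyspace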
Thank you very much. I will continue to research the advantages of Dragonfly and choose the appropriate key-value cache for our needs.