How to tune dragonfly

Open WJSGDBZ opened this issue 1 month ago • 10 comments

We are currently investigating the performance differences between dragonfly and redis. We found that when pipeline=1, dragonfly's performance is significantly better than redis. However, when pipeline is enabled, redis's performance improves significantly, but dragonfly's performance deteriorates rapidly. Is there something I am doing wrong?

version : 1.35.1

  • Start dragonfly
dragonfly --proactor_threads=80
  • Before starting the pipeline
memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=30 --requests=200000000 –-test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=1
ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       588340.28          ---          ---         0.69435         0.15100        16.63900        40.44700    321110.27
Gets      1764723.46   1764723.46         0.00         0.69175         0.15100        16.76700        40.44700    954551.58
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    2353063.74   1764723.46         0.00         0.69240         0.15100        16.76700        40.44700   1275661.85
  • After starting the pipeline
memtier_benchmark -h x.x.x.x  --ratio=1:3 --hide-histogram --threads=55 --clients=30 --requests=200000000 –-test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30
ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       665314.36          ---          ---        18.90981        15.10300        76.79900       146.43100    363121.92
Gets      1995765.14   1995765.14         0.00        18.12827        14.46300        74.23900       141.31100   1079523.58
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    2661079.50   1995765.14         0.00        18.32367        14.59100        74.75100       143.35900   1442645.50

WJSGDBZ avatar Dec 10 '25 09:12 WJSGDBZ

hi, are you running both memtier and dragonfly on the same machine?

romange avatar Dec 10 '25 12:12 romange

Of course not, otherwise Redis performance wouldn't have improved. I will provide more detailed test data. I use one machine for the server and one machine for the client.

Version:

  • Redis: 7.4.5
  • Dragonfly: 1.35.1

Environment:

redis cluster with 64 shards
dragonfly --proactor_threads=64

redis (pipeline = 1):

memtier_benchmark -h x.x.x.x -p xxxx --cluster-mode --ratio=1:3 --hide-histogram --threads=24 --clients=1 --requests=200000000 –-test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=1

dragonfly (pipeline = 1):

memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=30 --requests=200000000 –-test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=1

redis (pipeline = 30):

memtier_benchmark -h x.x.x.x -p xxxx --cluster-mode --ratio=1:3 --hide-histogram --threads=24 --clients=1 --requests=200000000 –-test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30

dragonfly (pipeline = 30):

memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=30 --requests=200000000 –-test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30

WJSGDBZ avatar Dec 11 '25 02:12 WJSGDBZ

I cannot say for sure why this happens for you. I observe several things that look strange to me:

  1. P99 and P99.9 are very high for Dragonfly even with pipeline=1.
  2. With pipeline=30 the QPS stays almost the same but latency goes up significantly.

htop screenshots from both the client (memtier) machine and the server machine would provide more input.

Also, I am curious what results you get with redis-cluster running with pipeline=30.

If I had to guess, I would suggest reducing the number of clients to 3 when benchmarking Dragonfly with pipeline=30: memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=3 --requests=200000000 –-test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30
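
One rough way to see why fewer clients should lower latency is Little's law: average latency is approximately the number of requests in flight divided by throughput. The sketch below assumes each memtier connection keeps `pipeline` requests in flight and that there are threads × clients connections (non-cluster mode); the function name `expected_latency_ms` is made up for illustration. The throughput figures are the Totals from the two Dragonfly runs at the top of this thread.

```shell
#!/usr/bin/env bash
# Little's law estimate: latency = requests in flight / throughput.
# Assumption: threads * clients connections, each with `pipeline`
# requests outstanding at any moment (non-cluster mode).
expected_latency_ms() {
  local threads=$1 clients=$2 pipeline=$3 ops_per_sec=$4
  awk -v t="$threads" -v c="$clients" -v p="$pipeline" -v q="$ops_per_sec" \
    'BEGIN { printf "%.2f\n", (t * c * p) / q * 1000 }'
}

# Measured Dragonfly runs (Totals Ops/sec from the first post).
expected_latency_ms 55 30 1  2353063.74   # pipeline=1,  clients=30
expected_latency_ms 55 30 30 2661079.50   # pipeline=30, clients=30

# Hypothetical: clients=3 at the same throughput as the pipeline=30 run.
expected_latency_ms 55 3 30 2661079.50
```

The first two estimates come out near the reported average latencies (about 0.69 ms and 18.3 ms), which suggests that most of the extra latency at pipeline=30 is queueing from roughly 30 times more requests in flight at nearly the same throughput.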

My guess is that your server is bottlenecked on networking interrupts (htop - per CPU utilization screen will help to see it) and this is why your P99, P99.9 are so high.

Finally (and this is more advanced), when we benchmark Dragonfly we usually do some networking tuning: pin networking IRQs to different CPUs, disable irqbalance, and run dragonfly with the --conn_use_incoming_cpu flag, which moves each connection to the networking CPU that handles that socket.
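
A minimal dry-run sketch of that tuning on Linux is shown below. It is not the exact procedure used by the Dragonfly team. It assumes the NIC is called `eth0` (a placeholder), that its IRQ lines in /proc/interrupts end with a name containing the interface name (some drivers name them differently), and that irqbalance runs as a systemd service. The helper names `nic_irqs` and `plan_affinity` are invented for this example. The script only prints commands, so they can be reviewed before being run as root.

```shell
#!/usr/bin/env bash
# Dry run: print commands that spread a NIC's IRQs over CPUs round-robin.
# Assumptions: Linux, interface "eth0" (placeholder), systemd-managed irqbalance.
IFACE="${IFACE:-eth0}"
NCPUS="${NCPUS:-$(nproc)}"

# Print IRQ numbers whose last column in /proc/interrupts matches the device.
nic_irqs() {
  awk -v dev="$1" '$NF ~ dev { sub(":", "", $1); print $1 }' "${2:-/proc/interrupts}"
}

# Read IRQ numbers on stdin; emit one pinning command per IRQ,
# cycling through CPUs 0..ncpus-1.
plan_affinity() {
  local ncpus=$1 i=0 irq
  while read -r irq; do
    echo "echo $(( i % ncpus )) > /proc/irq/${irq}/smp_affinity_list"
    i=$(( i + 1 ))
  done
}

echo "systemctl stop irqbalance"
nic_irqs "$IFACE" 2>/dev/null | plan_affinity "$NCPUS"
```

After applying the printed commands, Dragonfly would be started with --conn_use_incoming_cpu in addition to the existing flags, for example `dragonfly --proactor_threads=64 --conn_use_incoming_cpu`.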

romange avatar Dec 11 '25 06:12 romange

Thank you for your guidance. After pinning networking IRQs to different CPUs, tail latency improved, but throughput still didn't increase significantly. Pipelining gave Dragonfly very little improvement, whereas Redis saw a significant boost.

dragonfly pipeline = 1

memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=30 --requests=200000000 –-test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=1
ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       805595.33          ---          ---         0.51133         0.51100         1.14300         2.14300    439685.78
Gets      2416710.96   2325021.98     91688.98         0.50974         0.50300         1.14300         2.11100   1261103.10
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    3222306.28   2325021.98     91688.98         0.51014         0.50300         1.14300         2.12700   1700788.88

dragonfly pipeline = 30

memtier_benchmark -h x.x.x.x --ratio=1:3 --hide-histogram --threads=55 --clients=3 --requests=200000000 –-test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30
ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets      1057104.52          ---          ---         1.19004         1.08700         3.13500         5.11900    576956.90
Gets      3171310.43   3171297.94        12.48         1.16118         1.05500         3.08700         5.05500   1704796.92
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    4228414.95   3171297.94        12.48         1.16839         1.06300         3.10300         5.08700   2281753.82

redis pipeline = 1

memtier_benchmark -h x.x.x.x -p xxxx --cluster-mode --ratio=1:3 --hide-histogram --threads=24 --clients=1 --requests=200000000 –-test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=1
ALL STATS
======================================================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    MOVED/sec      ASK/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
------------------------------------------------------------------------------------------------------------------------------------------------------
Sets       718706.97          ---          ---         0.00         0.00         0.40164         0.39100         0.88700         1.16700    392263.03
Gets      2156140.32   2156140.32         0.00         0.00         0.00         0.40052         0.39100         0.88700         1.16700   1166271.71
Waits           0.00          ---          ---          ---          ---             ---             ---             ---             ---          ---
Totals    2874847.29   2156140.32         0.00         0.00         0.00         0.40080         0.39100         0.88700         1.16700   1558534.74

redis pipeline = 30

memtier_benchmark -h x.x.x.x -p xxxx --cluster-mode --ratio=1:3 --hide-histogram --threads=24 --clients=1 --requests=200000000 –-test-time=180 --distinct-client-seed --key-maximum=1000000 --data-size 512 --pipeline=30
ALL STATS
======================================================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    MOVED/sec      ASK/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
------------------------------------------------------------------------------------------------------------------------------------------------------
Sets      3390153.43          ---          ---         0.00         0.00         2.61658         2.52700         4.79900         6.11100   1850311.60
Gets     10170460.28  10161982.50      8477.78         0.00         0.00         2.73053         2.54300         5.56700         6.23900   5497010.71
Waits           0.00          ---          ---          ---          ---             ---             ---             ---             ---          ---
Totals   13560613.70  10161982.50      8477.78         0.00         0.00         2.70204         2.54300         5.50300         6.20700   7347322.30

WJSGDBZ avatar Dec 11 '25 07:12 WJSGDBZ

I also found that many [items] are generated in the dragonfly directory.

dump-2025-12-xxT15:xx:xx-0063.dfs
dump-2025-12-xxT15:xx:xx-summary.dfs.

I'm not sure what they are, maybe snapshots. Would it be better to turn them off for benchmarking?

WJSGDBZ avatar Dec 11 '25 07:12 WJSGDBZ

It's because your ports are open to the internet and hacking bots are constantly scanning 6379 and running FLUSHDB and SAVE. I suggest running all the benchmarks on port 6380, or closing 6379 to the internet. You can remove these files.

romange avatar Dec 11 '25 07:12 romange

Haha, our machines don't connect to the internet by default. I tried 6380 and other ports; the result is the same.

WJSGDBZ avatar Dec 11 '25 08:12 WJSGDBZ

Ah, the files are created on Dragonfly shutdown by default. You can pass --dbfilename= to omit backups.
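
As a small illustration, a benchmark launch line with shutdown snapshots disabled could be assembled like this. The helper `dragonfly_bench_cmd` is hypothetical, and the thread count and port are just the values mentioned in this thread; the function only prints the command.

```shell
#!/usr/bin/env bash
# Print a Dragonfly command line for benchmarking with snapshots disabled.
# An empty --dbfilename= turns off the dump files written on shutdown.
dragonfly_bench_cmd() {
  local threads=$1 port=${2:-6379}
  echo "dragonfly --proactor_threads=${threads} --port=${port} --dbfilename="
}

dragonfly_bench_cmd 64 6380
```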

romange avatar Dec 11 '25 08:12 romange

Yes, the results look more reasonable now. I see a 3x difference in QPS with pipeline mode, and this is probably correct: we have not optimized pipelining, and Redis/Valkey have recently made significant optimisations in that area. We will work on that next quarter.
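
For reference, the ratios behind that estimate can be recomputed from the Totals rows posted above. This is plain arithmetic on the reported numbers; the helper name `ratio` is made up. Note that the two Dragonfly runs used different client counts (30 and 3), so the Dragonfly pipeline gain is only indicative.

```shell
#!/usr/bin/env bash
# Ratio of two throughput figures, rounded to two decimals.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Totals Ops/sec from the runs after IRQ pinning.
DF_P1=3222306.28;    DF_P30=4228414.95
REDIS_P1=2874847.29; REDIS_P30=13560613.70

echo "Redis vs Dragonfly at pipeline=30: $(ratio "$REDIS_P30" "$DF_P30")x"
echo "Redis gain from pipelining:        $(ratio "$REDIS_P30" "$REDIS_P1")x"
echo "Dragonfly gain from pipelining:    $(ratio "$DF_P30" "$DF_P1")x"
```

This gives roughly 3.2x for Redis over Dragonfly at pipeline=30, about 4.7x for Redis going from pipeline=1 to 30, and about 1.3x for Dragonfly.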

Having said that, it's not an apples-to-apples comparison: from a single Dragonfly process you can access the entire keyspace of your cluster (i.e. MSET/MGET and transactions work across the entire keyspace), while with Redis Cluster you have a hard separation by slots. If, for your use case, you can run and manage dozens of Redis processes on a single server and it works well for you (no hot keys, no management complexity), then you probably do not need Dragonfly, as even with the optimisations implemented we won't be higher than 13M QPS with pipelining.

romange avatar Dec 11 '25 08:12 romange

Thank you very much. I will continue to research the advantages of dragonfly and choose the appropriate key-value cache for our needs.

WJSGDBZ avatar Dec 11 '25 08:12 WJSGDBZ