KeyDB [BUG] KeyDB cluster benchmark result is not as expected

I have a standalone KeyDB instance that reaches about 400k ops/sec (measured with memtier_benchmark). I then installed a KeyDB cluster with 3 masters on 3 VMs, but it only reaches about 600k ops/sec. That is not the result I wanted: I expected about 400k * 3 = 1200k ops/sec. (The hardware resources are the same on the standalone KeyDB instance and on each cluster node.)

This is my keydb.conf

protected-mode yes

port 7000

tcp-backlog 65000

timeout 0

tcp-keepalive 100

daemonize yes

supervised no

pidfile /var/run/keydb/keydb-server-7000.pid

loglevel notice

logfile /var/log/keydb/keydb-server-7000.log

databases 16

always-show-logo yes

set-proc-title yes

proc-title-template "{title} {listen-addr} {server-mode}"

save 900 1

save 300 10

save 60 10000

stop-writes-on-bgsave-error yes

rdbcompression yes

rdbchecksum yes

dbfilename dump.rdb

rdb-del-sync-files no

dir /var/lib/keydb-7000

replica-serve-stale-data yes

replica-read-only yes

repl-diskless-sync no

repl-diskless-sync-delay 5

repl-diskless-load disabled

repl-disable-tcp-nodelay no

replica-priority 100

acllog-max-len 128

maxmemory 5GB

maxmemory-policy allkeys-lru

lazyfree-lazy-eviction no

lazyfree-lazy-expire no

lazyfree-lazy-server-del no

replica-lazy-flush no

lazyfree-lazy-user-del no

lazyfree-lazy-user-flush no

oom-score-adj no

oom-score-adj-values 0 200 800

disable-thp yes

appendonly yes

appendfilename "appendonly-7000.aof"

appendfsync everysec

no-appendfsync-on-rewrite no

auto-aof-rewrite-percentage 100

auto-aof-rewrite-min-size 64mb

aof-load-truncated yes

aof-use-rdb-preamble yes

lua-time-limit 5000

cluster-enabled yes

cluster-config-file /etc/keydb/cluster/nodes-6379-7000.conf

slowlog-log-slower-than 10000

slowlog-max-len 128

latency-monitor-threshold 0

notify-keyspace-events ""

hash-max-ziplist-entries 512

hash-max-ziplist-value 64

list-max-ziplist-size -2

list-compress-depth 0

set-max-intset-entries 512

zset-max-ziplist-entries 128

zset-max-ziplist-value 64

hll-sparse-max-bytes 3000

stream-node-max-bytes 4096

stream-node-max-entries 100

activerehashing yes

client-output-buffer-limit normal 0 0 0

client-output-buffer-limit replica 256mb 64mb 60

client-output-buffer-limit pubsub 32mb 8mb 60

hz 10

dynamic-hz yes

aof-rewrite-incremental-fsync yes

rdb-save-incremental-fsync yes

jemalloc-bg-thread yes

multi-master-no-forward no

server-threads 16

server-thread-affinity true

active-replica yes

replica-weighting-factor 1

cluster-node-timeout 5000

min-clients-per-thread 50

Please help me with some keywords to resolve my issue.

Oct 17 '22 16:10 mickey-bob

Please, could someone give me a keyword?

Oct 28 '22 01:10 mickey-bob

You got about a 50% increase (400k to 600k ops/sec) moving to a cluster. That's great!

Moving from a single node to a cluster will not improve throughput in proportion to the number of VMs. Rather, throughput scales with the number of clusters. For example, if you now move to two or three clusters, the throughput WILL double or triple, respectively.
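To put the figures from the original post in perspective, here is a minimal sketch of the arithmetic, assuming the reported 400k ops/sec (standalone) and 600k ops/sec (3 masters) are directly comparable. The function name `scaling_summary` is hypothetical and only used for illustration.

```python
def scaling_summary(single_node_ops: float, cluster_ops: float, nodes: int) -> dict:
    """Compare measured cluster throughput against ideal linear scaling."""
    speedup = cluster_ops / single_node_ops           # how many times faster than one node
    return {
        "speedup": speedup,
        "gain_percent": (speedup - 1) * 100,          # relative increase over one node
        "ideal_ops": single_node_ops * nodes,         # what perfect linear scaling would give
        "efficiency_percent": speedup / nodes * 100,  # share of the ideal actually reached
    }


# Figures taken from the original post: 400k standalone, 600k on 3 masters.
summary = scaling_summary(400_000, 600_000, 3)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these inputs, the cluster reaches half of the 1200k ops/sec that perfect linear scaling would give.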

Oct 30 '22 15:10 DreadfulCode

Instead of trying a cluster, what if you move back to a single node and enable active replication across the three nodes? The total throughput of the system will increase threefold, although the throughput will still only be measurable per node.

Oct 30 '22 15:10 DreadfulCode

You can improve the speed of the system in question by enabling huge pages, turning off AOF and RDB backups, and setting the loglevel to "warning". Also pay attention to server-threads 16 - you get diminishing returns on performance if it is set too high.

The formula I use to help estimate the optimal # of server-threads is:

floor(num of cores / 3) + 1

4 cores = 2

8 cores = 3

20 cores = 7

server-threads (probably less than 16)

disable-thp no

save ""

appendonly no

loglevel warning
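The rule of thumb above, floor(num of cores / 3) + 1, can be sketched in a few lines of Python. This is only an illustration of the commenter's estimate, not an official KeyDB recommendation, and the function name `estimate_server_threads` is hypothetical.

```python
def estimate_server_threads(cores: int) -> int:
    """Rule-of-thumb estimate from this thread: floor(cores / 3) + 1."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return cores // 3 + 1  # integer division is floor() for positive ints


# Reproduce the examples given in the comment.
for cores in (4, 8, 20):
    print(f"{cores} cores = {estimate_server_threads(cores)}")
```

By this estimate, server-threads 16 would only be suggested for a machine with 45 to 47 cores, so the value is worth checking against the actual core count of each VM.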

Oct 30 '22 15:10 DreadfulCode

Thank you @DreadfulCode, I will try to follow your recommendations.

Oct 30 '22 17:10 mickey-bob