
Poor sync performance when the source is a cluster with a very large number of shards

Open AtomTu opened this issue 10 months ago • 5 comments

Issue Description

The source is a cluster with 74 masters and 74 replicas; the destination has 3 masters and 3 replicas.

ncpu was set to 64, and each shard's RDB file is about 4 GB. In actual testing, `syncing rdb` performance is very poor: the full sync is projected to take about 200 hours. Where is the performance bottleneck, and are there any parameters I should tune?

  • RedisShake Version: 4.0.2
  • Redis Source Version: 5.0.7
  • Redis Destination Version: 5.0.7
  • Deployment (standalone/cluster/sentinel): cluster
  • Deployed on a Cloud Provider: No

Logs

{"level":"info","time":"2023-10-17T01:06:52+08:00","message":"read_count=[1606902], read_ops=[2705.38], write_count=[7887], write_ops=[13.01], src-68, syncing rdb, size=[4.0 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:06:53+08:00","message":"read_count=[1609415], read_ops=[2513.19], write_count=[7905], write_ops=[18.00], src-69, syncing rdb, size=[4.1 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:06:54+08:00","message":"read_count=[1611984], read_ops=[2513.19], write_count=[7919], write_ops=[18.00], src-70, syncing rdb, size=[4.8 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:06:55+08:00","message":"read_count=[1614765], read_ops=[2781.03], write_count=[7936], write_ops=[17.00], src-71, syncing rdb, size=[4.4 MiB/4.2 GiB]"}
{"level":"info","time":"2023-10-17T01:06:56+08:00","message":"read_count=[1617324], read_ops=[2559.25], write_count=[7954], write_ops=[18.00], src-72, syncing rdb, size=[3.6 MiB/4.2 GiB]"}
{"level":"info","time":"2023-10-17T01:06:57+08:00","message":"read_count=[1619926], read_ops=[2601.42], write_count=[7967], write_ops=[13.00], src-73, syncing rdb, size=[4.9 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:06:58+08:00","message":"read_count=[1622464], read_ops=[2537.93], write_count=[7981], write_ops=[14.00], src-0, syncing rdb, size=[4.2 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:06:59+08:00","message":"read_count=[1625118], read_ops=[2654.25], write_count=[7990], write_ops=[9.00], src-1, syncing rdb, size=[5.8 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:07:00+08:00","message":"read_count=[1627592], read_ops=[2473.63], write_count=[7995], write_ops=[5.00], src-2, syncing rdb, size=[4.2 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:07:01+08:00","message":"read_count=[1630554], read_ops=[2962.89], write_count=[8002], write_ops=[7.00], src-3, syncing rdb, size=[4.4 MiB/4.2 GiB]"}
{"level":"info","time":"2023-10-17T01:07:02+08:00","message":"read_count=[1633064], read_ops=[2962.89], write_count=[8015], write_ops=[7.00], src-4, syncing rdb, size=[5.9 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:07:03+08:00","message":"read_count=[1635635], read_ops=[2571.57], write_count=[8032], write_ops=[17.00], src-5, syncing rdb, size=[4.8 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:07:04+08:00","message":"read_count=[1638241], read_ops=[2571.57], write_count=[8047], write_ops=[17.00], src-6, syncing rdb, size=[4.3 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:07:05+08:00","message":"read_count=[1640634], read_ops=[2391.95], write_count=[8060], write_ops=[13.00], src-7, syncing rdb, size=[4.4 MiB/4.1 GiB]"}
{"level":"info","time":"2023-10-17T01:07:06+08:00","message":"read_count=[1643249], read_ops=[2614.97], write_count=[8075], write_ops=[15.00], src-8, syncing rdb, size=[5.9 MiB/4.1 GiB]"}
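The ~200-hour estimate is consistent with these logs: each shard shows only a few MiB synced out of ~4.1 GiB, and total time scales linearly with the remaining fraction. A minimal back-of-the-envelope check, where the elapsed wall time is an assumption (the logs do not show when the run started):

```python
# Rough ETA projection from the per-shard progress in the logs above.
done_mib = 4.0            # per-shard progress shown in the logs (~4 MiB)
total_mib = 4.1 * 1024    # per-shard RDB size (~4.1 GiB)
elapsed_min = 12          # ASSUMED elapsed wall time; not shown in the logs

eta_hours = elapsed_min * (total_mib / done_mib) / 60
print(f"projected total sync time: ~{eta_hours:.0f} hours")
```

With an assumed ~12 minutes elapsed, this lands near the ~200-hour figure quoted in the report.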

Additional Information

function = """
-- forward only keys whose name starts with "mlpSummary:"; skip everything else
local prefix = "mlpSummary:"
if KEYS[1] == nil or KEYS[1] == "" then
  return
end
if string.sub(KEYS[1], 1, #prefix) ~= prefix then
  return
end
shake.call(DB, ARGV)
"""

[sync_reader]
cluster = true            # set to true if source is a redis cluster
address = "100.30.6.141:6379" # when cluster is true, set address to one of the cluster nodes
username = ""              # keep empty if not using ACL
password = ""              # keep empty if no authentication is required
tls = false
sync_rdb = true # set to false if you don't want to sync rdb
sync_aof = true # set to false if you don't want to sync aof

# [scan_reader]
# cluster = true            # set to true if source is a redis cluster
# address = "127.0.0.1:6379" # when cluster is true, set address to one of the cluster nodes
# username = ""              # keep empty if not using ACL
# password = ""              # keep empty if no authentication is required
# ksn = false                # set to true to enable Redis keyspace notifications (KSN) subscription
# tls = false

# [rdb_reader]
# filepath = "/tmp/dump.rdb"

[redis_writer]
cluster = true            # set to true if target is a redis cluster
address = "100.30.12.195:6379" # when cluster is true, set address to one of the cluster nodes
username = ""              # keep empty if not using ACL
password = ""              # keep empty if no authentication is required
tls = false


[advanced]
dir = "data"
ncpu = 128        # runtime.GOMAXPROCS, 0 means use runtime.NumCPU() cpu cores
pprof_port = 6479  # pprof port, 0 means disable
status_port = 6579 # status port, 0 means disable

# log
log_file = "shake.log"
log_level = "info"     # debug, info or warn
log_interval = 1       # in seconds

# redis-shake gets each key and value from the rdb file, and uses the RESTORE
# command to create the key in the target redis. Redis RESTORE returns a
# "Target key name is busy" error when the key already exists. You can use this
# configuration item to change the default behavior of restore:
# panic:   redis-shake stops when it meets the "Target key name is busy" error.
# rewrite: redis-shake replaces the key with the new value.
# ignore:  redis-shake skips restoring the key when it meets the "Target key name is busy" error.
rdb_restore_command_behavior = "ignore" # panic, rewrite or ignore

# redis-shake uses pipeline to improve sending performance.
# This item limits the maximum number of commands in a pipeline.
pipeline_count_limit = 40960

# Client query buffers accumulate new commands. By default they are limited
# to a fixed amount, normally 1 GB.
target_redis_client_max_querybuf_len = 1024_000_000

# In the Redis protocol, bulk requests, that is, elements representing single
# strings, are normally limited to 512 MB.
target_redis_proto_max_bulk_len = 512_000_000

# If the source is Elasticache or MemoryDB, you can set this item.
aws_psync = "" # example: aws_psync = "10.0.0.1:6379@nmfu2sl5osync,10.0.0.1:6379@xhma21xfkssync"

[module]
# The data format for BF.LOADCHUNK is not compatible in different versions. v2.6.3 <=> 20603
#target_mbbloom_version = 20603

AtomTu commented Oct 17 '23 01:10

In my tests, running with the Lua script that filters by key prefix is tens of times slower than running without it. Is there any way to optimize this?

AtomTu commented Oct 17 '23 03:10
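A rough illustration of why a per-key script invocation can be tens of times slower than an inline check. This is not RedisShake's actual code path; it uses Python's `eval` as a stand-in for re-entering a script interpreter on every key:

```python
import timeit

keys = [f"mlpSummary:{i}" for i in range(1000)]

# native prefix check, applied once per key
t_native = timeit.timeit(
    lambda: [k.startswith("mlpSummary:") for k in keys], number=100)

# evaluating a small code string per key, mimicking a per-key script call
src = 'k.startswith("mlpSummary:")'
t_eval = timeit.timeit(
    lambda: [eval(src, {"k": k}) for k in keys], number=100)

print(f"per-key eval is ~{t_eval / t_native:.0f}x slower than the inline check")
```

The dominant cost is the per-key dispatch and compilation, not the prefix comparison itself, which matches the observed slowdown when a filter function is enabled.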

Performance with a very large number of source shards does need optimization: currently a single goroutine handles all of this, which is slow. As a temporary workaround, you can start 74 shake instances, each responsible for syncing only one source shard.

suxb201 commented Oct 17 '23 03:10
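The one-shake-per-shard workaround can be sketched as a small script that writes one config per source master and then launches one redis-shake process per file. The shard list below is a placeholder (in this issue there would be 74 entries, taken from `CLUSTER NODES`), and the directory and port choices are assumptions:

```python
# Sketch: generate one redis-shake config per source shard.
# Shard addresses here are placeholders -- substitute the real master addresses.
shards = ["100.30.6.141:6379", "100.30.6.142:6379"]  # ...one entry per master

template = """[sync_reader]
cluster = false
address = "{addr}"
sync_rdb = true
sync_aof = true

[redis_writer]
cluster = true
address = "100.30.12.195:6379"

[advanced]
dir = "data-{i}"
pprof_port = 0
status_port = 0
"""

for i, addr in enumerate(shards):
    with open(f"shake-{i}.toml", "w") as f:
        f.write(template.format(addr=addr, i=i))

# Then launch them all, e.g.:
#   for f in shake-*.toml; do ./redis-shake "$f" & done
```

Each process then syncs a single shard, so the RDB phases run in parallel instead of through one goroutine. Note that `pprof_port` and `status_port` are disabled here to avoid port collisions between instances.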

In my tests, running with the Lua script that filters by key prefix is tens of times slower than running without it. Is there any way to optimize this?

Do you have any suggestions for optimizing this part?

AtomTu commented Oct 17 '23 05:10

No, it is just slow. We may optimize the way Lua is invoked later.

suxb201 commented Oct 17 '23 07:10

Take a look at the latest PR #753, which improves the performance of the Lua-related code several-fold.

Zheaoli commented Jan 02 '24 12:01