
How to reduce the time to migrate data from an RDB file to a Redis cluster?

Open dhruvil-alphonso opened this issue 8 months ago • 11 comments

Issue Description

Looking for ways to optimize the calls and reduce the time it takes to migrate data from an RDB file to a Redis cluster. Right now it takes about an hour to ingest a 10 GB RDB file into a cluster. I have only tuned pipeline_count_limit = 4096, and could raise it further to some large number, but I would like suggestions on how to bring the migration time down significantly. Any help would be appreciated. The command I am executing is ./redis-shake shake.toml

Apart from the reader/writer config, this is the rest of the toml file:

```toml
[advanced]
dir = "data"
ncpu = 0         # runtime.GOMAXPROCS, 0 means use runtime.NumCPU() cpu cores
pprof_port = 0   # pprof port, 0 means disable
status_port = 0  # status port, 0 means disable
log_file = "shake.log"
log_level = "info"  # debug, info or warn
log_interval = 5    # in seconds
rdb_restore_command_behavior = "panic"
pipeline_count_limit = 4096
target_redis_client_max_querybuf_len = 1024_000_000
target_redis_proto_max_bulk_len = 512_000_000
aws_psync = ""

[module]
target_mbbloom_version = 20603
```

Environment

  • RedisShake Version: 4.0.2
  • Redis Source Version: rdb file
  • Redis Destination Version: 6.x
  • Redis Deployment (standalone/cluster/sentinel): destination is cluster
  • Deployed on Cloud Provider: no

Logs

If there are any error logs or other relevant logs, please provide them here.

Additional Information

Please provide any additional information, such as configuration files, error messages, or screenshots.

dhruvil-alphonso avatar Dec 05 '23 15:12 dhruvil-alphonso

Are you using function?

suxb201 avatar Dec 06 '23 06:12 suxb201

In shake.toml the function setting is function = "", so I guess I am not using it. @suxb201

dhruvil-alphonso avatar Dec 07 '23 18:12 dhruvil-alphonso

When the source is a standalone instance, there is no way to further increase the speed. But one hour of sync time far exceeds my test results; please provide the run logs so we can analyze where the performance bottleneck is.

suxb201 avatar Dec 08 '23 02:12 suxb201

Logs from the last time I ran with pipeline_count_limit set to 8196. @suxb201

```
RedisShake$ ./bin/redis-shake shake.toml
2023-12-07 14:26:22 INF load config from file: shake.toml
2023-12-07 14:26:22 INF log_level: [info], log_file: [/path/shake.log]
2023-12-07 14:26:22 INF changed work dir to [/path/data]
2023-12-07 14:26:22 INF GOMAXPROCS defaults to the value of runtime.NumCPU [32]
2023-12-07 14:26:22 INF not set pprof port
2023-12-07 14:26:22 INF no function script
2023-12-07 14:26:22 INF create RdbReader: /path/dump.rdb
2023-12-07 14:26:22 INF redisClusterWriter connected to redis cluster successful. addresses=[redis cluster nodes]
2023-12-07 14:26:22 INF create RedisClusterWriter:
2023-12-07 14:26:22 INF not set status port
2023-12-07 14:26:22 INF start syncing...
2023-12-07 14:26:22 INF [rdb_reader] start read
2023-12-07 14:26:27 INF read_count=[286461], read_ops=[56226.66], write_count=[286460], write_ops=[56226.66], [rdb_reader] rdb file synced: 0.17%
2023-12-07 14:26:32 INF read_count=[573385], read_ops=[56251.08], write_count=[573384], write_ops=[56251.08], [rdb_reader] rdb file synced: 0.31%
2023-12-07 14:26:37 INF read_count=[852543], read_ops=[56520.40], write_count=[852542], write_ops=[56520.40], [rdb_reader] rdb file synced: 0.47%
2023-12-07 14:26:42 INF read_count=[1131470], read_ops=[55241.30], write_count=[1131469], write_ops=[55241.30], [rdb_reader] rdb file synced: 0.64%
2023-12-07 14:26:47 INF read_count=[1414674], read_ops=[55246.97], write_count=[1414673], write_ops=[55246.97], [rdb_reader] rdb file synced: 0.81%
2023-12-07 14:26:52 INF read_count=[1696260], read_ops=[56492.99], write_count=[1696259], write_ops=[56492.99], [rdb_reader] rdb file synced: 0.97%
2023-12-07 14:26:57 INF read_count=[1980843], read_ops=[55777.04], write_count=[1980842], write_ops=[55777.04], [rdb_reader] rdb file synced: 1.14%
2023-12-07 14:27:02 INF read_count=[2261153], read_ops=[57447.67], write_count=[2261152], write_ops=[57447.67], [rdb_reader] rdb file synced: 1.31%
2023-12-07 14:27:07 INF read_count=[2545045], read_ops=[56282.31], write_count=[2545044], write_ops=[56281.31], [rdb_reader] rdb file synced: 1.48%
2023-12-07 14:27:12 INF read_count=[2829023], read_ops=[57186.09], write_count=[2829022], write_ops=[57186.09], [rdb_reader] rdb file synced: 1.64%
2023-12-07 14:27:17 INF read_count=[3108041], read_ops=[55233.53], write_count=[3108040], write_ops=[55232.53], [rdb_reader] rdb file synced: 1.81%
2023-12-07 14:27:22 INF read_count=[3384648], read_ops=[53898.91], write_count=[3384647], write_ops=[53898.91], [rdb_reader] rdb file synced: 1.98%
.....
2023-12-07 15:14:57 INF read_count=[166563747], read_ops=[58274.82], write_count=[166563746], write_ops=[58273.82], [rdb_reader] rdb file synced: 98.69%
2023-12-07 15:15:02 INF read_count=[166855772], read_ops=[57376.40], write_count=[166855771], write_ops=[57376.40], [rdb_reader] rdb file synced: 98.83%
2023-12-07 15:15:07 INF read_count=[167146004], read_ops=[58237.04], write_count=[167146003], write_ops=[58238.04], [rdb_reader] rdb file synced: 99.00%
2023-12-07 15:15:12 INF read_count=[167441062], read_ops=[60856.30], write_count=[167441061], write_ops=[60856.30], [rdb_reader] rdb file synced: 99.17%
2023-12-07 15:15:17 INF read_count=[167732878], read_ops=[57633.37], write_count=[167732877], write_ops=[57633.37], [rdb_reader] rdb file synced: 99.35%
2023-12-07 15:15:22 INF read_count=[168027157], read_ops=[60803.77], write_count=[168027156], write_ops=[60803.77], [rdb_reader] rdb file synced: 99.52%
2023-12-07 15:15:27 INF read_count=[168324522], read_ops=[61424.57], write_count=[168324521], write_ops=[61423.57], [rdb_reader] rdb file synced: 99.70%
2023-12-07 15:15:32 INF read_count=[168621520], read_ops=[59541.77], write_count=[168621519], write_ops=[59542.77], [rdb_reader] rdb file synced: 99.87%
2023-12-07 15:15:34 INF [rdb_reader] rdb file parse done
2023-12-07 15:15:34 INF all done
```

dhruvil-alphonso avatar Dec 09 '23 00:12 dhruvil-alphonso

@suxb201 One more thing I have observed: sometimes I get errors like 2023-12-10 14:08:11 ERR [writer_208.85.5.39_7002] redisStandaloneWriter received BUSYKEY reply. It says redisStandaloneWriter and not RedisClusterWriter. Not sure if it's relevant, to be honest, but letting you know anyway.
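(Side note: BUSYKEY is the reply Redis gives to a RESTORE when the target key already exists. In the shake.toml shown above, rdb_restore_command_behavior = "panic" makes RedisShake abort in that case; the template's own comment also lists "rewrite" and "skip". If overwriting pre-existing keys on the destination is acceptable in your migration, a variant like the following may avoid the error; whether "rewrite" is appropriate depends on your data, so treat this as a suggestion, not the fix:)

```toml
[advanced]
# panic: abort on an existing key; rewrite: overwrite it; skip: leave it in place
rdb_restore_command_behavior = "rewrite"
```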

dhruvil-alphonso avatar Dec 10 '23 19:12 dhruvil-alphonso

> When the source is a standalone instance, there is no way to further increase the speed. But one hour of sync time far exceeds my test results; please provide the run logs so we can analyze where the performance bottleneck is.

@suxb201 What if the source is a Redis cluster, is there a way to increase the sync speed then? I have a 64-node Redis cluster with 1.6 billion keys in total and 350 GB of memory, and I need to sync it to a Redis cluster (80 nodes) in another data center. I tried syncing once before, but the estimated time was more than ten hours, so I aborted.

Are there any ways to increase the sync speed, for example some kind of distributed concurrent syncing?

Thanks.

xishian avatar Dec 12 '23 01:12 xishian

@dhruvil-alphonso The speed is read_ops=[56226.66], which is already quite fast. Do you have a lot of keys?

@xishian Yes, right now only one client is writing, which can only provide around 60k writes per second. I plan to change this to multiple clients, parallelized by key, to increase write throughput.
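(For context on why the single writer dominates: the logs above show ~168.6M commands at ~56-60k ops/s, which works out to roughly 168.6e6 / 57e3 ≈ 2960 s ≈ 49 minutes, matching the observed 14:26-15:15 run almost exactly. The "multiple clients, parallelized by key" idea can be sketched as below. This is a minimal illustration, not RedisShake's actual implementation, which is written in Go and routes by cluster slot; the shard function, worker count, and migrate helper are all assumptions for the sketch.)

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_CLIENTS = 4  # assumed number of parallel writer clients


def shard(key: str, n: int = NUM_CLIENTS) -> int:
    """Map a key to one of n writers. Redis Cluster itself uses
    CRC16(key) mod 16384; CRC32 here is just a stand-in with the same
    stable-hash property (the same key always lands on the same writer,
    so per-key command order is preserved)."""
    return zlib.crc32(key.encode()) % n


def migrate(entries):
    """Partition (key, value) pairs by shard, then write each partition
    with its own worker. In the real tool each worker would hold its own
    Redis connection and pipeline RESTORE commands; here we just collect
    the writes to show the partitioning."""
    buckets = [[] for _ in range(NUM_CLIENTS)]
    for key, value in entries:
        buckets[shard(key)].append((key, value))

    written = []

    def writer(bucket):
        written.extend(bucket)  # stand-in for pipelined writes

    with ThreadPoolExecutor(max_workers=NUM_CLIENTS) as pool:
        list(pool.map(writer, buckets))
    return written
```

With n independent writers the aggregate throughput scales until the destination cluster or the RDB parser becomes the limit, which is presumably why the author plans this change.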

suxb201 avatar Dec 12 '23 01:12 suxb201

Yes, I have lots of keys @suxb201, around 144M or so. Looking forward to the multiple-clients support. When do you plan to release that?

dhruvil-alphonso avatar Dec 12 '23 02:12 dhruvil-alphonso

@dhruvil-alphonso It needs a lot of changes. This week or next week.

suxb201 avatar Dec 12 '23 03:12 suxb201

@dhruvil-alphonso I don't think my changes will improve your sync speed. We need to further confirm where your bottleneck is. It could be slow reads of the file from disk, slow parsing of the dump.rdb file, or slow command consumption on the destination side.
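(The first of those three candidates is easy to rule in or out: the effective migration rate above was 10 GB in about 49 minutes, roughly 3.5 MB/s, so if raw sequential reads of the dump file are far faster than that, disk is not the bottleneck. A minimal sketch of such a check, where the dump.rdb path is a placeholder you would replace with your own:)

```python
import time


def read_throughput_mb_s(path: str, chunk_size: int = 1 << 20) -> float:
    """Sequentially read the whole file in 1 MiB chunks and return the
    observed throughput in MB/s."""
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    elapsed = max(time.monotonic() - start, 1e-9)  # avoid division by zero
    return (total / (1 << 20)) / elapsed


# Example (placeholder path):
# print(read_throughput_mb_s("/path/dump.rdb"))
```

If this reports hundreds of MB/s (typical for local SSDs), attention shifts to the remaining two candidates: RDB parsing speed and the destination cluster's consumption rate. Note that an OS page cache can inflate the number on a second run.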

suxb201 avatar Dec 22 '23 08:12 suxb201

@xishian This PR should help in your scenario: https://github.com/tair-opensource/RedisShake/pull/738

suxb201 avatar Dec 22 '23 09:12 suxb201