sync always hand shaking

Open cataglyphis opened this issue 1 year ago • 1 comments

Issue Description

Syncing data from a cluster-mode Redis to AWS ElastiCache. The source is a self-hosted Redis deployment.

2024-04-12 09:55:37 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking
2024-04-12 09:55:42 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, hand shaking
2024-04-12 09:55:47 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, hand shaking
2024-04-12 09:55:52 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking
2024-04-12 09:55:57 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, hand shaking
2024-04-12 09:56:02 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, hand shaking
2024-04-12 09:56:07 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking
2024-04-12 09:56:12 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, hand shaking
2024-04-12 09:56:17 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, hand shaking
2024-04-12 09:56:22 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking
2024-04-12 09:56:27 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, hand shaking
2024-04-12 09:56:32 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, hand shaking
2024-04-12 09:56:37 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking
2024-04-12 09:56:42 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, hand shaking
2024-04-12 09:56:47 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, hand shaking
2024-04-12 09:56:52 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking
2024-04-12 09:56:57 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, hand shaking
2024-04-12 09:57:02 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, hand shaking
2024-04-12 09:57:07 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking
2024-04-12 09:57:12 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, hand shaking

Please provide a brief description of the issue you encountered.

Environment

RedisShake Version: 4.0.5
Redis Source Version: 6.2.7
Redis Destination Version: 6.2.6
Redis Deployment (standalone/cluster/sentinel): cluster
Deployed on Cloud Provider: No

Logs

If there are any error logs or other relevant logs, please provide them here.

Additional Information

Please provide any additional information, such as configuration files, error messages, or screenshots.

Apr 12 '24 10:04 cataglyphis

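I ran into this problem as well. As far as I can tell, after PSYNC is sent the `+<reply>` never arrives, so the reader stays stuck in the loop forever, even though the source has actually finished its bgsave. To reproduce: start a sync against a fairly large instance, then quickly restart the sync before the source has finished its bgsave. In theory, manually triggering a bgsave right before starting the sync should reproduce it too. My current workaround is to check for rdb_bgsave_in_progress:0 before sending PSYNC, and to add a timer after PSYNC that panics and restarts the sync if no reply has been received within 30 minutes.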

https://github.com/tair-opensource/RedisShake/blob/v4/internal/reader/sync_standalone_reader.go#L136

Apr 16 '24 10:04 jijijijijichild