
cluster failover fails with "replication unset master fail on node-ERR:16,msg:Lock wait timeout"

Open caoyutingtjpu opened this issue 4 years ago • 3 comments

Description

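A cluster of three masters and three slaves, with about 20 GB of data stored per shard. Running cluster failover [force|takeover] on a slave fails: the manual failover does not complete.

1) After raising the lockdbxwaittimeout parameter, for example to 500, the failover succeeds. Is this parameter related to the amount of data stored?
2) When the data set is smaller than a gigabyte, the failover succeeds with the default parameters.
3) Even with lockdbxwaittimeout raised, and with no offset lag between master and slave, the whole failover takes up to 5 minutes.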

Expected Behavior

The manual failover completes quickly and successfully.

Current Behavior

The manual failover fails with "Manual failover timed out".

Possible Solution

Increasing the lockdbxwaittimeout parameter makes the failover succeed.

Steps to Reproduce (for bugs)

Context

Your Environment

  • Operating System and version:CentOS Linux release 7.3.1611 (Core)
  • Machine Specifications: 16-core CPU, 64 GB RAM, 3.6 TB SSD
  • Tendis Version: 2.3.4
  • Tendis Configuration: lockdbxwaittimeout 180; lockwaittimeout 3600
  • IO/Network used:
  • Link to your project:

caoyutingtjpu avatar Jun 16 '21 02:06 caoyutingtjpu

The log is as follows:

E0615 18:40:12.742552 22832 cluster_manager.cpp:1530] Manual failover user request accepted.
E0615 18:40:12.805608 22857 cluster_manager.cpp:5005] Received replication offset for paused master manual failover: 57021795 57021795
E0615 18:40:12.833029 22858 cluster_manager.cpp:2194] All master replication stream processed, manual failover can start.
E0615 18:40:12.833102 22858 cluster_manager.cpp:2787] Start of election delayed for 0 milliseconds (rank #0, offset 57021795).
E0615 18:40:12.933202 22858 cluster_manager.cpp:2174] mf_can_start is non-zero
E0615 18:40:12.933250 22858 cluster_manager.cpp:2840] Starting a failover election for epoch 14.
E0615 18:40:13.033845 22858 cluster_manager.cpp:2174] mf_can_start is non-zero
E0615 18:40:13.033880 22858 cluster_manager.cpp:2855] Failover election won: I'm the new master.
E0615 18:40:13.033890 22858 cluster_manager.cpp:2859] configEpoch set to 14 after successful failover
E0615 18:43:13.034068 22858 cluster_manager.cpp:2655] replication unset master fail on node-ERR:16,msg:Lock wait timeout
E0615 18:43:13.034097 22858 cluster_manager.cpp:4302] replace master fail-ERR:16,msg:Lock wait timeout
E0615 18:43:13.037346 22833 server_entry.cpp:325] sessid:4085335 cmd:binlog_heartbeat 0 1623753614002, error:-ERR:6,msg:sessionId not match
E0615 18:43:13.039944 22832 server_entry.cpp:325] sessid:4085338 cmd:binlog_heartbeat 4 1623753614002, error:-ERR:6,msg:sessionId not match
E0615 18:43:13.040509 22830 server_entry.cpp:325] sessid:4085340 cmd:binlog_heartbeat 7 1623753614002, error:-ERR:6,msg:sessionId not match
E0615 18:43:13.041263 22831 server_entry.cpp:325] sessid:4085342 cmd:binlog_heartbeat 8 1623753614002, error:-ERR:6,msg:sessionId not match
E0615 18:43:13.041405 22833 repl.cpp:494] applyRepllog failed,mode:0 err:
E0615 18:43:13.041426 22831 repl.cpp:494] applyRepllog failed,mode:0 err:
E0615 18:43:13.041512 22831 server_entry.cpp:325] sessid:4085334 cmd:applybinlogsv2 1 [78856] 180 0, error:-ERR:6,msg:sessionId not match
E0615 18:43:13.041530 22833 server_entry.cpp:325] sessid:4085343 cmd:applybinlogsv2 9 [106521] 235 0, error:-ERR:6,msg:sessionId not match
E0615 18:43:13.041640 22833 repl.cpp:494] applyRepllog failed,mode:0 err:
E0615 18:43:13.041667 22833 server_entry.cpp:325] sessid:4085341 cmd:applybinlogsv2 6 [94700] 212 0, error:-ERR:6,msg:sessionId not match
E0615 18:43:13.041685 22831 repl.cpp:494] applyRepllog failed,mode:0 err:
E0615 18:43:13.041800 22831 server_entry.cpp:325] sessid:4085336 cmd:applybinlogsv2 3 [108351] 241 0, error:-ERR:6,msg:sessionId not match
E0615 18:43:13.041828 22833 repl.cpp:494] applyRepllog failed,mode:0 err:
E0615 18:43:13.041867 22833 server_entry.cpp:325] sessid:4085337 cmd:applybinlogsv2 2 [116055] 256 0, error:-ERR:6,msg:sessionId not match
E0615 18:43:13.041966 22830 repl.cpp:494] applyRepllog failed,mode:0 err:
E0615 18:43:13.042052 22830 server_entry.cpp:325] sessid:4085339 cmd:applybinlogsv2 5 [115562] 256 0, error:-ERR:6,msg:sessionId not match
E0615 18:43:13.134183 22858 cluster_manager.cpp:2127] Manual failover timed out.
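
To see how long the node waited between winning the election and reporting the lock timeout, the timestamps in the log above can be subtracted. The following is a minimal Python sketch, assuming the standard glog line prefix (severity letter, MMDD, then HH:MM:SS.micros); the helper names and the year 2021 are assumptions added for illustration, since glog does not record the year.

```python
import re
from datetime import datetime

# glog prefix: severity letter, MMDD, then HH:MM:SS.micros (the year is not logged).
GLOG_TS = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})")


def glog_time(line, year=2021):
    """Parse the timestamp at the start of a glog line; the year is an assumption."""
    m = GLOG_TS.match(line)
    if m is None:
        raise ValueError("not a glog line: " + line[:40])
    return datetime.strptime(f"{year}{m.group(1)} {m.group(2)}", "%Y%m%d %H:%M:%S.%f")


def seconds_between(first, second, year=2021):
    """Elapsed seconds from the first log line to the second."""
    return (glog_time(second, year) - glog_time(first, year)).total_seconds()


# Two lines copied from the log above.
won = "E0615 18:40:13.033890 22858 cluster_manager.cpp:2859] configEpoch set to 14 after successful failover"
failed = "E0615 18:43:13.034068 22858 cluster_manager.cpp:2655] replication unset master fail on node-ERR:16,msg:Lock wait timeout"

gap = seconds_between(won, failed)
print(f"{gap:.3f} s")
```

The gap is almost exactly 180 seconds, which equals the lockdbxwaittimeout value of 180 listed under "Your Environment". That suggests, but does not prove, that the lock wait ran for the full configured timeout before giving up.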

caoyutingtjpu avatar Jun 16 '21 02:06 caoyutingtjpu

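Hello,

The default lockdbxwaittimeout is 1. If this timeout is reported, it means that while the slave was being promoted to master, the binlog it was applying could not finish within 1 s, so the lock wait timed out. Since the failover succeeds after you raise the parameter to 500, the apply then had enough time to finish.

In addition, this timeout may indicate that the slave was applying a fairly large binlog at that moment, or that the slave was responding slowly.

Please provide your current configuration parameters, which you can get by running config get * and info all. Also provide the slave's logs from a successful failover after adjusting lockdbxwaittimeout.

What operations was the master executing at that time? Pay particular attention to:

  1. whether there are any big keys
  2. whether there are any operations with O(N) time complexity; for details see http://tendis.cn/#/Tendisplus/%E6%95%B4%E4%BD%93%E4%BB%8B%E7%BB%8D/redis%E5%85%BC%E5%AE%B9%E6%80%A7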
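
The lock timeout behaviour described in this reply can be illustrated with a small analogy. The Python sketch below is not Tendis code: `apply_binlog`, `try_promote`, and `simulate` are hypothetical stand-ins, and a plain `threading.Lock` takes the place of whatever lock Tendis uses internally. It only shows why a short wait fails and a longer wait succeeds when another thread holds the lock for a while.

```python
import threading
import time


def try_promote(lock, wait_timeout_s):
    """Stand-in for the promotion step: it needs the lock within the timeout."""
    if not lock.acquire(timeout=wait_timeout_s):
        return "Lock wait timeout"
    try:
        return "promoted"
    finally:
        lock.release()


def apply_binlog(lock, started, duration_s):
    """Stand-in for a long-running binlog apply that holds the lock."""
    with lock:
        started.set()
        time.sleep(duration_s)


def simulate(apply_duration_s, wait_timeout_s):
    lock = threading.Lock()
    started = threading.Event()
    worker = threading.Thread(target=apply_binlog, args=(lock, started, apply_duration_s))
    worker.start()
    started.wait()  # make sure the apply thread holds the lock first
    result = try_promote(lock, wait_timeout_s)
    worker.join()
    return result


# Timeout shorter than the apply: the wait gives up.
short_wait = simulate(apply_duration_s=0.3, wait_timeout_s=0.05)
# Timeout longer than the apply: the lock is eventually obtained.
long_wait = simulate(apply_duration_s=0.3, wait_timeout_s=2.0)
print(short_wait, "/", long_wait)
```

In this toy model, raising the timeout only hides the delay; the promotion still waits for as long as the lock is held, which is in line with the slow but successful failover reported after the parameter was increased.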

TendisDev avatar Jun 16 '21 02:06 TendisDev

info all output

# Server
redis_version:2.3.4-rocksdb-v5.13.4
redis_git_sha1:552a4365
redis_git_dirty:23
redis_build_id:4869811118804139172
redis_mode:cluster
TENDIS_DEBUG:OFF
os:Linux 3.10.0-514.21.1.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:asio
gcc_version:5:5:0
process_id:16293
tcp_port:10005
uptime_in_seconds:582532
uptime_in_days:6
config_file:/home/deploy/tendis/tendis_10005/tendis.conf

# Clients
connected_clients:23

# Memory
used_memory:-1
used_memory_human:-1
used_memory_rss:16907182080
used_memory_rss_human:16510920kB
used_memory_peak:-1
used_memory_peak_human:-1
total_system_memory:-1
total_system_memory_human:-1
used_memory_lua:-1
used_memory_vir:28040253440
used_memory_vir_human:27383060kB
used_memory_vir_peak_human:27383060kB
used_memory_rss_peak_human:17585584kB

# Persistence
loading:-1
rdb_changes_since_last_save:-1
rdb_bgsave_in_progress:-1
rdb_last_save_time:-1
rdb_last_bgsave_status:-1
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:-1
aof_rewrite_in_progress:-1
aof_rewrite_scheduled:-1
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:-1
aof_last_write_status:-1

# Stats
total_connections_received:11677
total_connections_released:11666
total_commands_processed:81234099
instantaneous_ops_per_sec:9
total_commands_cost(ns):10819726677725
total_commands_workpool_queue_cost(ns):1334726646753
total_commands_workpool_execute_cost(ns):7684238724562
total_commands_send_packet_cost(ns):1800761306410
total_commands_execute_cost(ns):7189228875731
avg_commands_cost(ns):133191
avg_commands_workpool_queue_cost(ns):16430
avg_commands_workpool_execute_cost(ns):94593
avg_commands_send_packet_cost(ns):22167
avg_commands_execute_cost(ns):88500
commands_in_queue:1
commands_executed_in_workpool:164361732
total_stricky_packets:11259684
total_invalid_packets:0
total_net_input_bytes:31662599202
total_net_output_bytes:914338978
instantaneous_input_kbps:0.493164
instantaneous_output_kbps:0.705078
rejected_connections:0
sync_full:10
sync_partial_ok:120
sync_partial_err:0
keyspace_hits:32054000
keyspace_misses:28001674
keyspace_wrong_versionep:0
scheduleNum:11677

# Replication
role:slave
master_host:xxxxxxxx
master_port:xxxxxxxx
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:57265181
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:57265181
rocksdb0_master:ip=xxxxxxxx,port=xxxxxxxx,src_store_id=0,state=online,fullsync_succ_times=1,binlog_pos=5719621,lag=0
rocksdb1_master:ip=xxxxxxxx,port=xxxxxxxx,src_store_id=1,state=online,fullsync_succ_times=1,binlog_pos=5725145,lag=0
rocksdb2_master:ip=xxxxxxxx,port=xxxxxxxx,src_store_id=2,state=online,fullsync_succ_times=1,binlog_pos=5745738,lag=0
rocksdb3_master:ip=xxxxxxxx,port=xxxxxxxx,src_store_id=3,state=online,fullsync_succ_times=1,binlog_pos=5724495,lag=0
rocksdb4_master:ip=xxxxxxxx,port=xxxxxxxx,src_store_id=4,state=online,fullsync_succ_times=1,binlog_pos=5733199,lag=0
rocksdb5_master:ip=xxxxxxxx,port=xxxxxxxx,src_store_id=5,state=online,fullsync_succ_times=1,binlog_pos=5720659,lag=0
rocksdb6_master:ip=xxxxxxxx,port=xxxxxxxx,src_store_id=6,state=online,fullsync_succ_times=1,binlog_pos=5729625,lag=0
rocksdb7_master:ip=xxxxxxxx,port=xxxxxxxx,src_store_id=7,state=online,fullsync_succ_times=1,binlog_pos=5720845,lag=0
rocksdb8_master:ip=xxxxxxxx,port=xxxxxxxx,src_store_id=8,state=online,fullsync_succ_times=1,binlog_pos=5725794,lag=0
rocksdb9_master:ip=xxxxxxxx,port=xxxxxxxx,src_store_id=9,state=online,fullsync_succ_times=1,binlog_pos=5720060,lag=0

# BinlogInfo
rocksdb0:min=5307091,save=5307091,BLWM=5719621,BHWM=5719622,remain=412530
rocksdb1:min=5204131,save=5204131,BLWM=5725145,BHWM=5725146,remain=521014
rocksdb2:min=5365631,save=5365631,BLWM=5745738,BHWM=5745739,remain=380107
rocksdb3:min=5305151,save=5305151,BLWM=5724495,BHWM=5724496,remain=419344
rocksdb4:min=5244361,save=5244361,BLWM=5733199,BHWM=5733200,remain=488838
rocksdb5:min=5366591,save=5366591,BLWM=5720659,BHWM=5720660,remain=354068
rocksdb6:min=5304261,save=5304261,BLWM=5729625,BHWM=5729626,remain=425364
rocksdb7:min=5367701,save=5367701,BLWM=5720845,BHWM=5720846,remain=353144
rocksdb8:min=5244501,save=5244501,BLWM=5725794,BHWM=5725795,remain=481293
rocksdb9:min=5284781,save=5284781,BLWM=5720060,BHWM=5720061,remain=435279

# CPU
used_cpu_sys:6057.07
used_cpu_user:10249.52
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# CommandStats
cmdstat_applybinlogsv2:calls=164,usec=39423,usec_per_call=240.384
cmdstat_binlog_heartbeat:calls=1449,usec=8554,usec_per_call=5.90338
cmdstat_cluster:calls=69,usec=16326,usec_per_call=236.609
cmdstat_command:calls=11,usec=4982,usec_per_call=452.909
cmdstat_config:calls=10750,usec=770497,usec_per_call=71.6741
cmdstat_dbsize:calls=12,usec=352818711,usec_per_call=2.94016e+07
cmdstat_del:calls=21820216,usec=1140344503,usec_per_call=52.2609
cmdstat_exists:calls=2664720,usec=99660952,usec_per_call=37.4002
cmdstat_expire:calls=51690,usec=2554127,usec_per_call=49.4124
cmdstat_get:calls=1,usec=107,usec_per_call=107
cmdstat_hset:calls=37,usec=2897,usec_per_call=78.2973
cmdstat_info:calls=19999,usec=10637849,usec_per_call=531.919
cmdstat_lpush:calls=4317,usec=440760,usec_per_call=102.099
cmdstat_monitor:calls=1,usec=3,usec_per_call=3
cmdstat_pexpire:calls=21500189,usec=1301949446,usec_per_call=60.5553
cmdstat_ping:calls=12482,usec=17740,usec_per_call=1.42125
cmdstat_rpush:calls=12909449,usec=802126020,usec_per_call=62.1348
cmdstat_scan:calls=3,usec=4706,usec_per_call=1568.67
cmdstat_set:calls=20498065,usec=944278313,usec_per_call=46.0667
cmdstat_setex:calls=579524,usec=26125866,usec_per_call=45.0816
cmdstat_setnx:calls=47374,usec=3079128,usec_per_call=64.9962
cmdstat_show:calls=2,usec=16,usec_per_call=8
cmdstat_zadd:calls=1194779,usec=148183959,usec_per_call=124.026
cmdstat_unseen:calls=512,num=2

# Cluster
cluster_enabled:1

# Keyspace
db0:keys=0,expires=0,avg_ttl=0

# Backup
backup-count:0
last-backup-time:0
current-backup-running:no

# Dataset
rocksdb.kvstore-count:10
rocksdb.total-sst-files-size:12320440984
rocksdb.binlogcf-sst-files-size:9078363405
rocksdb.live-sst-files-size:12320440984
rocksdb.estimate-live-data-size:9010630971
rocksdb.estimate-num-keys:40121441
rocksdb.total-memory:13070006404
rocksdb.cur-size-all-mem-tables:342056
rocksdb.estimate-table-readers-mem:185287868
rocksdb.blockcache.capacity:12884901888
rocksdb.blockcache.usage:12884376480
rocksdb.blockcache.pinnedusage:1920
rocksdb.mem-table-flush-pending:0
rocksdb.estimate-pending-compaction-bytes:0
rocksdb.compaction-pending:0
rocksdb.number.iter.skip:400717717
rocksdb.compaction-filter-count:360041943
rocksdb.compaction-kv-expired-count:172118

# Compaction
current-compaction-status:stopped
time-since-lastest-compaction:582532
current-compaction-dbid:

# IndexManager
total_expire_keys:182584
deleting_expire_keys:0
scanner_matrix:inQueue 0,executing 0,executed 582500,queueTime 1340715191966454021ns,executeTime 52149849824846ns
deleter_matrix:inQueue 0,executing 0,executed 25541,queueTime 3037383436988209ns,executeTime 3032003897030ns
scanpoint_0:21-06-15 20:23:42
scanpoint_1:21-06-15 20:23:42
scanpoint_2:21-06-15 20:23:42
scanpoint_3:21-06-15 20:23:44
scanpoint_4:21-06-15 20:23:40
scanpoint_5:21-06-15 20:23:43
scanpoint_6:21-06-15 20:23:43
scanpoint_7:21-06-15 20:23:43
scanpoint_8:21-06-15 20:23:42
scanpoint_9:21-06-15 20:23:34
scanpoint:21-06-15 20:23:34

# Levelstats
rocksdb0.level-0:bytes=13845649,num_entries=141168,num_deletions=129179,num_files=2
rocksdb0.level-5:bytes=135335411,num_entries=228255,num_deletions=4929,num_files=2
rocksdb0.level-6:bytes=975138616,num_entries=3905163,num_deletions=0,num_files=15
rocksdb1.level-0:bytes=18160078,num_entries=148379,num_deletions=129408,num_files=3
rocksdb1.level-5:bytes=135350763,num_entries=229120,num_deletions=4828,num_files=2
rocksdb1.level-6:bytes=975317794,num_entries=3903197,num_deletions=0,num_files=15
rocksdb2.level-0:bytes=26430975,num_entries=163869,num_deletions=130920,num_files=3
rocksdb2.level-5:bytes=202999068,num_entries=341131,num_deletions=7297,num_files=3
rocksdb2.level-6:bytes=909533883,num_entries=3796710,num_deletions=0,num_files=16
rocksdb3.level-0:bytes=17152907,num_entries=147794,num_deletions=130493,num_files=3
rocksdb3.level-5:bytes=493903850,num_entries=831749,num_deletions=8882,num_files=8
rocksdb3.level-6:bytes=807583356,num_entries=3310355,num_deletions=0,num_files=14
rocksdb4.level-0:bytes=21373974,num_entries=154377,num_deletions=129769,num_files=3
rocksdb4.level-5:bytes=497189107,num_entries=838384,num_deletions=6011,num_files=8
rocksdb4.level-6:bytes=803401703,num_entries=3294505,num_deletions=0,num_files=13
rocksdb5.level-0:bytes=15214262,num_entries=144489,num_deletions=130213,num_files=2
rocksdb5.level-5:bytes=422903489,num_entries=715401,num_deletions=8976,num_files=7
rocksdb5.level-6:bytes=833049008,num_entries=3432333,num_deletions=0,num_files=14
rocksdb6.level-0:bytes=19199007,num_entries=148978,num_deletions=128348,num_files=3
rocksdb6.level-5:bytes=493442124,num_entries=830092,num_deletions=8180,num_files=8
rocksdb6.level-6:bytes=989111218,num_entries=3309175,num_deletions=0,num_files=16
rocksdb7.level-0:bytes=14109816,num_entries=142039,num_deletions=129474,num_files=2
rocksdb7.level-5:bytes=202993361,num_entries=342855,num_deletions=4868,num_files=3
rocksdb7.level-6:bytes=906918135,num_entries=3787475,num_deletions=0,num_files=15
rocksdb8.level-0:bytes=17306678,num_entries=146487,num_deletions=128859,num_files=3
rocksdb8.level-5:bytes=412784317,num_entries=692014,num_deletions=8667,num_files=7
rocksdb8.level-6:bytes=835937305,num_entries=3453181,num_deletions=0,num_files=14
rocksdb9.level-0:bytes=14777164,num_entries=143911,num_deletions=130277,num_files=2
rocksdb9.level-5:bytes=135338013,num_entries=229099,num_deletions=4825,num_files=2
rocksdb9.level-6:bytes=974639953,num_entries=3900993,num_deletions=0,num_files=16

# RocksdbBgError
rocksdb_bg_error_count:0
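
The CommandStats section of the info all output above can be scanned for commands with a high average latency. This is a minimal Python sketch, assuming the `cmdstat_<name>:key=value,...` layout shown in the dump; `parse_cmdstats` and `slowest` are hypothetical helper names, and the sample contains only a few lines copied from the output.

```python
def parse_cmdstats(info_text):
    """Return {command: {field: float}} for each cmdstat_* line with usec_per_call."""
    stats = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_") or ":" not in line:
            continue
        name, _, fields = line.partition(":")
        parsed = {}
        for pair in fields.split(","):
            key, _, value = pair.partition("=")
            parsed[key] = float(value)
        # Lines such as cmdstat_unseen carry no latency field and are skipped.
        if "usec_per_call" in parsed:
            stats[name[len("cmdstat_"):]] = parsed
    return stats


def slowest(stats, top=3):
    """Commands ordered by average microseconds per call, slowest first."""
    ranked = sorted(stats.items(), key=lambda kv: kv[1]["usec_per_call"], reverse=True)
    return [(name, s["usec_per_call"]) for name, s in ranked[:top]]


sample = """\
cmdstat_dbsize:calls=12,usec=352818711,usec_per_call=2.94016e+07
cmdstat_scan:calls=3,usec=4706,usec_per_call=1568.67
cmdstat_info:calls=19999,usec=10637849,usec_per_call=531.919
cmdstat_set:calls=20498065,usec=944278313,usec_per_call=46.0667
cmdstat_unseen:calls=512,num=2
"""

top = slowest(parse_cmdstats(sample))
print(top)
```

In this dump, dbsize stands out at roughly 29.4 seconds per call on average. Whether that is related to the failover delay is not established in this thread.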

config output

  1) "aof-enabled"
  2) "no"
  3) "aof-psync-num"
  4) "500"
  5) "bind"
  6) "\"0.0.0.0\""
  7) "binlog-send-batch"
  8) "256"
  9) "binlog-send-bytes"
 10) "16777216"
 11) "binlog-using-defaultcf"
 12) "no"
 13) "binlogdelrange"
 14) "10"
 15) "binlogfilesecs"
 16) "7200"
 17) "binlogfilesizemb"
 18) "512"
 19) "binlogratelimitmb"
 20) "120"
 21) "checkkeytypeforsetcmd"
 22) "no"
 23) "chunksize"
 24) "16384"
 25) "cluster-enabled"
 26) "yes"
 27) "cluster-migration-barrier"
 28) "1"
 29) "cluster-migration-binlog-iters"
 30) "60"
 31) "cluster-migration-distance"
 32) "10000"
 33) "cluster-migration-rate-limit"
 34) "100"
 35) "cluster-migration-slots-num-per-task"
 36) "100"
 37) "cluster-node-timeout"
 38) "15000"
 39) "cluster-require-full-coverage"
 40) "no"
 41) "cluster-single-node"
 42) "no"
 43) "cluster-slave-no-failover"
 44) "no"
 45) "cluster-slave-validity-factor"
 46) "10"
 47) "compactrange-after-deleterange"
 48) "no"
 49) "daemon"
 50) "yes"
 51) "databases"
 52) "16"
 53) "delcntindexmgr"
 54) "10000"
 55) "deletefilesinrange-for-binlog"
 56) "no"
 57) "deljobcntindexmgr"
 58) "1"
 59) "dir"
 60) "\"/home/deploy/tendis/tendis_10005/db\""
 61) "domain-enabled"
 62) "no"
 63) "dumpdir"
 64) "\"/home/deploy/tendis/tendis_10005/dump\""
 65) "executorthreadnum"
 66) "4"
 67) "executorworkpoolsize"
 68) "4"
 69) "force-recovery"
 70) "0"
 71) "fullpushthreadnum"
 72) "4"
 73) "fullreceivethreadnum"
 74) "4"
 75) "garbage-delete-size"
 76) "30"
 77) "garbagedeletethreadnum"
 78) "1"
 79) "generallog"
 80) "yes"
 81) "incrpushthreadnum"
 82) "4"
 83) "jeprof-auto-dump"
 84) "yes"
 85) "keysdefaultlimit"
 86) "100"
 87) "kvstorecount"
 88) "10"
 89) "lockdbxwaittimeout"
 90) "500"
 91) "lockwaittimeout"
 92) "3600"
 93) "logdir"
 94) "\"/home/deploy/tendis/tendis_10005/log\""
 95) "loglevel"
 96) "\"debug\""
 97) "logrecyclethreadnum"
 98) "4"
 99) "lua-time-limit"
100) "5000"
101) "luastatemaxidletime"
102) "3600000"
103) "masterauth"
104) "\"\""
105) "maxbinlogkeepnum"
106) "100"
107) "maxclients"
108) "10000"
109) "migrate-gc-enabled"
110) "yes"
111) "migrate-snapshot-key-num"
112) "100000"
113) "migrate-snapshot-retry-num"
114) "1000"
115) "migratereceivethreadnum"
116) "4"
117) "migratesenderthreadnum"
118) "4"
119) "minbinlogkeepsec"
120) "604800"
121) "netbatchsize"
122) "1048576"
123) "netbatchtimeoutsec"
124) "60"
125) "netiothreadnum"
126) "2"
127) "noexpire"
128) "no"
129) "pausetimeindexmgr"
130) "10"
131) "pidfile"
132) "\"/home/deploy/tendis/tendis_10005/tmp/tendisplus.pid\""
133) "port"
134) "10005"
135) "proto-max-bulk-len"
136) "536870912"
137) "requirepass"
138) "\"\""
139) "rocks.blockcache_num_shard_bits"
140) "6"
141) "rocks.blockcache_strict_capacity_limit"
142) "no"
143) "rocks.blockcachemb"
144) "12288"
145) "rocks.compress_type"
146) "\"snappy\""
147) "rocks.disable_wal"
148) "no"
149) "rocks.flush_log_at_trx_commit"
150) "no"
151) "rocks.level0_compress_enabled"
152) "no"
153) "rocks.level1_compress_enabled"
154) "no"
155) "rocks.wal_dir"
156) "\"/home/deploy/tendis/tendis_10005/wal\""
157) "save-min-binlogid"
158) "yes"
159) "scancntindexmgr"
160) "1000"
161) "scandefaultlimit"
162) "100"
163) "scandefaultmaxiteratetimes"
164) "1000"
165) "scanjobcntindexmgr"
166) "1"
167) "slave-migrate-enabled"
168) "no"
169) "slavebinlogkeepnum"
170) "1"
171) "slowlog"
172) "\"/home/deploy/tendis/tendis_10005/log\""
173) "slowlog-file-enabled"
174) "yes"
175) "slowlog-flush-interval"
176) "1000"
177) "slowlog-log-slower-than"
178) "10000"
179) "slowlog-max-len"
180) "10240"
181) "storage"
182) "\"rocks\""
183) "timeoutsecbinlogwaitrsp"
184) "60"
185) "truncatebinlogintervalms"
186) "2000"
187) "truncatebinlognum"
188) "20000"
189) "version-increase"
190) "yes"
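
The config output above is a flat list of alternating parameter names and values. The Python sketch below, assuming that numbered and quoted layout, turns it into a dictionary and extracts the lock timeout settings discussed in this issue. `parse_config_get` is a hypothetical helper name, not part of Tendis or redis-cli, and the sample holds only a few lines copied from the list.

```python
import re

# Matches numbered, quoted lines such as:  89) "lockdbxwaittimeout"
ITEM = re.compile(r'^\s*\d+\)\s+"(.*)"\s*$')


def parse_config_get(cli_text):
    """Turn alternating name/value lines into a {name: value} dictionary."""
    items = []
    for line in cli_text.splitlines():
        m = ITEM.match(line)
        if m:
            # Undo the \" escaping used for quotes inside a value.
            items.append(m.group(1).replace('\\"', '"'))
    if len(items) % 2 != 0:
        raise ValueError("expected alternating name/value lines")
    return dict(zip(items[0::2], items[1::2]))


sample = r'''
 89) "lockdbxwaittimeout"
 90) "500"
 91) "lockwaittimeout"
 92) "3600"
 95) "loglevel"
 96) "\"debug\""
'''

cfg = parse_config_get(sample)
timeouts = {k: int(v) for k, v in cfg.items() if k.endswith("waittimeout")}
print(timeouts)
```

This confirms at a glance that the dump was taken after lockdbxwaittimeout had been raised to 500, while lockwaittimeout was still 3600.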

Slave log for the successful failover

E0615 20:00:43.207991 22832 cluster_manager.cpp:1523] Forced failover user request accepted.
E0615 20:00:43.239794 22858 cluster_manager.cpp:2174] mf_can_start is non-zero
E0615 20:00:43.239850 22858 cluster_manager.cpp:2787] Start of election delayed for 0 milliseconds (rank #0, offset 57069173).
E0615 20:00:43.339946 22858 cluster_manager.cpp:2174] mf_can_start is non-zero
E0615 20:00:43.339990 22858 cluster_manager.cpp:2840] Starting a failover election for epoch 20.
E0615 20:00:43.440798 22858 cluster_manager.cpp:2174] mf_can_start is non-zero
E0615 20:00:43.440837 22858 cluster_manager.cpp:2855] Failover election won: I'm the new master.
E0615 20:00:43.440848 22858 cluster_manager.cpp:2859] configEpoch set to 20 after successful failover
E0615 20:05:36.614210 22855 repl_manager.cpp:593] timeout, _fullPushStatus erase,storeId:0 node:xxxxxxxx state:send_bulk_success binlogPos:5700022 startTime:2971653275 endTime:2971653275
E0615 20:05:36.614284 22855 repl_manager.cpp:593] timeout, _fullPushStatus erase,storeId:1 node:xxxxxxxx state:send_bulk_success binlogPos:5705346 startTime:2971653275 endTime:2971653275
E0615 20:05:36.614292 22855 repl_manager.cpp:593] timeout, _fullPushStatus erase,storeId:2 node:xxxxxxxx state:send_bulk_success binlogPos:5725777 startTime:2971653275 endTime:2971653275
E0615 20:05:36.614300 22855 repl_manager.cpp:593] timeout, _fullPushStatus erase,storeId:3 node:xxxxxxxx state:send_bulk_success binlogPos:5704664 startTime:2971653275 endTime:2971653275
E0615 20:05:36.614306 22855 repl_manager.cpp:593] timeout, _fullPushStatus erase,storeId:4 node:xxxxxxxx state:send_bulk_success binlogPos:5713705 startTime:2971653275 endTime:2971653275
E0615 20:05:36.614312 22855 repl_manager.cpp:593] timeout, _fullPushStatus erase,storeId:5 node:xxxxxxxx state:send_bulk_success binlogPos:5700703 startTime:2971653275 endTime:2971653275
E0615 20:05:36.614320 22855 repl_manager.cpp:593] timeout, _fullPushStatus erase,storeId:6 node:xxxxxxxx state:send_bulk_success binlogPos:5710146 startTime:2971653275 endTime:2971653275
E0615 20:05:36.614326 22855 repl_manager.cpp:593] timeout, _fullPushStatus erase,storeId:7 node:xxxxxxxx state:send_bulk_success binlogPos:5700914 startTime:2971653275 endTime:2971653275
E0615 20:05:36.614332 22855 repl_manager.cpp:593] timeout, _fullPushStatus erase,storeId:8 node:xxxxxxxx state:send_bulk_success binlogPos:5706131 startTime:2971653275 endTime:2971653275
E0615 20:05:36.614339 22855 repl_manager.cpp:593] timeout, _fullPushStatus erase,storeId:9 node:xxxxxxxx state:send_bulk_success binlogPos:5700309 startTime:2971653275 endTime:2971653275
E0615 20:30:27.380627 22857 cluster_manager.cpp:2502] Failover auth granted to aa9eae11adefd68847b12d4bd0da89d752019198 for epoch 22

Operations on the master

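  1. There are no big keys; the test data is mostly of the string type, with a large number of keys.
  2. Only failover is being tested; the main operations are ping, cluster nodes, and info replication.
  3. There are no writes; at the time of the failover the master and slave offsets are identical.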

caoyutingtjpu avatar Jun 16 '21 02:06 caoyutingtjpu