replication-manager
mysql gtid #429 issue test
Hi!
https://github.com/signal18/replication-manager/issues/429
I tested the problem.
SRM versions up to and including v2.2.2 had no problems. The problem first appears in v2.2.3, so the regression seems to have been introduced between these two versions.
https://github.com/signal18/replication-manager/compare/v2.2.2...v2.2.3
Please confirm on this issue.
https://github.com/signal18/replication-manager/blob/develop/cluster/srv_rejoin.go#L669-L678
/*
ss, errss := server.GetSlaveStatus(server.ReplicationSourceName)
if errss != nil {
server.ClusterGroup.LogPrintf(LvlInfo, "Failed to check if server was using GTID %s", errss)
return false
}
server.ClusterGroup.LogPrintf(LvlInfo, "Rejoin server using GTID %s", ss.UsingGtid.String)
*/
https://github.com/signal18/replication-manager/commit/7465c74b2777993db51be47b143d1bcc41c5c672
With the above source code uncommented, rejoin works fine.
SRM v.2.2.22 file name : srv_rejoin.go
-- Add code --
ss, err := server.GetSlaveStatus(server.ReplicationSourceName)
if err != nil {
server.ClusterGroup.LogPrintf(LvlInfo, "Failed to check if server was using GTID %s", err)
return false
}
server.ClusterGroup.LogPrintf(LvlInfo, "Rejoin server using GTID %s", ss.UsingGtid.String)
I recompiled after adding this code.
With it, rejoining the old master through MySQL GTID after failover works!
MySQL 8 M-S-S Test log
-- Log --
time="2022-06-30 14:48:03" level=info msg="Election matrice maxpos>0: [\n\t{\n\t\t"URL": "192.168.50.13:33307",\n\t\t"Indice": 0,\n\t\t"Pos": 24000000000782,\n\t\t"Seq": 0,\n\t\t"Prefered": false,\n\t\t"Ignoredconf": true,\n\t\t"Ignoredrelay": false,\n\t\t"Ignoredmultimaster": false,\n\t\t"Ignoredreplication": true,\n\t\t"Weight": 0\n\t}\n] " cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="State changed, init failed server 192.168.50.11:33307 as unconnected" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Setting Read Only on unconnected server 192.168.50.11:33307 as active monitor and other master is discovered" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Server 192.168.50.11:33307 state transition Failed changed to: StandAlone" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Backup ahead binlog events of previously failed server 192.168.50.11:33307" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Backup /app/mysql/mysql8/bin/mysqlbinlog /app/mysql/mysql8/bin/mysqlbinlog --read-from-remote-server --raw --stop-never-slave-server-id=10000 --user=XXXX --password=XXXX --host=192.168.50.11 --port=33307 --result-file=/app/mrm/data/cluster_mysql33307-server801- --start-position=775 mysql-bin.000021" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Rejoin master incremental 192.168.50.11:33307" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Crash info &{192.168.50.11:33307 mysql-bin.000021 775 mysql-bin.000024 627 %!s(bool=true) %!s(*gtid.List=&[{0 9766354229977087711 1} {0 2225286656223542512 25} {0 4163419948185935994 22}]) 192.168.50.12:33307}" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Rejoined GTID sequence 0" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Crash Saved GTID sequence 0 for master id 801" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Failed to check if server was using GTID Empty replications channels" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=error msg="Failed to get extra bin log events server 192.168.50.11:33307, scannable dest type slice with >1 columns (6) in result " cluster=cluster_mysql33307 error="scannable dest type slice with >1 columns (6) in result" module=Rejoin server="192.168.50.11:33307" sql="SHOW BINLOG EVENTS IN 'mysql-bin.000021' FROM 775"
time="2022-06-30 14:48:05" level=info msg="SHOW BINLOG EVENTS IN 'mysql-bin.000021' FROM 775" cluster=cluster_mysql33307 module=Rejoin server="192.168.50.11:33307"
time="2022-06-30 14:48:05" level=info msg="Found same or lower GTID 0-11109204415399535128-1,0-13954141176147374816-25,0-16540146682189317677-22 and new elected master was 0-9766354229977087711-1,0-2225286656223542512-25,0-4163419948185935994-22" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Doing MySQL GTID switch of the old master" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="CHANGE MASTER TO master_host='192.168.50.12', master_port=33307, master_user='XXX', master_password='XXX', master_connect_retry=5, master_heartbeat_period=3, MASTER_AUTO_POSITION=1" cluster=cluster_mysql33307 module=Rejoin server="192.168.50.11:33307"
time="2022-06-30 14:48:05" level=info msg="Rejoin old Master 192.168.50.11:33307 , backing up lost event to /app/mrm/data/cluster_mysql33307/crash-bin-20220630144805" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Server 192.168.50.11:33307 previous state changed to: StandAlone" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Election matrice maxpos>0: [\n\t{\n\t\t"URL": "192.168.50.13:33307",\n\t\t"Indice": 0,\n\t\t"Pos": 24000000000782,\n\t\t"Seq": 0,\n\t\t"Prefered": false,\n\t\t"Ignoredconf": true,\n\t\t"Ignoredrelay": false,\n\t\t"Ignoredmultimaster": false,\n\t\t"Ignoredreplication": true,\n\t\t"Weight": 0\n\t}\n] " cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=warning msg="Rejoining standalone server 192.168.50.11:33307 to master 192.168.50.12:33307" cluster=cluster_mysql33307 code=WARN0022 status=OPENED type=state
time="2022-06-30 14:48:07" level=info msg="Server 192.168.50.11:33307 state transition StandAlone changed to: Slave" cluster=cluster_mysql33307
time="2022-06-30 14:48:07" level=info msg="Server 192.168.50.11:33307 previous state changed to: Slave" cluster=cluster_mysql33307
time="2022-06-30 14:48:07" level=warning msg="No candidates found in slaves list" cluster=cluster_mysql33307 code=ERR00032 status=RESOLV type=state
Without the above source code, rejoin does not work. I do not know whether modifying the code this way is the correct fix, so please confirm.
@nyxneuf indeed, this code might have changed something. When the server is not using GTID, this block returns early. While the block was commented out, we never hit that return and execution continued. @svaroqui can you confirm?
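To make the point about the early return concrete, here is a minimal, self-contained sketch of the control flow being discussed. It is not the code from srv_rejoin.go: `slaveStatus`, `getSlaveStatus` and `rejoinStep` are hypothetical stand-ins, and the `checkEnabled` flag only models whether the block is commented out or not. What the real function does after this block is not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// slaveStatus is a hypothetical stand-in for the replica status row.
type slaveStatus struct {
	UsingGtid string
}

// getSlaveStatus is a hypothetical stand-in for server.GetSlaveStatus.
// It fails when no replication channel exists, which is the situation
// the log reports as "Empty replications channels".
func getSlaveStatus(hasChannel bool) (*slaveStatus, error) {
	if !hasChannel {
		return nil, errors.New("Empty replications channels")
	}
	return &slaveStatus{UsingGtid: "example"}, nil
}

// rejoinStep models the block from the issue. checkEnabled=false
// corresponds to the block being commented out.
func rejoinStep(checkEnabled, hasChannel bool) string {
	if checkEnabled {
		ss, err := getSlaveStatus(hasChannel)
		if err != nil {
			fmt.Printf("Failed to check if server was using GTID %s\n", err)
			// In the issue's snippet this is "return false".
			return "returned early"
		}
		fmt.Printf("Rejoin server using GTID %s\n", ss.UsingGtid)
	}
	// Whatever follows the block in the real function runs only
	// when the early return above was not taken.
	return "continued"
}

func main() {
	fmt.Println(rejoinStep(true, false))  // block active, no channel
	fmt.Println(rejoinStep(false, false)) // block commented out
}
```

With the block active and no replication channel, the sketch stops at the status check. With the block disabled, it always falls through, which is the behavioral difference described above.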
I'm on it, but it would be easier if I could reproduce it :)
I have a patch that corrects the issue by fixing how the rejoin GTID is compared to the crash GTID. I will commit it once I also have a fix for rejoin via SST direct dump in the case where the GTIDs are ahead.
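For readers following the "same or lower GTID" line in the log, here is a rough sketch of what comparing two GTID lists could look like. It is an illustration only, not the patch and not replication-manager's actual logic. It assumes each entry is printed as `domain-server-sequence`, as in the log above, and it assumes a simple rule: per domain, the crashed server's highest sequence must not exceed the elected master's. The names `gtid`, `parseGTIDList`, `maxSeqByDomain` and `sameOrLower` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gtid is a hypothetical triple as printed in the log: domain-server-sequence.
type gtid struct {
	Domain, Server, Seq uint64
}

// parseGTIDList parses a comma-separated list such as "0-123-1,0-456-25".
func parseGTIDList(s string) ([]gtid, error) {
	var out []gtid
	for _, item := range strings.Split(s, ",") {
		parts := strings.Split(strings.TrimSpace(item), "-")
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed GTID %q", item)
		}
		var n [3]uint64
		for i, p := range parts {
			// Server ids in the log exceed int64, so parse as uint64.
			v, err := strconv.ParseUint(p, 10, 64)
			if err != nil {
				return nil, fmt.Errorf("malformed GTID %q: %w", item, err)
			}
			n[i] = v
		}
		out = append(out, gtid{Domain: n[0], Server: n[1], Seq: n[2]})
	}
	return out, nil
}

// maxSeqByDomain returns the highest sequence seen for each domain.
func maxSeqByDomain(list []gtid) map[uint64]uint64 {
	m := make(map[uint64]uint64)
	for _, g := range list {
		if g.Seq > m[g.Domain] {
			m[g.Domain] = g.Seq
		}
	}
	return m
}

// sameOrLower reports whether, for every domain, the crashed server's
// highest sequence is at most the elected master's (assumed rule).
func sameOrLower(crash, elected []gtid) bool {
	e := maxSeqByDomain(elected)
	for domain, seq := range maxSeqByDomain(crash) {
		if seq > e[domain] {
			return false
		}
	}
	return true
}

func main() {
	// Values copied from the "Found same or lower GTID" log line.
	crash, err := parseGTIDList("0-11109204415399535128-1,0-13954141176147374816-25,0-16540146682189317677-22")
	if err != nil {
		panic(err)
	}
	elected, err := parseGTIDList("0-9766354229977087711-1,0-2225286656223542512-25,0-4163419948185935994-22")
	if err != nil {
		panic(err)
	}
	fmt.Println("same or lower:", sameOrLower(crash, elected))
}
```

The sketch ignores server ids when comparing, since the two lists in the log carry different ids for the same sequences. Whether that is the right rule is exactly the kind of question the patch would have to settle.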