replication-manager mysql gtid #429 issue test

Hi!

https://github.com/signal18/replication-manager/issues/429

I tested the problem.

SRM versions up to v2.2.2 had no problems. The problem first appears in v2.2.3, so it seems to have been introduced between these two versions.

https://github.com/signal18/replication-manager/compare/v2.2.2...v2.2.3

Please confirm this issue.

Jun 29 '22 07:06 nyxneuf

https://github.com/signal18/replication-manager/blob/develop/cluster/srv_rejoin.go#L669-L678

/*
	ss, errss := server.GetSlaveStatus(server.ReplicationSourceName)
	if errss != nil {
		server.ClusterGroup.LogPrintf(LvlInfo, "Failed to check if server was using GTID %s", errss)
		return false
	}
	server.ClusterGroup.LogPrintf(LvlInfo, "Rejoin server using GTID %s", ss.UsingGtid.String)
*/

https://github.com/signal18/replication-manager/commit/7465c74b2777993db51be47b143d1bcc41c5c672

After uncommenting the source code above, it works fine.

Jun 30 '22 05:06 nyxneuf

SRM v.2.2.22 file name : srv_rejoin.go

-- Add code --

            ss, err := server.GetSlaveStatus(server.ReplicationSourceName)
            if err != nil {
                    server.ClusterGroup.LogPrintf(LvlInfo, "Failed to check if server was using GTID %s", err)
                    return false
            }

            server.ClusterGroup.LogPrintf(LvlInfo, "Rejoin server using GTID %s", ss.UsingGtid.String)

Compile after adding source code.

Rejoining the old master with MySQL GTID after failover is working!

MySQL 8 M-S-S Test log

-- Log --

time="2022-06-30 14:48:03" level=info msg="Election matrice maxpos>0: [\n\t{\n\t\t"URL": "192.168.50.13:33307",\n\t\t"Indice": 0,\n\t\t"Pos": 24000000000782,\n\t\t"Seq": 0,\n\t\t"Prefered": false,\n\t\t"Ignoredconf": true,\n\t\t"Ignoredrelay": false,\n\t\t"Ignoredmultimaster": false,\n\t\t"Ignoredreplication": true,\n\t\t"Weight": 0\n\t}\n] " cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="State changed, init failed server 192.168.50.11:33307 as unconnected" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Setting Read Only on unconnected server 192.168.50.11:33307 as active monitor and other master is discovered" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Server 192.168.50.11:33307 state transition Failed changed to: StandAlone" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Backup ahead binlog events of previously failed server 192.168.50.11:33307" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Backup /app/mysql/mysql8/bin/mysqlbinlog /app/mysql/mysql8/bin/mysqlbinlog --read-from-remote-server --raw --stop-never-slave-server-id=10000 --user=XXXX --password=XXXX --host=192.168.50.11 --port=33307 --result-file=/app/mrm/data/cluster_mysql33307-server801- --start-position=775 mysql-bin.000021" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Rejoin master incremental 192.168.50.11:33307" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Crash info &{192.168.50.11:33307 mysql-bin.000021 775 mysql-bin.000024 627 %!s(bool=true) %!s(*gtid.List=&[{0 9766354229977087711 1} {0 2225286656223542512 25} {0 4163419948185935994 22}]) 192.168.50.12:33307}" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Rejoined GTID sequence 0" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Crash Saved GTID sequence 0 for master id 801" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Failed to check if server was using GTID Empty replications channels" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=error msg="Failed to get extra bin log events server 192.168.50.11:33307, scannable dest type slice with >1 columns (6) in result " cluster=cluster_mysql33307 error="scannable dest type slice with >1 columns (6) in result" module=Rejoin server="192.168.50.11:33307" sql="SHOW BINLOG EVENTS IN 'mysql-bin.000021' FROM 775"
time="2022-06-30 14:48:05" level=info msg="SHOW BINLOG EVENTS IN 'mysql-bin.000021' FROM 775" cluster=cluster_mysql33307 module=Rejoin server="192.168.50.11:33307"
time="2022-06-30 14:48:05" level=info msg="Found same or lower GTID 0-11109204415399535128-1,0-13954141176147374816-25,0-16540146682189317677-22 and new elected master was 0-9766354229977087711-1,0-2225286656223542512-25,0-4163419948185935994-22" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Doing MySQL GTID switch of the old master" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="CHANGE MASTER TO master_host='192.168.50.12', master_port=33307, master_user='XXX', master_password='XXX', master_connect_retry=5, master_heartbeat_period=3, MASTER_AUTO_POSITION=1" cluster=cluster_mysql33307 module=Rejoin server="192.168.50.11:33307"
time="2022-06-30 14:48:05" level=info msg="Rejoin old Master 192.168.50.11:33307 , backing up lost event to /app/mrm/data/cluster_mysql33307/crash-bin-20220630144805" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Server 192.168.50.11:33307 previous state changed to: StandAlone" cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=info msg="Election matrice maxpos>0: [\n\t{\n\t\t"URL": "192.168.50.13:33307",\n\t\t"Indice": 0,\n\t\t"Pos": 24000000000782,\n\t\t"Seq": 0,\n\t\t"Prefered": false,\n\t\t"Ignoredconf": true,\n\t\t"Ignoredrelay": false,\n\t\t"Ignoredmultimaster": false,\n\t\t"Ignoredreplication": true,\n\t\t"Weight": 0\n\t}\n] " cluster=cluster_mysql33307
time="2022-06-30 14:48:05" level=warning msg="Rejoining standalone server 192.168.50.11:33307 to master 192.168.50.12:33307" cluster=cluster_mysql33307 code=WARN0022 status=OPENED type=state
time="2022-06-30 14:48:07" level=info msg="Server 192.168.50.11:33307 state transition StandAlone changed to: Slave" cluster=cluster_mysql33307
time="2022-06-30 14:48:07" level=info msg="Server 192.168.50.11:33307 previous state changed to: Slave" cluster=cluster_mysql33307
time="2022-06-30 14:48:07" level=warning msg="No candidates found in slaves list" cluster=cluster_mysql33307 code=ERR00032 status=RESOLV type=state
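
For readers following the "Found same or lower GTID" line in the log: the two lists appear to be comma-separated `domain-server-sequence` triplets. Below is a minimal Go sketch of parsing such a list and checking "same or lower" by sequence number. The names `Triplet`, `ParseList`, and `SameOrLower` are hypothetical, and comparing entries by position is an assumption made for illustration; this is not replication-manager's own comparison code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Triplet is a hypothetical holder for one "domain-server-sequence" entry.
type Triplet struct {
	Domain, Server, Seq uint64
}

// ParseList splits a comma-separated list such as "0-1-25,0-2-22".
// uint64 is used because the server ids in the log exceed the int64 range.
func ParseList(s string) ([]Triplet, error) {
	var out []Triplet
	for _, item := range strings.Split(s, ",") {
		parts := strings.Split(strings.TrimSpace(item), "-")
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed GTID %q", item)
		}
		var n [3]uint64
		for i, p := range parts {
			v, err := strconv.ParseUint(p, 10, 64)
			if err != nil {
				return nil, fmt.Errorf("malformed GTID %q: %w", item, err)
			}
			n[i] = v
		}
		out = append(out, Triplet{Domain: n[0], Server: n[1], Seq: n[2]})
	}
	return out, nil
}

// SameOrLower reports whether every sequence in a is <= the sequence at the
// same position in b. Matching by position is an assumption of this sketch.
func SameOrLower(a, b []Triplet) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i].Seq > b[i].Seq {
			return false
		}
	}
	return true
}

func main() {
	// Values copied from the log line in this thread.
	old, err := ParseList("0-11109204415399535128-1,0-13954141176147374816-25,0-16540146682189317677-22")
	if err != nil {
		panic(err)
	}
	elected, err := ParseList("0-9766354229977087711-1,0-2225286656223542512-25,0-4163419948185935994-22")
	if err != nil {
		panic(err)
	}
	fmt.Println(SameOrLower(old, elected))
}
```

With the values from the log, the sequences on both sides are 1, 25, and 22, which fits the "same or lower" wording of the message.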

Without the source code above, rejoin does not work. I do not know whether modifying the source this way is correct, so please confirm.

Jun 30 '22 06:06 nyxneuf

@nyxneuf Indeed, this code might have changed something. When the server is not using GTID, the function returns at that point. If the code is commented out, we do not hit this return and we continue. @svaroqui can you confirm?

Jun 30 '22 10:06 tanji

I'm on it, but it would be easier if I could reproduce it :)

Jun 30 '22 10:06 svaroqui

I have a patch that corrects the issue by fixing how the rejoin GTID is compared to the crash GTID. I will commit it once I also have a fix for rejoin via SST direct dump in case the GTIDs are ahead.

Jul 01 '22 17:07 svaroqui
