replication-manager
Can't resolve the read_clustername name through Consul DNS after master failover
Hi, I've run into a new problem!
Step 1: Do nothing, and everything is OK:
[root]# date&&nslookup write_mysql57.service.consul
Thu Sep 13 19:05:52 CST 2018
Server: 172.17.5.201
Address: 172.17.5.201#53
Name: write_mysql57.service.consul
Address: 172.17.11.242
[root]# date&&nslookup read_mysql57.service.consul
Thu Sep 13 19:05:54 CST 2018
Server: 172.17.5.201
Address: 172.17.5.201#53
Name: read_mysql57.service.consul
Address: 172.17.5.201
Name: read_mysql57.service.consul
Address: 172.17.5.101
Step 2: I kill the master node behind "write_mysql57.service.consul" (172.17.11.242), and the problem appears: "read_mysql57.service.consul" can no longer be resolved.
[root]# date&&nslookup write_mysql57.service.consul
Thu Sep 13 19:16:48 CST 2018
Server: 172.17.5.201
Address: 172.17.5.201#53
Name: write_mysql57.service.consul
Address: 172.17.5.201
[root]# date&&nslookup read_mysql57.service.consul
Thu Sep 13 19:16:50 CST 2018
Server: 172.17.5.201
Address: 172.17.5.201#53
** server can't find read_mysql57.service.consul: NXDOMAIN
Step 3: Start the old master "172.17.11.242" and let it rejoin the replication topology.
[root]# date&&nslookup read_mysql57.service.consul
Thu Sep 13 19:28:00 CST 2018
Server: 172.17.5.201
Address: 172.17.5.201#53
Name: read_mysql57.service.consul
Address: 172.17.5.101
At first it resolves only to "172.17.5.101", but after a while everything is OK again:
[root]# date&&nslookup read_mysql57.service.consul
Thu Sep 13 19:29:01 CST 2018
Server: 172.17.5.201
Address: 172.17.5.201#53
Name: read_mysql57.service.consul
Address: 172.17.5.101
Name: read_mysql57.service.consul
Address: 172.17.11.242
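The manual `date&&nslookup` checks above can be repeated with a small script, which makes it easier to see exactly when the read name stops and starts resolving. This is only a sketch: the function names (`system_resolve`, `snapshot`, `report`) are made up for illustration, and it assumes the machine's system resolver forwards `.consul` queries to Consul, as the `nslookup` output above suggests.

```python
import socket
from datetime import datetime
from typing import Callable, Dict, List


def system_resolve(name: str) -> List[str]:
    """Return the IPv4 addresses for name, or [] if it does not resolve."""
    try:
        _, _, addresses = socket.gethostbyname_ex(name)
    except OSError:  # covers socket.gaierror and socket.herror
        return []
    return sorted(addresses)


def snapshot(names: List[str],
             resolve: Callable[[str], List[str]] = system_resolve) -> Dict[str, List[str]]:
    """Resolve every name once and return a name -> addresses mapping."""
    return {name: resolve(name) for name in names}


def report(names: List[str],
           resolve: Callable[[str], List[str]] = system_resolve) -> List[str]:
    """Build one timestamped line per name, similar to date&&nslookup."""
    stamp = datetime.now().isoformat(timespec="seconds")
    lines = []
    for name, addresses in snapshot(names, resolve).items():
        status = ", ".join(addresses) if addresses else "no address returned"
        lines.append(f"{stamp} {name}: {status}")
    return lines
```

Calling `report(["write_mysql57.service.consul", "read_mysql57.service.consul"])` every few seconds during a failover test gives a timeline that can be compared with the replication-manager log timestamps. The `resolve` parameter exists only so the logic can be exercised without a live Consul agent.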
replication-manager log:
INFO[2018-09-13T19:14:57+08:00] Master Failure detected! Retry 1/5 cluster=mysql57
WARN[2018-09-13T19:14:57+08:00] Server 172.17.11.242:3307 state changed from Master to Suspect cluster=mysql57 type=alert
INFO[2018-09-13T19:14:57+08:00] Register consul master ID write_mysql57 with host 172.17.11.242:3307 cluster=mysql57
INFO[2018-09-13T19:14:57+08:00] Ignore consul read service 8994015015945226213 172.17.11.242:3307%!(EXTRA bool=false) cluster=mysql57
INFO[2018-09-13T19:14:57+08:00] Register consul read service 13584063653535782636 172.17.5.101:3307 cluster=mysql57
INFO[2018-09-13T19:14:57+08:00] Register consul read service 4728595097024489897 172.17.5.201:3307 cluster=mysql57
INFO[2018-09-13T19:14:57+08:00] No GTID strict mode on master 172.17.11.242:3307 cluster=mysql57 code=WARN0070 status=RESOLV type=state
WARN[2018-09-13T19:14:57+08:00] Master is unreachable but slaves are replicating cluster=mysql57 code=ERR00016 status=OPENED type=state
INFO[2018-09-13T19:14:59+08:00] Master Failure detected! Retry 2/5 cluster=mysql57
INFO[2018-09-13T19:15:01+08:00] Master Failure detected! Retry 3/5 cluster=mysql57
INFO[2018-09-13T19:15:03+08:00] Master Failure detected! Retry 4/5 cluster=mysql57
INFO[2018-09-13T19:15:05+08:00] Master Failure detected! Retry 5/5 cluster=mysql57
INFO[2018-09-13T19:15:05+08:00] Declaring master as failed cluster=mysql57
WARN[2018-09-13T19:15:05+08:00] Server 172.17.11.242:3307 state changed from Suspect to Failed cluster=mysql57 type=alert
INFO[2018-09-13T19:15:05+08:00] Register consul master ID write_mysql57 with host 172.17.11.242:3307 cluster=mysql57
INFO[2018-09-13T19:15:05+08:00] Register consul read service 13584063653535782636 172.17.5.101:3307 cluster=mysql57
INFO[2018-09-13T19:15:05+08:00] Register consul read service 4728595097024489897 172.17.5.201:3307 cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] ------------------------ cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Starting master failover cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] ------------------------ cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Electing a new master cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Election matrice: [ { "URL": "172.17.5.101:3307", "Indice": 0, "Pos": 0, "Seq": 0, "Prefered": false, "Ignoredconf": false, "Ignoredrelay": false, "Ignoredmultimaster": false, "Ignoredreplication": true, "Weight": 0 }, { "URL": "172.17.5.201:3307", "Indice": 1, "Pos": 2477453870, "Seq": 0, "Prefered": false, "Ignoredconf": false, "Ignoredrelay": false, "Ignoredmultimaster": false, "Ignoredreplication": false, "Weight": 0 } ] cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Slave 172.17.5.201:3307 has been elected as a new master cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Waiting for candidate master to apply relay log cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Reading all relay logs on 172.17.5.201:3307 cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Stopping slave thread on new master cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Failover Proxy Type: proxysql Host: 172.17.5.12 Port: 6032 cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Register consul master ID write_mysql57 with host 172.17.5.201:3307 cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Ignore consul read service 13584063653535782636 172.17.5.101:3307%!(EXTRA bool=false) cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Ignore consul read service 4728595097024489897 172.17.5.201:3307%!(EXTRA bool=true) cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Resetting slave on new master and set read/write mode on cluster=mysql57
INFO[2018-09-13T19:15:11+08:00] Inject fake transaction on new master 172.17.5.201:3307 cluster=mysql57
INFO[2018-09-13T19:15:12+08:00] Switching other slaves to the new master cluster=mysql57
INFO[2018-09-13T19:15:12+08:00] Change master on slave 172.17.5.101:3307 cluster=mysql57
INFO[2018-09-13T19:15:12+08:00] Master switch on 172.17.5.201:3307 complete cluster=mysql57
INFO[2018-09-13T19:15:12+08:00] Master is unreachable but slaves are replicating cluster=mysql57 code=ERR00016 status=RESOLV type=state
WARN[2018-09-13T19:15:12+08:00] Failover number of master pings failure has been reached cluster=mysql57 code=WARN0023 status=OPENED type=state
WARN[2018-09-13T19:15:12+08:00] Skip slave in election 172.17.5.101:3307 have no master log file, slave might have failed cluster=mysql57 code=ERR00033 status=OPENED type=state
INFO[2018-09-13T19:15:14+08:00] Failover number of master pings failure has been reached cluster=mysql57 code=WARN0023 status=RESOLV type=state
INFO[2018-09-13T19:15:14+08:00] Skip slave in election 172.17.5.101:3307 have no master log file, slave might have failed cluster=mysql57 code=ERR00033 status=RESOLV type=state
WARN[2018-09-13T19:15:14+08:00] No GTID strict mode on master 172.17.5.201:3307 cluster=mysql57 code=WARN0070 status=OPENED type=state
Restarting replication-manager fixes the problem.
So you have already been testing the last commit :)
Yes, that's because the slave is in an IO thread error state. I can fix this!
Yes, thanks! :)
Hmm, at the same time, if you have many slaves but one of them is having network connection issues, do you really want to send traffic to it?
I can run a test for the case where the master is dead.
Thanks! I have two slaves. When the master (172.17.11.242) is dead, the two remaining slaves (172.17.5.101, 172.17.5.201) form a new replication topology, for example: master (172.17.5.201) --> slave (172.17.5.101).
In my mind, the result should look like this:
[root]# date&&nslookup read_mysql57.service.consul
Thu Sep 13 19:29:01 CST 2018
Server: 172.17.5.201
Address: 172.17.5.201#53
Name: read_mysql57.service.consul
Address: 172.17.5.101
But in reality read_mysql57.service.consul cannot be resolved at all, which is what confuses me!
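The expectation described above can be written down as a small rule. The following is a minimal Python sketch of that rule, not replication-manager's actual code: `Server`, `read_pool`, and the `keep_degraded` flag are hypothetical names introduced only for illustration. The flag models the open question in this thread of whether a slave with a replication error should still receive read traffic when no healthy slave is left.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    url: str
    is_master: bool = False
    is_failed: bool = False
    replication_ok: bool = True  # False models e.g. an IO thread error


def read_pool(servers: List[Server], keep_degraded: bool = False) -> List[str]:
    """Return the URLs that should answer for the read service name."""
    alive_slaves = [s for s in servers if not s.is_master and not s.is_failed]
    healthy = [s.url for s in alive_slaves if s.replication_ok]
    if healthy or not keep_degraded:
        return healthy
    # No healthy slave left: optionally keep degraded slaves so the
    # read name does not disappear entirely.
    return [s.url for s in alive_slaves]


# Topology expected after the failover described in this issue.
after_failover = [
    Server("172.17.11.242:3307", is_failed=True),  # old master, dead
    Server("172.17.5.201:3307", is_master=True),   # newly elected master
    Server("172.17.5.101:3307"),                   # slave repointed to .201
]
print(read_pool(after_failover))
```

With the topology above the rule yields only `172.17.5.101:3307`, which matches the expected `nslookup` answer. If 172.17.5.101 were still flagged with a replication error, the strict variant would return an empty list, and an empty read service is what shows up as NXDOMAIN on the DNS side.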
Did you test commit bbf03e1? Do you still get the issue?
Well, not yet. I'll check it later!
Oh, that happens after master failover. I'll test this, thanks.
OK, I have pushed some changes. Consul is a special case: the other proxies get their state refreshed at every monitoring loop, while with DNS the state is refreshed only when something happens on the cluster, which brings more work :) Let me know how those pushes work for you. Thanks.
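To illustrate the difference between refreshing on every monitoring loop and refreshing only on cluster events, here is a minimal sketch of a per-loop reconcile step. It is an assumption-laden illustration, not how replication-manager registers services with Consul: `reconcile` and `run_loops` are hypothetical helpers, and the registry is modelled as a plain set of host:port strings.

```python
from typing import List, Set, Tuple


def reconcile(registered: Set[str], desired: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_register, to_deregister) needed to reach the desired set."""
    return desired - registered, registered - desired


def run_loops(desired_per_loop: List[Set[str]], registered: Set[str]) -> List[List[str]]:
    """Apply reconcile once per monitoring loop and record the registry state."""
    history = []
    for desired in desired_per_loop:
        to_register, to_deregister = reconcile(registered, desired)
        registered = (registered | to_register) - to_deregister
        history.append(sorted(registered))
    return history


# Desired read members over three loops: normal operation, the moment of
# failover (the remaining slave is not replicating yet), and the loop after
# the slave has been repointed to the new master.
loops = [
    {"172.17.5.101:3307", "172.17.5.201:3307"},
    set(),
    {"172.17.5.101:3307"},
]
print(run_loops(loops, registered=set()))
```

Because the desired set is recomputed and applied on every loop, a slave that was dropped during the failover is registered again on the next loop without needing another cluster event or a restart.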
Hi, the problem still exists!
[root]# date&&nslookup read_mysql57.service.consul
Fri Sep 14 10:31:47 CST 2018
Server: 172.17.5.201
Address: 172.17.5.201#53
** server can't find read_mysql57.service.consul: NXDOMAIN
mrm log:
INFO[2018-09-14T10:30:13+08:00] Master Failure detected! Retry 3/5 cluster=mysql57
INFO[2018-09-14T10:30:15+08:00] Master Failure detected! Retry 4/5 cluster=mysql57
INFO[2018-09-14T10:30:17+08:00] Master Failure detected! Retry 5/5 cluster=mysql57
INFO[2018-09-14T10:30:17+08:00] Declaring master as failed cluster=mysql57
WARN[2018-09-14T10:30:17+08:00] Server 172.17.11.242:3307 state changed from Suspect to Failed cluster=mysql57 type=alert
INFO[2018-09-14T10:30:17+08:00] Register consul master ID write_mysql57 with host 172.17.11.242:3307 cluster=mysql57
INFO[2018-09-14T10:30:17+08:00] Register consul read service 13584063653535782636 172.17.5.101:3307 cluster=mysql57
INFO[2018-09-14T10:30:17+08:00] Register consul read service 4728595097024489897 172.17.5.201:3307 cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] ------------------------ cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] Starting master failover cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] ------------------------ cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] Electing a new master cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] Election matrice: [ { "URL": "172.17.5.101:3307", "Indice": 0, "Pos": 0, "Seq": 0, "Prefered": false, "Ignoredconf": false, "Ignoredrelay": false, "Ignoredmultimaster": false, "Ignoredreplication": true, "Weight": 0 }, { "URL": "172.17.5.201:3307", "Indice": 1, "Pos": 3274, "Seq": 0, "Prefered": false, "Ignoredconf": false, "Ignoredrelay": false, "Ignoredmultimaster": false, "Ignoredreplication": false, "Weight": 0 } ] cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] Slave 172.17.5.201:3307 has been elected as a new master cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] Waiting for candidate master to apply relay log cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] Reading all relay logs on 172.17.5.201:3307 cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] Stopping slave thread on new master cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] Failover Proxy Type: proxysql Host: 172.17.5.12 Port: 6032 cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] Register consul master ID write_mysql57 with host 172.17.5.201:3307 cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] Ignore consul read service 13584063653535782636 172.17.5.101:3307%!(EXTRA bool=false) cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] Ignore consul read service 4728595097024489897 172.17.5.201:3307%!(EXTRA bool=true) cluster=mysql57
INFO[2018-09-14T10:30:23+08:00] Resetting slave on new master and set read/write mode on cluster=mysql57
INFO[2018-09-14T10:30:24+08:00] Inject fake transaction on new master 172.17.5.201:3307 cluster=mysql57
INFO[2018-09-14T10:30:24+08:00] Switching other slaves to the new master cluster=mysql57
INFO[2018-09-14T10:30:24+08:00] Change master on slave 172.17.5.101:3307 cluster=mysql57
INFO[2018-09-14T10:30:24+08:00] Register consul master ID write_mysql57 with host 172.17.5.201:3307 cluster=mysql57
INFO[2018-09-14T10:30:24+08:00] Ignore consul read service 13584063653535782636 172.17.5.101:3307%!(EXTRA bool=false) cluster=mysql57
INFO[2018-09-14T10:30:24+08:00] Ignore consul read service 4728595097024489897 172.17.5.201:3307%!(EXTRA bool=true) cluster=mysql57
INFO[2018-09-14T10:30:24+08:00] Master switch on 172.17.5.201:3307 complete cluster=mysql57
INFO[2018-09-14T10:30:24+08:00] Master is unreachable but slaves are replicating cluster=mysql57 code=ERR00016 status=RESOLV type=state
WARN[2018-09-14T10:30:24+08:00] Failover number of master pings failure has been reached cluster=mysql57 code=WARN0023 status=OPENED type=state
WARN[2018-09-14T10:30:24+08:00] Skip slave in election 172.17.5.101:3307 have no master log file, slave might have failed cluster=mysql57 code=ERR00033 status=OPENED type=state
INFO[2018-09-14T10:30:26+08:00] Failover number of master pings failure has been reached cluster=mysql57 code=WARN0023 status=RESOLV type=state
INFO[2018-09-14T10:30:26+08:00] Skip slave in election 172.17.5.101:3307 have no master log file, slave might have failed cluster=mysql57 code=ERR00033 status=RESOLV type=state
WARN[2018-09-14T10:30:26+08:00] No GTID strict mode on master 172.17.5.201:3307 cluster=mysql57 code=WARN0070 status=OPENED type=state
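In both logs, the last Consul read-service message for 172.17.5.101:3307 is an "Ignore" and no later "Register" follows, which is consistent with the read name having no members. A quick way to pull that out of a long log is sketched below. The regular expression is written against the message wording shown in this issue only, so it is an assumption that other replication-manager versions log the same text; the helper names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Matches e.g.
# INFO[2018-09-13T19:15:11+08:00] Ignore consul read service 135... 172.17.5.101:3307%!(EXTRA bool=false)
READ_EVENT = re.compile(
    r"[A-Z]+\[(?P<time>[^\]]+)\]\s+"
    r"(?P<action>Register|Ignore)\s+consul\s+read\s+service\s+"
    r"(?P<id>\d+)\s+(?P<host>[\d.]+:\d+)"
)


def consul_read_events(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (timestamp, action, host) for every read-service log entry."""
    return [(m["time"], m["action"], m["host"]) for m in READ_EVENT.finditer(log_text)]


def last_action_per_host(log_text: str) -> Dict[str, str]:
    """Return the most recent Register/Ignore action seen for each host."""
    last: Dict[str, str] = {}
    for _, action, host in consul_read_events(log_text):
        last[host] = action
    return last


sample = (
    "INFO[2018-09-14T10:30:17+08:00] Register consul read service "
    "13584063653535782636 172.17.5.101:3307 cluster=mysql57\n"
    "INFO[2018-09-14T10:30:24+08:00] Ignore consul read service "
    "13584063653535782636 172.17.5.101:3307%!(EXTRA bool=false) cluster=mysql57\n"
    "INFO[2018-09-14T10:30:24+08:00] Ignore consul read service "
    "4728595097024489897 172.17.5.201:3307%!(EXTRA bool=true) cluster=mysql57\n"
)
print(last_action_per_host(sample))
```

Any host whose last action is "Ignore" is a candidate explanation for a missing address in the `read_mysql57.service.consul` answer.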
@svaroqui I ran into the same situation when testing replication-manager-osc-2.0.1_26 with Consul. I think that when the master goes down, the read domain name should resolve to the slave(s) of the new replication topology. Currently the write domain name resolves normally, but the read domain name does not. Is there any plan to fix this?