replication-manager
Consul DNS still resolves the old dead master
Hi, with MySQL 5.6, after I kill the old master (172.17.5.101), Consul DNS can still resolve the old dead master:
[root]# date && nslookup write_mysql56.service.consul
Fri Sep 14 14:43:34 CST 2018
Server:         172.17.5.201
Address:        172.17.5.201#53

Name:    write_mysql56.service.consul
Address: 172.17.10.62
Name:    write_mysql56.service.consul
Address: 172.17.5.101
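As a side note, the same records can also be checked directly against Consul's DNS interface. The resolver IP below is taken from the output above; the port (8600, Consul's default DNS port) and record types are assumptions, since DNS is apparently being forwarded on port 53 here.

```sh
# Query Consul DNS directly for every instance registered behind the write service.
# 8600 is Consul's default DNS port (an assumption; adjust if DNS is forwarded on port 53).
dig @172.17.5.201 -p 8600 write_mysql56.service.consul SRV +short
dig @172.17.5.201 -p 8600 write_mysql56.service.consul A +short
```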
Re,
When you are in such a state, can you send me the result of curl http://localhost:8500/v1/catalog/service/write_mysql56 ?
Tx /svar
[root@ah-zabbix-ws mha4mysql-manager]# curl http://localhost:8500/v1/catalog/service/write_mysql56
[
  {
    "ID": "4dd875af-7b88-bcb9-81c5-d3ef54c81e94",
    "Node": "ah-monitor",
    "Address": "172.17.5.12",
    "Datacenter": "consul-cluster",
    "TaggedAddresses": { "lan": "172.17.5.12", "wan": "172.17.5.12" },
    "NodeMeta": { "consul-network-segment": "" },
    "ServiceKind": "",
    "ServiceID": "write_mysql56",
    "ServiceName": "write_mysql56",
    "ServiceTags": ["v-789c32d033d03300040000ffff02c900ed"],
    "ServiceAddress": "172.17.10.62",
    "ServiceMeta": {},
    "ServicePort": 5306,
    "ServiceEnableTagOverride": false,
    "ServiceProxyDestination": "",
    "ServiceConnect": { "Native": false, "Proxy": null },
    "CreateIndex": 39555,
    "ModifyIndex": 39726
  },
  {
    "ID": "39924656-802f-d32b-95fd-df47d8a64fa9",
    "Node": "ah-zabbix-ws",
    "Address": "172.17.5.101",
    "Datacenter": "consul-cluster",
    "TaggedAddresses": { "lan": "172.17.5.101", "wan": "172.17.5.101" },
    "NodeMeta": { "consul-network-segment": "" },
    "ServiceKind": "",
    "ServiceID": "write_mysql56",
    "ServiceName": "write_mysql56",
    "ServiceTags": ["v-789c32d033d03300040000ffff02c900ed"],
    "ServiceAddress": "172.17.10.62",
    "ServiceMeta": {},
    "ServicePort": 5306,
    "ServiceEnableTagOverride": false,
    "ServiceProxyDestination": "",
    "ServiceConnect": { "Native": false, "Proxy": null },
    "CreateIndex": 39364,
    "ModifyIndex": 39725
  }
]
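As an aside, a stale registration like the second record above can be cleared by hand through the agent that owns it (node ah-zabbix-ws at 172.17.5.101) while debugging. This uses the standard Consul agent deregister endpoint; the 8500 HTTP port is an assumption (the default), and this is only a manual workaround, not something replication-manager issues here.

```sh
# Manually deregister the stale write_mysql56 entry from the agent that still holds it.
# 172.17.5.101 is node ah-zabbix-ws from the catalog output; 8500 is the assumed default HTTP port.
curl -X PUT http://172.17.5.101:8500/v1/agent/service/deregister/write_mysql56
```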
This looks like a wrong Consul configuration, as the information is correct in your local Consul but not on the other node of the cluster. Raft requires a minimum of 3 nodes; 5 is advised.
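For reference, a minimal sketch of what one Consul server's configuration could look like for a quorum-capable setup; only the datacenter name and the 3-server minimum come from this thread, while the peer IPs and file path are illustrative assumptions.

```hcl
# Hypothetical /etc/consul.d/server.hcl for each of the three Consul servers.
server           = true
bootstrap_expect = 3                  # wait for 3 servers before electing a Raft leader
datacenter       = "consul-cluster"   # matches the Datacenter field in the catalog output above
retry_join       = ["172.17.5.201", "172.17.5.202", "172.17.5.203"]  # peer addresses are assumptions
```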
I'll test it later, thanks!
I think it's my fault... I'll tell you about this case later. I found that while the slave's io_thread is "Connecting", the slave node can still be resolved through Consul.
Yes, because when the master is lost we still want to read from the slaves. Is it possible that the master is actually running and you are in the Connecting state because of network issues?
No... I found that the slave was connecting to the old dead master (172.17.5.101)!
MySQL error log:
2018-09-14 19:16:09 14744 [ERROR] Slave I/O: error connecting to master '[email protected]:5306' - retry-time: 5 retries: 75, Error_code: 2003
2018-09-14 19:16:14 14744 [ERROR] Slave I/O: error connecting to master '[email protected]:5306' - retry-time: 5 retries: 76, Error_code: 2003
2018-09-14 19:16:19 14744 [ERROR] Slave I/O: error connecting to master '[email protected]:5306' - retry-time: 5 retries: 77, Error_code: 2003
2018-09-14 19:16:24 14744 [ERROR] Slave I/O: error connecting to master '[email protected]:5306' - retry-time: 5 retries: 78, Error_code: 2003
2018-09-14 19:16:29 14744 [Note] Slave I/O thread: connected to master '[email protected]:5306', replication started in log 'mysql-bin.000007' at position 270
2018-09-14 19:16:31 14744 [Note] Error reading relay log event: slave SQL thread was killed
2018-09-14 19:16:31 14744 [Note] Slave I/O thread killed while reading event
2018-09-14 19:16:31 14744 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000008', position 195
2018-09-14 19:16:31 14744 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='172.17.5.101', master_port= 5306, master_log_file='mysql-bin.000008', master_log_pos= 195, master_bind=''. New state master_host='172.17.10.62', master_port= 5306, master_log_file='mysql-bin.000009', master_log_pos= 270, master_bind=''.
2018-09-14 19:16:31 14744 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
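For clarity, the re-pointing logged at 19:16:31 amounts to something like the following being executed on the slave; the coordinates are taken from the "New state" in the log above, and the replication credentials are omitted.

```sql
-- What the logged 'CHANGE MASTER TO executed' line boils down to (MySQL 5.6, file/position based).
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST     = '172.17.10.62',
  MASTER_PORT     = 5306,
  MASTER_LOG_FILE = 'mysql-bin.000009',
  MASTER_LOG_POS  = 270;
START SLAVE;
```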
After I start the old dead master (172.17.5.101), everything becomes OK: the master is the right new master (172.17.10.62). My test environment is complex, maybe I made a mistake...
I think it's my fault... I'll tell you about this case later. I found that while the slave's io_thread is "Connecting", the slave node can still be resolved through Consul.
I have too many questions to ask you.
The previous question is:
I have two replication-manager-osc instances: one manages my MySQL topology, the other one is on standby.
Both nodes are started:
node1 (primary): replication-manager-osc + consul client, with failover-mode = "automatic"
node2 (standby): replication-manager-osc + consul client, with failover-mode = "manual"
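As a sketch only, the two nodes would carry the same cluster section in config.toml with only failover-mode differing. Apart from failover-mode (quoted above), the cluster name, key names, and values below are assumptions and may differ between replication-manager versions.

```toml
# node1 (primary) config.toml -- hypothetical sketch
[mysql56]
db-servers-hosts      = "172.17.5.101:5306,172.17.10.62:5306"   # assumed server list
db-servers-credential = "repl_user:repl_password"               # assumed credentials
failover-mode         = "automatic"

# node2 (standby) would carry the same section, but with:
# failover-mode = "manual"
```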
Test scenario: step 1: stop node1's consul client; step 2: kill the MySQL master.
Test goal: check whether the standby replication-manager can trigger the Consul cluster to deregister and register the right nodes.
Ho ho, obviously I failed.
I would not do this. It may be worth using an active/passive cluster with corosync for replication-manager in your case, or just accepting a single replication-manager as the arbitrator for the cluster; that is an additional component that can live in another DC.
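A hedged sketch of the corosync/pacemaker direction, assuming replication-manager is run from a systemd unit named replication-manager and the cluster is managed with pcs; the resource names, the VIP, and all options are illustrative, not taken from this thread.

```sh
# Run a single replication-manager as an active/passive Pacemaker resource,
# optionally pinned behind a floating IP so the console/API follows the active node.
pcs resource create repman systemd:replication-manager op monitor interval=30s
pcs resource create repman-vip ocf:heartbeat:IPaddr2 ip=172.17.5.250 cidr_netmask=24  # VIP is an assumption
pcs constraint colocation add repman with repman-vip INFINITY
pcs constraint order start repman-vip then start repman
```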
Thanks for your advice, I will test it.