replication-manager
Cluster with non-default replication channel: rejoining fails [v2.1]
Following up on #216 and still using https://github.com/erasys/mariadb-ha-test as test dummy:
If I kill the master of a cluster and the failover works properly, I can't rejoin the former master when I restart it, apparently because the "STOP SLAVE" isn't issued for the proper replication channel:
level=error msg="Failed in GTID rejoin old Master in sync Change master statement CHANGE MASTER 'cluster_a' TO master_host='mariadb_a2', master_port=3306, master_user='replicator', master_password='replicator', master_connect_retry=5, master_heartbeat_period=3, MASTER_USE_GTID=CURRENT_POS failed, reason: Error 1198: This operation cannot be performed as you have a running slave 'cluster_a'; run STOP SLAVE 'cluster_a' first" cluster=cluster_a
I can provide steps to reproduce if needed. My test dummy above includes multi-source replication, but the rejoin also fails on single-source replication as long as replication-source-name is set.
If that is an old master rejoining, why would it have replication running at all?
Valid point. In my test dummy, the master is also configured as a slave, but of course there's no master_host set and no START SLAVE executed:
https://github.com/erasys/mariadb-ha-test/blob/e76e22b13775e92fa4961c024ea01cff42258581/setup/setup_master_slave_replication.sh#L20
That was a relic from a multi-master tryout. When I remove those lines from my setup script, the issue is gone.
On the other hand, wouldn't it be best practice to always ensure a STOP SLAVE before doing a CHANGE MASTER TO?
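The ordering suggested here can be sketched as follows. This is only an illustration, assuming MariaDB's named-connection syntax as it appears in the error message above; the function name `rejoin_statements` and its parameters are hypothetical and not part of replication-manager. It builds the statement strings without executing them, and does no escaping of the values.

```python
def rejoin_statements(channel, host, port, user, password):
    """Return the SQL statements, in order, to repoint one replication channel.

    Hypothetical helper for illustration only; values are not escaped.
    """
    # An empty channel name means the default (unnamed) connection.
    name = f" '{channel}'" if channel else ""
    return [
        # Stop the named channel first so CHANGE MASTER does not hit error 1198.
        f"STOP SLAVE{name}",
        f"CHANGE MASTER{name} TO master_host='{host}', master_port={port}, "
        f"master_user='{user}', master_password='{password}', "
        "MASTER_USE_GTID=CURRENT_POS",
        f"START SLAVE{name}",
    ]


if __name__ == "__main__":
    for stmt in rejoin_statements("cluster_a", "mariadb_a2", 3306,
                                  "replicator", "replicator"):
        print(stmt)
```

The point is only that the STOP SLAVE carries the same channel name as the CHANGE MASTER that follows it.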
Sure, we may consider it, but it could hide issues in the internals. It may be more accurate to report an invalid state for a joining node if it already has replication running.
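A rough sketch of that alternative, assuming the rows come from MariaDB's SHOW ALL SLAVES STATUS, already fetched as dictionaries keyed by column name (Connection_name, Slave_IO_Running, Slave_SQL_Running). The function names are hypothetical and this is not how replication-manager implements its checks.

```python
def running_channels(status_rows):
    """Return the connection names whose IO or SQL thread is not stopped."""
    running = []
    for row in status_rows:
        # Assumption: any value other than "No" (e.g. "Yes", "Connecting")
        # is treated as an active thread.
        io_active = row.get("Slave_IO_Running", "No") != "No"
        sql_active = row.get("Slave_SQL_Running", "No") != "No"
        if io_active or sql_active:
            running.append(row.get("Connection_name", ""))
    return running


def check_rejoin_state(status_rows):
    """Report an invalid state instead of silently stopping replication."""
    running = running_channels(status_rows)
    if running:
        names = ", ".join(repr(n) for n in running)
        return "invalid state: replication already running on " + names
    return "ok"


if __name__ == "__main__":
    rows = [{"Connection_name": "cluster_a",
             "Slave_IO_Running": "Yes",
             "Slave_SQL_Running": "Yes"}]
    print(check_rejoin_state(rows))
```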
Stéphane Varoqui, VP of Products. Phone: +33 695-926-401, Skype: svaroqui, https://signal18.io/