
Switchover with a child cluster sets the read_only flag on the child cluster's primary server

Open rdemongeot opened this issue 10 months ago • 10 comments

We have two different clusters. The first cluster, 'French' (FR1 & FR2), is a primary/replica cluster. The second cluster, 'Europe' (EU1 & EU2), is also a primary/replica cluster, but the 'Europe' primary is also a replica of 'French', only for the 'French' subset of data.

[Image: Initial_state]

When we switch over the French cluster from primary to secondary (from FR1 to FR2, for example) in order to perform work on FR1, the switchover sets the read_only flag on EU1.

replication manager logs

time="2024-04-24 07:34:46" level=info msg="Starting master switchover" cluster=cluster_FR_masterslave
time="2024-04-24 07:34:46" level=info msg="Freezing writes set read only on FR-01" cluster=cluster_FR_masterslave
time="2024-04-24 07:34:48" level=info msg="Switching other slaves to the new master" cluster=cluster_FR_masterslave
time="2024-04-24 07:34:48" level=info msg="Waiting for slave EU-01 to sync" cluster=cluster_FR_masterslave
time="2024-04-24 07:34:48" level=info msg="Change master on slave EU-01" cluster=cluster_FR_masterslave
time="2024-04-24 07:34:50" level=info msg="Server EU-01 disable read only as last non slave" cluster=cluster_EU_masterslave

I can't see any log line about setting the read_only flag on EU-01.

proxysql logs

2024-04-24 07:34:48 [INFO] Server 'EU-01' found with 'read_only=1', but not found as reader
2024-04-24 07:34:50 [INFO] Server 'EU-01' found with 'read_only=0', but not found as writer

[Image: final_state]

The flag is removed a few seconds later, but during this window ProxySQL sees the read_only flag and moves the EU1 server to non-writer, causing issues for the many applications that try to write to it.

Setting read_only and removing it this quickly also triggers a race condition on old ProxySQL (2.0): it breaks the state machine and forces us to restart the ProxySQL engine.

rdemongeot avatar Apr 24 '24 06:04 rdemongeot

Perhaps in:

utils/dbhelper/dbhelper.go:

                if cluster.Conf.ReadOnly && cluster.Conf.MxsBinlogOn == false && !cluster.IsInIgnoredReadonly(sl) {
                        logs, err = sl.SetReadOnly()
                        cluster.LogSQL(logs, err, sl.URL, "MasterFailover", LvlErr, "Could not set slave %s as read-only, %s", sl.URL, err)
                }
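One way to picture the behavior I would expect is a guard that also skips replicas acting as the primary of another cluster. The sketch below is purely illustrative: `Config`, `Server`, `PrimaryOfCluster` and `shouldSetReadOnly` are hypothetical stand-ins, not replication-manager's actual types, and this is not necessarily how the project handles the case.

```go
package main

import "fmt"

// Config mirrors only the two settings used in the condition quoted above.
type Config struct {
	ReadOnly    bool
	MxsBinlogOn bool
}

// Server is a simplified, hypothetical stand-in for a monitored node.
type Server struct {
	URL             string
	IgnoredReadonly bool // stands in for cluster.IsInIgnoredReadonly(sl)
	// PrimaryOfCluster is a hypothetical field: non-empty when this
	// replica is itself the primary of another (child) cluster.
	PrimaryOfCluster string
}

// shouldSetReadOnly reports whether a replica may be put in read-only
// mode during a switchover. On top of the quoted checks, it refuses to
// touch a replica that is the writable primary of a child cluster.
func shouldSetReadOnly(conf Config, sl Server) bool {
	if !conf.ReadOnly || conf.MxsBinlogOn {
		return false
	}
	if sl.IgnoredReadonly {
		return false
	}
	if sl.PrimaryOfCluster != "" {
		return false // e.g. EU-01, which must stay read-write
	}
	return true
}

func main() {
	conf := Config{ReadOnly: true}
	eu1 := Server{URL: "EU-01", PrimaryOfCluster: "cluster_EU_masterslave"}
	fr1 := Server{URL: "FR-01"}
	fmt.Println(eu1.URL, shouldSetReadOnly(conf, eu1))
	fmt.Println(fr1.URL, shouldSetReadOnly(conf, fr1))
}
```

With such a guard, EU-01 would never be switched to read_only by the FR cluster's switchover, so ProxySQL would not see the flag flip at all.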

rdemongeot avatar Apr 24 '24 06:04 rdemongeot

Can you explain your topology?

time="2024-04-24 07:34:50" level=info msg="Server EU-01 disable read only as last non slave" cluster=cluster_EU_masterslave

ahfa92 avatar Apr 24 '24 07:04 ahfa92

If you can, please explain your topology and your expected behavior. Please send us your configurations and logs for further examination. It will help us in assessing the situation. Thank you

ahfa92 avatar Apr 24 '24 07:04 ahfa92

> If you can, please explain your topology and your expected behavior. Please send us your configurations and logs for further examination. It will help us in assessing the situation. Thank you

The FR cluster is the "First cluster" in the picture; the EU cluster is the second one in the picture.

During a switchover on the FR cluster (which is one of the replication sources for EU), replication-manager sets the read_only flag on the EU cluster primary. But the EU primary NEEDS to be read-write at all times.

After the switchover, replication-manager (for the EU cluster) sees that EU-01 is not RW and sets it back to RW. But in the meantime EU-01 has been forced (by mistake) to read_only for 2 seconds.

rdemongeot avatar Apr 24 '24 07:04 rdemongeot

Can you show the GUI view of the topology? What topology is it detected as? If it was detected as master-slave, then it will try to force EU to read-only.

ahfa92 avatar Apr 24 '24 07:04 ahfa92

EU1 is detected as a secondary, but an ignored one.

It is not in the configuration of FR1, but all the replication flows are named, and EU1 is a replica of the FR cluster (flow named FR), so replication-manager detects it as an ignored secondary.

Reconfiguring EU1 to be a replica of FR2 on switchover is the expected behavior (the child cluster SHOULD be a replica of the FR primary).

[Image: Initial_state]

The FR cluster is the cluster at the top of this picture; the EU cluster is the cluster at the bottom.

rdemongeot avatar Apr 24 '24 07:04 rdemongeot

Please show the screenshot of the dashboard of replication manager.

ahfa92 avatar Apr 24 '24 07:04 ahfa92

> Please show the screenshot of the dashboard of replication manager.

It's not easy: names are rewritten for obfuscation reasons, and a screenshot would not be obfuscated :'(

rdemongeot avatar Apr 24 '24 07:04 rdemongeot

We will check on this.

Thank you for your patience.

ahfa92 avatar Apr 24 '24 08:04 ahfa92

We have already made some patches in #569 and #570.

ahfa92 avatar May 02 '24 11:05 ahfa92

Please reopen if not fixed

svaroqui avatar May 14 '24 17:05 svaroqui