
[2.3.20] No file found /var/lib/replication-manager/clusterX/x.x.x.x/serverstate.json

Open firdibasri opened this issue 1 year ago • 22 comments

I just installed a fresh repman 2.3.20 (latest) with an existing MariaDB replication (1 master, 3 slaves), set up the config (cluster1.toml), and then restarted the service. Then I logged in to the HTTP interface and saw that all nodes are in status "suspect" (they should be Master or Slave). From the log file, I got this message:

No file found /var/lib/replication-manager/cluster2/192.168.18.55_3307/serverstate.json: open /var/lib/replication-manager/cluster2/192.168.18.55_3307/serverstate.json: no such file or directory

replication-manager.log

and: image

But with the previous version (2.3.18) everything is normal: the master and slave nodes are recognized and the file "serverstate.json" exists in the directory.

firdibasri avatar Apr 07 '24 04:04 firdibasri

Is the replication running currently?

ahfa92 avatar Apr 08 '24 01:04 ahfa92

hi @ahfa92 , yes of course: 4 nodes/instances (1 master, 3 slaves) on a single server (VirtualBox), all with the same GTID, and replication is running properly. Here are all the configuration files I've prepared:

config.toml (nothing changed) config.toml.txt

cluster2.toml cluster2.toml.txt

master & slave (.cnf) master-slaves.cnf.txt

OS: Ubuntu 20.04 (focal)
MariaDB: 10.11

firdibasri avatar Apr 08 '24 02:04 firdibasri

can you give me the result of ls -lah /var/lib/replication-manager/cluster2/192.168.18.55_3307

caffeinated92 avatar Apr 08 '24 03:04 caffeinated92

hi @caffeinated92 , here is: image

home: image

and inside: image

firdibasri avatar Apr 08 '24 03:04 firdibasri

And I also add the result from my existing replication: image

The replication is working properly.

firdibasri avatar Apr 08 '24 04:04 firdibasri

Can you try to restart the replication manager?

caffeinated92 avatar Apr 08 '24 04:04 caffeinated92

@caffeinated92 , yes sure, but I still get the same result: image

Here is the log: (2) replication-manager.log

firdibasri avatar Apr 08 '24 04:04 firdibasri

Can you show me show slave hosts from each instance?

time="2024-04-08 11:29:06" level=debug msg="Server 192.168.18.55:3307 has no slaves " cluster=cluster2
time="2024-04-08 11:29:06" level=debug msg="Server 192.168.18.55:3308 has no slaves " cluster=cluster2
time="2024-04-08 11:29:06" level=debug msg="Server 192.168.18.55:3309 has no slaves " cluster=cluster2
time="2024-04-08 11:29:06" level=debug msg="Server 192.168.18.55:3310 has no slaves " cluster=cluster2
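For reference, here is a rough sketch, not the monitor's actual code, of polling each instance the same way and counting what SHOW SLAVE HOSTS returns. It assumes the github.com/go-sql-driver/mysql driver and reuses the host, ports and credentials mentioned in this thread; adjust them to your setup:

	// Rough sketch: count the rows returned by SHOW SLAVE HOSTS on each instance.
	package main

	import (
		"database/sql"
		"fmt"

		_ "github.com/go-sql-driver/mysql"
	)

	func main() {
		for _, port := range []string{"3307", "3308", "3309", "3310"} {
			// DSN built from the credentials in cluster2.toml (firdi:firdi123).
			dsn := fmt.Sprintf("firdi:firdi123@tcp(192.168.18.55:%s)/", port)
			db, err := sql.Open("mysql", dsn)
			if err != nil {
				fmt.Println(port, "open error:", err)
				continue
			}
			rows, err := db.Query("SHOW SLAVE HOSTS")
			if err != nil {
				fmt.Println(port, "query error:", err)
				db.Close()
				continue
			}
			count := 0
			for rows.Next() {
				count++
			}
			rows.Close()
			db.Close()
			fmt.Printf("port %s reports %d registered slave(s)\n", port, count)
		}
	}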

caffeinated92 avatar Apr 08 '24 04:04 caffeinated92

Sure, here is: image

firdibasri avatar Apr 08 '24 04:04 firdibasri

Just for comparison, here I add some captures from version 2.3.18 (on a different VM):

home: image

inside: image

the file serverstate.json exists: image

firdibasri avatar Apr 08 '24 04:04 firdibasri

can you try to touch /var/lib/replication-manager/cluster2/192.168.18.55_3307/serverstate.json

caffeinated92 avatar Apr 08 '24 05:04 caffeinated92

Wait, if it's a different VM, do you have the required privileges? With db-servers-credential = "firdi:firdi123", are you using the correct username? Please use mariadb -h 192.168.18.55 -u firdi -pfirdi123 -P3307 to check.

caffeinated92 avatar Apr 08 '24 05:04 caffeinated92

can you try to touch /var/lib/replication-manager/cluster2/192.168.18.55_3307/serverstate.json

Sure, success: image

Wait, if it's a different VM, do you have the required privileges? With db-servers-credential = "firdi:firdi123", are you using the correct username? Please use mariadb -h 192.168.18.55 -u firdi -pfirdi123 -P3307 to check.

here is the result: image

firdibasri avatar Apr 08 '24 05:04 firdibasri

Can you try to set the log to verbose and let me know what it logs?

caffeinated92 avatar Apr 08 '24 05:04 caffeinated92

Sure, in the file cluster2.toml I have already set verbose = true.

/var/log/replication-manager.log: (3) replication-manager.log

firdibasri avatar Apr 08 '24 05:04 firdibasri

Okay, I'm checking it now. I hope it will be solved soon.

caffeinated92 avatar Apr 08 '24 07:04 caffeinated92

hi @caffeinated92 , I've updated to 2.3.21 and here is the result:

current version: image

the file serverstate.json now exists: image

home: image

But the Topology is still "unknown". The previous version (2.3.18) was able to autodetect it and returned 'master-slave' (see my capture above).

inside: image

replication-manager.log: (4) replication-manager.log

And thank you for solving the problems, really good :)

firdibasri avatar Apr 08 '24 15:04 firdibasri

I will check on this again and make sure everything works well.

caffeinated92 avatar Apr 08 '24 16:04 caffeinated92

I don't get what is special about your setup; I have played all day with 2.3.20 without being able to make it deadlock. Can you send a show all slaves status from all nodes? Can you also please mv all the datadirs to something like backup, run rm -rf /var/lib/replication-manager/*, and see if it recovers the topology?

svaroqui avatar Apr 08 '24 17:04 svaroqui

Can you test the last pushed release without the repo to see if it fixes your issue?

svaroqui avatar Apr 08 '24 20:04 svaroqui

Can you test the last pushed release without the repo to see if it fixes your issue?

hi @svaroqui , yes sure: image

and the result is: Topology = unknown, and the light (Is Down) is still yellow: image

inside: image

firdibasri avatar Apr 08 '24 21:04 firdibasri

It seems that in 2.3.20 there were some changes related to topology discovery (https://github.com/signal18/replication-manager/compare/v2.3.19...v2.3.20#diff-86cbfc045d9b3441a4f4c2235e276915275698f1e70a42e38e74a846cb9bdcc4L652), notably in cluster/cluster.go line 652: getTopology was removed and moved earlier in the process (I don't know for which reason). Maybe that is causing the issue, since getTopology(fromConf) will only find a MasterServer conf if cluster.master is not nil:

	} else {
		relay := cluster.GetRelayServer()
		if relay != nil && cluster.Conf.ReplicationNoRelay == false {
			cluster.Conf.Topology = topoMultiTierSlave
		} else if cluster.master != nil {
			cluster.Conf.Topology = topoMasterSlave
		}
	}

I would assume the condition is not working at this point.
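To make that concrete, here is a rough sketch, hypothetical rather than the repository's actual code, of re-evaluating the same branch after each discovery pass so the master-slave case can still fire once cluster.master has been populated:

	// Hypothetical sketch, not the project's actual code: re-run the topology
	// check after each discovery pass instead of only once at the top of
	// cluster.Run, so the branch above can still resolve the topology once
	// cluster.master has been populated.
	func (cluster *Cluster) refreshTopologyFromConf() {
		relay := cluster.GetRelayServer()
		if relay != nil && cluster.Conf.ReplicationNoRelay == false {
			cluster.Conf.Topology = topoMultiTierSlave
		} else if cluster.master != nil {
			cluster.Conf.Topology = topoMasterSlave
		}
		// If cluster.master is still nil here (for example because this runs
		// before the first monitoring loop), Conf.Topology is left untouched,
		// which matches the "unknown" topology shown in the UI.
	}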

@svaroqui maybe you can explain why you moved the topology detection to the top of the cluster.Run function?

tanji avatar Apr 08 '24 22:04 tanji