replication-manager
[2.3.20] No file found /var/lib/replication-manager/clusterX/x.x.x.x/serverstate.json
I just did a fresh install of repman 2.3.20 (latest) on an existing MariaDB replication setup (1 master, 3 slaves), set up the config (cluster1.toml), and then restarted the service. Then I logged in to the HTTP interface and saw that all the nodes are in status "suspect" (they should be Master or Slave). From the log file, I got this message:
No file found /var/lib/replication-manager/cluster2/192.168.18.55_3307/serverstate.json: open /var/lib/replication-manager/cluster2/192.168.18.55_3307/serverstate.json: no such file or directory
and:
But with the previous version (2.3.18) everything is normal: the master and slave nodes are recognized and the file "serverstate.json" exists in the directory.
Is the replication running currently?
hi @ahfa92 , yes of course: 4 nodes/instances (1 master, 3 slaves) on a single server (VirtualBox), all with the same GTID, running properly. And here are all the configuration files I've prepared:
config.toml (nothing changed) config.toml.txt
cluster2.toml cluster2.toml.txt
master & slave (.cnf) master-slaves.cnf.txt
OS: Ubuntu 20.04 (focal), MariaDB: 10.11
can you give me the result of ls -lah /var/lib/replication-manager/cluster2/192.168.18.55_3307
hi @caffeinated92 , here is:
home:
and inside:
and I also add the result from my existing replication:
the replication is working properly
Can you try to restart the replication manager?
@caffeinated92 , yes sure, but I still get the same result.
Here is the log: (2) replication-manager.log
can you show me:
show slave hosts
from each instance?
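To collect that from every instance in this setup (host, ports, and credentials taken from this thread; adjust as needed), a small loop can print the command to run against each port, one per instance:

```shell
# Print one SHOW SLAVE HOSTS command per instance (ports from this thread).
# Review the output, then run the commands manually or pipe them to sh.
cmds=""
for port in 3307 3308 3309 3310; do
  cmd="mariadb -h 192.168.18.55 -u firdi -pfirdi123 -P${port} -e 'SHOW SLAVE HOSTS;'"
  echo "$cmd"
  cmds="$cmds $cmd"
done
```

Note that SHOW SLAVE HOSTS on the master only lists replicas that report themselves (report_host), so empty output here would be consistent with the "has no slaves" debug lines in the log.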
time="2024-04-08 11:29:06" level=debug msg="Server 192.168.18.55:3307 has no slaves " cluster=cluster2
time="2024-04-08 11:29:06" level=debug msg="Server 192.168.18.55:3308 has no slaves " cluster=cluster2
time="2024-04-08 11:29:06" level=debug msg="Server 192.168.18.55:3309 has no slaves " cluster=cluster2
time="2024-04-08 11:29:06" level=debug msg="Server 192.168.18.55:3310 has no slaves " cluster=cluster2
Sure, here is:
Just for comparison, here are some captures from version 2.3.18 (different VM):
home:
inside:
the file serverstate.json exists:
can you try to touch /var/lib/replication-manager/cluster2/192.168.18.55_3307/serverstate.json
Wait, if it's a different VM, do you have the required privileges? With db-servers-credential = "firdi:firdi123", are you using the correct username?
Please use mariadb -h 192.168.18.55 -u firdi -pfirdi123 -P3307 to check
can you try to
touch /var/lib/replication-manager/cluster2/192.168.18.55_3307/serverstate.json
Sure, success:
Wait, if it's different VM, do you have the required privileges? db-servers-credential = "firdi:firdi123" are you using the correct username? Please use
mariadb -h 192.168.18.55 -u firdi -pfirdi123 -P3307 to check
here is the result:
can you try to set the log to verbose and let me know what it logs?
Sure, in the file cluster2.toml I have already set verbose = true
/var/log/replication-manager.log: (3) replication-manager.log
Okay, I'm checking now. I hope it will be solved soon.
hi @caffeinated92 , I've updated to 2.3.21 and here is the result now:
current version:
the file serverstate.json now exists:
home:
But the Topology is still "unknown". The previous version (2.3.18) was able to autodetect it and showed 'master-slave' (my capture above).
inside:
replication-manager.log: (4) replication-manager.log
And thank you for solving the problems, really good :)
I will check on this again and make sure if everything works well
I don't get what is special about your setup; I have played all day with 2.3.20 without being able to make it deadlock. Can you send a show all slaves status from all nodes? Can you also please mv all the datadirs to something like backup, rm -rf /var/lib/replication-manager/*, and see if it recovers the topology?
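The move-aside step suggested above can be rehearsed safely in a temporary directory first. A dry-run sketch (the real path is /var/lib/replication-manager, and the replication-manager service should be stopped before touching it for real):

```shell
# Stand-in for /var/lib/replication-manager, created in a temp dir
STATE_DIR="$(mktemp -d)/replication-manager"
mkdir -p "$STATE_DIR/cluster2/192.168.18.55_3307"
touch "$STATE_DIR/cluster2/192.168.18.55_3307/serverstate.json"

mv "$STATE_DIR" "${STATE_DIR}.backup"   # keep the old state as a backup
mkdir -p "$STATE_DIR"                   # empty dir: repman rebuilds its state here
ls -A "$STATE_DIR"                      # prints nothing: state is clean
```

After doing the equivalent on the real path, restarting the service forces repman to rediscover the topology from scratch instead of loading stale serverstate.json files.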
Can you test the last pushed release (without the repo) to see if it fixes your issue?
hi @svaroqui , yes sure:
and the result is: Topology = unknown and the light (Is Down) is still yellow:
inside:
It seems that in 2.3.20 there were some changes related to topology discovery:
https://github.com/signal18/replication-manager/compare/v2.3.19...v2.3.20#diff-86cbfc045d9b3441a4f4c2235e276915275698f1e70a42e38e74a846cb9bdcc4L652
Notably, in cluster/cluster.go line 652, getTopology was removed and moved earlier in the process (I don't know for what reason); maybe that is causing the issue, since getTopology(fromConf) will only find a MasterServer conf if cluster.master is not nil:
} else {
    relay := cluster.GetRelayServer()
    if relay != nil && cluster.Conf.ReplicationNoRelay == false {
        cluster.Conf.Topology = topoMultiTierSlave
    } else if cluster.master != nil {
        cluster.Conf.Topology = topoMasterSlave
    }
}
I would assume the condition is not working at this point.
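The hypothesis can be illustrated with a minimal, self-contained Go sketch (the types and names here are simplified stand-ins, not repman's actual structs): if the branch quoted above runs before discovery has assigned cluster.master, both conditions are false and the topology stays "unknown", which matches what the reporter sees.

```go
package main

import "fmt"

// Simplified stand-ins for repman's cluster/server types (hypothetical).
type server struct{ host string }

type cluster struct {
	master *server // set by discovery; nil before discovery has run
	relay  *server
}

// detectTopology mirrors the shape of the quoted branch: relay wins,
// then master-slave, otherwise the topology is left unknown.
func detectTopology(c *cluster) string {
	if c.relay != nil {
		return "multi-tier-slave"
	}
	if c.master != nil {
		return "master-slave"
	}
	return "unknown" // what 2.3.20/2.3.21 report in this thread
}

func main() {
	early := &cluster{} // topology check runs before master is assigned
	late := &cluster{master: &server{host: "192.168.18.55:3307"}}
	fmt.Println(detectTopology(early)) // unknown
	fmt.Println(detectTopology(late))  // master-slave
}
```

So if the call was moved to a point in cluster.Run before master election, this ordering alone would explain the unknown topology.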
@svaroqui maybe you can explain why you moved the topology detection to the top of the cluster.Run function?