orchestrator

Bug report: Cluster alias update failed after master failover

Open jxs-2022 opened this issue 3 years ago • 2 comments

If this is a bug report, please provide a test case and the error output. Useful information:

  • your orchestrator.conf.json config file/contents: except for the account configuration, all other parameters are unchanged (raft: false). Relevant settings:
"HostnameResolveMethod": "none",
"MySQLHostnameResolveMethod": "@@report_host",
"RecoverMasterClusterFilters": [
  "10.*"
]
  • your topology (e.g. run orchestrator-client -c topology -alias my-cluster)
10.0.9.132:61106   [0s,ok,8.0.20-11,rw,ROW,>>,GTID]
+ 10.0.9.131:61106 [0s,ok,8.0.20-11,rw,ROW,>>,GTID]
+ 10.0.9.133:61106 [0s,ok,8.0.20-11,rw,ROW,>>,GTID]
  • what did you do? 1) First, I set the cluster alias (cluster_name: 10.0.9.132:61106, alias: eeo) through the web API.
    2) After an automatic master failover, the cluster topology changed to:
frenky@frenkydeMacBook-Pro bin % ./orchestrator-client -c topology -alias 10.0.9.131:61106
10.0.9.131:61106   [0s,ok,8.0.20-11,rw,ROW,>>,GTID]
+ 10.0.9.133:61106 [0s,ok,8.0.20-11,rw,ROW,>>,GTID]

> New cluster: (cluster_name: 10.0.9.131:61106, alias: 10.0.9.131:61106), i.e. the alias is the cluster_name.

frenky@frenkydeMacBook-Pro bin % ./orchestrator-client -c topology -alias eeo
10.0.9.132:61106 [unknown,invalid,8.0.20-11,rw,ROW,>>,GTID,downtimed]

> Failed cluster: (cluster_name: 10.0.9.132:61106, alias: eeo), i.e. the alias is still eeo.
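For reference, the alias in step 1 is set with a single HTTP call. The sketch below only builds the request URL; it assumes the endpoint has the shape /api/set-cluster-alias/:clusterName?alias=<alias> and uses a hypothetical orchestrator address, so verify both against your deployment before relying on it.

```go
package main

import (
	"fmt"
	"net/url"
)

// setClusterAliasURL builds the web API request that sets a manual alias.
// ASSUMPTION: endpoint shape /api/set-cluster-alias/:clusterName?alias=...
func setClusterAliasURL(baseURL, clusterName, alias string) string {
	q := url.Values{}
	q.Set("alias", alias) // query-encodes the alias
	return fmt.Sprintf("%s/api/set-cluster-alias/%s?%s",
		baseURL, url.PathEscape(clusterName), q.Encode())
}

func main() {
	// Hypothetical orchestrator address.
	fmt.Println(setClusterAliasURL("http://orchestrator.example:3000", "10.0.9.132:61106", "eeo"))
	// prints http://orchestrator.example:3000/api/set-cluster-alias/10.0.9.132:61106?alias=eeo
}
```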
  • what did you expect to happen? 1) I expected the clusters to end up like this:
> New cluster: (cluster_name: 10.0.9.131:61106, alias: eeo), i.e. the alias is eeo.
> 
> Failed cluster: (cluster_name: 10.0.9.132:61106, alias: 10.0.9.132:61106), i.e. the alias is 10.0.9.132:61106.
  • what happened? 1) Because the cluster has an alias, recovery runs the following logic:
if alias := analysisEntry.ClusterDetails.ClusterAlias; alias != "" {
	inst.SetClusterAlias(promotedReplica.Key.StringCode(), alias)
}

2) Once this runs, the new cluster's alias should be eeo:

new cluster: (cluster_name: 10.0.9.131:61106, alias: eeo)

3) During the failover, however, the background goroutine started by 'go inst.UpdateClusterAliases()' runs this SQL every 5 seconds:

replace into cluster_alias (alias, cluster_name, last_registered)
select
	cluster_name as alias, cluster_name, now()
from database_instance
group by cluster_name
having
	sum(suggested_cluster_alias = '') = count(*)

None of the new cluster's instances reports a suggested_cluster_alias, so the query produces the row ('10.0.9.131:61106', '10.0.9.131:61106', now()).
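The HAVING clause can be modelled in a few lines. This is a sketch, not orchestrator code: the instance type stands in for the database_instance columns the query reads, and the cluster suggesting the alias "payments" is hypothetical. A cluster gets a dummy alias equal to its own name only when every one of its instances has an empty suggested_cluster_alias.

```go
package main

import (
	"fmt"
	"sort"
)

// instance holds the database_instance columns the query looks at.
type instance struct {
	Key                   string
	ClusterName           string
	SuggestedClusterAlias string
}

// dummyAliasRows models the "group by cluster_name having
// sum(suggested_cluster_alias = '') = count(*)" query and returns
// cluster_name -> alias for every qualifying cluster.
func dummyAliasRows(instances []instance) map[string]string {
	total := map[string]int{} // count(*) per cluster
	empty := map[string]int{} // sum(suggested_cluster_alias = '') per cluster
	for _, in := range instances {
		total[in.ClusterName]++
		if in.SuggestedClusterAlias == "" {
			empty[in.ClusterName]++
		}
	}
	rows := map[string]string{}
	for name, n := range total {
		if empty[name] == n {
			rows[name] = name // the alias is simply the cluster name
		}
	}
	return rows
}

func main() {
	instances := []instance{
		{"10.0.9.131:61106", "10.0.9.131:61106", ""},   // promoted master
		{"10.0.9.133:61106", "10.0.9.131:61106", ""},   // its replica
		{"10.0.7.1:3306", "10.0.7.1:3306", "payments"}, // hypothetical aliased cluster
	}
	rows := dummyAliasRows(instances)
	names := make([]string, 0, len(rows))
	for name := range rows {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("alias=%s cluster_name=%s\n", rows[name], name)
	}
}
```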

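4) REPLACE INTO deletes any existing row that collides with the new one on a unique key of the table (here the primary key cluster_name), so the manually set alias was overwritten. The end result was not what we expected.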

(cluster_name: 10.0.9.131:61106, alias: eeo) was replaced with (cluster_name: 10.0.9.131:61106, alias: 10.0.9.131:61106). The table definition:

CREATE TABLE `cluster_alias` (
  `cluster_name` varchar(128) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  `alias` varchar(128) NOT NULL,
  `last_registered` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`cluster_name`),
  UNIQUE KEY `alias_uidx` (`alias`),
  KEY `last_registered_idx` (`last_registered`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
  • How to fix?
  1. topology_recover.go
if alias := analysisEntry.ClusterDetails.ClusterAlias; alias != "" {
+	inst.SetClusterNameByAliasOverride(after, alias) // new: repoint the alias override at the new cluster
	inst.SetClusterAlias(promotedReplica.Key.StringCode(), alias)
}
  2. cluster_alias.go
// SetClusterNameByAliasOverride is the exported wrapper called from topology_recover.go.
func SetClusterNameByAliasOverride(newClusterName string, alias string) error {
	return updateClusterNameByAliasOverride(newClusterName, alias)
}
  3. cluster_alias_dao.go
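// updateClusterNameByAliasOverride points the cluster_alias_override row(s) carrying alias at newClusterName.
func updateClusterNameByAliasOverride(newClusterName string, alias string) error {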
	writeFunc := func() error {
		_, err := db.ExecOrchestrator(`
			update cluster_alias_override set cluster_name = ? where alias=?
			`,
			newClusterName, alias)
		return log.Errore(err)
	}
	return ExecDBWriteFunc(writeFunc)
}
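The effect of the proposed override update can be checked with a small in-memory model. overrideRow and updateClusterNameByAlias are illustrative stand-ins for the cluster_alias_override table and the UPDATE statement above, not orchestrator code, and the starting row assumes the web API call stored the alias against the old master's cluster name.

```go
package main

import "fmt"

// overrideRow is one row of cluster_alias_override (illustrative model).
type overrideRow struct {
	ClusterName string
	Alias       string
}

// updateClusterNameByAlias mirrors
//   update cluster_alias_override set cluster_name = ? where alias = ?
// and returns how many rows matched the WHERE clause.
func updateClusterNameByAlias(rows []overrideRow, newClusterName, alias string) int {
	matched := 0
	for i := range rows {
		if rows[i].Alias == alias {
			rows[i].ClusterName = newClusterName
			matched++
		}
	}
	return matched
}

func main() {
	// Assumed state before failover: the override points at the old master.
	rows := []overrideRow{{ClusterName: "10.0.9.132:61106", Alias: "eeo"}}
	// During recovery, repoint it at the promoted replica.
	n := updateClusterNameByAlias(rows, "10.0.9.131:61106", "eeo")
	fmt.Println(n, rows[0].ClusterName) // prints 1 10.0.9.131:61106
}
```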

A unique constraint is also recommended on the alias column of cluster_alias_override.
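Without that constraint, "where alias=?" can match several rows and repoint more than one cluster at once, and any existing duplicates would have to be cleaned up before the key can be added. A pre-check could look like the sketch below; it assumes the override rows have already been loaded into memory, and overrideRow and duplicateAliases are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
)

// overrideRow is one row of cluster_alias_override (illustrative model).
type overrideRow struct {
	ClusterName string
	Alias       string
}

// duplicateAliases returns, sorted, every alias used by more than one row.
func duplicateAliases(rows []overrideRow) []string {
	count := map[string]int{}
	for _, r := range rows {
		count[r.Alias]++
	}
	var dups []string
	for alias, n := range count {
		if n > 1 {
			dups = append(dups, alias)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	rows := []overrideRow{
		{"10.0.9.132:61106", "eeo"},
		{"10.0.8.10:3306", "eeo"},   // hypothetical second cluster reusing the alias
		{"10.0.8.20:3306", "other"}, // hypothetical
	}
	fmt.Println(duplicateAliases(rows)) // prints [eeo]
}
```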

jxs-2022, Jun 11 '22 14:06

I'm facing exactly the same issue. If the cluster alias can change, it's hard to locate a given cluster, because there is no stable identifier.

TeemoKill, Jul 13 '22 08:07

> I'm facing exactly the same issue. If the cluster alias would change, it's inconvenient to locate a certain cluster as there is not an identifier

We have applied the fix above in our production environment and it works as expected. You can test it with both unplanned failovers and planned master switchovers to see whether it meets your requirements.

jxs-2022, Jul 20 '22 03:07