mysql-operator

In multi-master mode, when instance `0` goes down, it can't rejoin the cluster

Open alantang888 opened this issue 6 years ago • 2 comments

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT

Versions

MySQL Operator Version: 0.3.0

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-08T16:31:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.8", GitCommit:"7eab6a49736cc7b01869a15f9f05dc5b49efb9fc", GitTreeState:"clean", BuildDate:"2018-09-14T15:54:20Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 9 (stretch)
  • Kernel (e.g. uname -a): Linux ip-172-20-120-117 4.9.0-7-amd64 #1 SMP Debian 4.9.110-3+deb9u2 (2018-08-13) x86_64 GNU/Linux
  • Others:

What happened?

In multi-master mode, when instance 0 goes down while other instances in the cluster are still running, it can't rejoin the cluster.

What you expected to happen?

If other instances in the cluster are still running, instance 0 should be able to rejoin the cluster.

How to reproduce it (as minimally and precisely as possible)?

Create a 3-node multi-master MySQL cluster with PVCs. Once the whole cluster is ready, delete the pod of instance 0.
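
The delete step can be scripted roughly as follows. This is only a sketch: the cluster name `mysql-dummp` and namespace `test` are taken from the agent flags in the log, the pod name is assumed to be the cluster name plus the ordinal (as the `--hostname` flag suggests), and the helper names are made up for illustration. It defaults to a dry run so nothing is deleted by accident.

```python
import subprocess
from typing import List


def delete_pod_command(cluster_name: str, namespace: str, ordinal: int = 0) -> List[str]:
    """Build the kubectl command that deletes one member pod of the cluster.

    Assumes pods are named "<cluster-name>-<ordinal>", which matches the
    --hostname flag seen in the agent log (hypothetical helper).
    """
    pod = f"{cluster_name}-{ordinal}"
    return ["kubectl", "delete", "pod", pod, "--namespace", namespace]


def delete_pod(cluster_name: str, namespace: str, ordinal: int = 0,
               dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or actually run it against the cluster."""
    cmd = delete_pod_command(cluster_name, namespace, ordinal)
    if dry_run:
        print(" ".join(cmd))
    else:
        # check=True raises CalledProcessError if kubectl exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Values from the log in this report; pass dry_run=False to really delete.
    delete_pod("mysql-dummp", "test")
```

Since the failure is intermittent, the delete may need to be repeated several times, waiting for the pod to come back in between, before the error shows up.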

Anything else we need to know?

Log message from mysql-agent:

Starting mysql-agent version 0.3.0
I1130 02:29:53.552218       1 main.go:48] FLAG: --address="0.0.0.0"
I1130 02:29:53.552261       1 main.go:48] FLAG: --alsologtostderr="false"
I1130 02:29:53.552266       1 main.go:48] FLAG: --cluster-name="mysql-dummp"
I1130 02:29:53.552276       1 main.go:48] FLAG: --healthcheck-port="10512"
I1130 02:29:53.552288       1 main.go:48] FLAG: --hostname="mysql-dummp-0"
I1130 02:29:53.552292       1 main.go:48] FLAG: --log-backtrace-at=":0"
I1130 02:29:53.552304       1 main.go:48] FLAG: --log-dir=""
I1130 02:29:53.552314       1 main.go:48] FLAG: --log-flush-frequency="5s"
I1130 02:29:53.552347       1 main.go:48] FLAG: --logtostderr="true"
I1130 02:29:53.552357       1 main.go:48] FLAG: --min-resync-period="12h0m0s"
I1130 02:29:53.552362       1 main.go:48] FLAG: --namespace="test"
I1130 02:29:53.552365       1 main.go:48] FLAG: --stderrthreshold="2"
I1130 02:29:53.552368       1 main.go:48] FLAG: --v="4"
I1130 02:29:53.552375       1 main.go:48] FLAG: --vmodule=""
I1130 02:29:53.559273       1 cluster_manager.go:116] Database not yet running. Waiting...
I1130 02:30:03.895313       1 cluster_manager.go:256] Checking if instance can rejoin cluster
I1130 02:30:03.895344       1 cluster_manager.go:263] Attempting to rejoin instance to cluster
E1130 02:30:04.284106       1 cluster_manager.go:268] Failed to rejoin cluster: SystemError: RuntimeError: Cluster.rejoin_instance: ERROR: Error joining instance to cluster: Cannot join instance 'mysql-dummp-0.mysql-dummp:3306'. Peer instance 'mysql-dummp-0.mysql-dummp:3306' state is currently 'None', but is expected to be 'ONLINE'.
E1130 02:30:14.538086       1 cluster_manager.go:253] Failed to determine if we can rejoin the cluster: SystemError: RuntimeError: Dba.get_cluster: This function is not available through a session to a standalone instance (metadata exists, but GR is not active)
E1130 02:30:24.786935       1 cluster_manager.go:253] Failed to determine if we can rejoin the cluster: SystemError: RuntimeError: Dba.get_cluster: This function is not available through a session to a standalone instance (metadata exists, but GR is not active)
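
To see how often each failure occurs across retries, the agent log can be summarized with a small script. This is a minimal sketch, assuming the lines follow the glog layout seen above (severity letter, date, time, thread id, `file:line]`, message). The category labels and function names are made up for illustration and are not part of mysql-agent.

```python
import re
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional

# glog layout: Lmmdd hh:mm:ss.uuuuuu threadid file:line] message
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})"
    r"\s+(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    severity: str
    time: str
    source: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one glog-style line; return None for lines in another format."""
    m = GLOG_LINE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["severity"], m["time"], f'{m["file"]}:{m["line"]}', m["message"])


def classify(entry: LogEntry) -> Optional[str]:
    """Label error entries by the two failure messages seen in this report."""
    if entry.severity != "E":
        return None
    if "Cluster.rejoin_instance" in entry.message:
        return "rejoin_failed"
    if "GR is not active" in entry.message:
        return "group_replication_inactive"
    return "other_error"


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Count error entries per label, skipping unparseable and non-error lines."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_line(line)
        label = classify(entry) if entry is not None else None
        if label is not None:
            counts[label] += 1
    return dict(counts)
```

Feeding it the output of `kubectl logs` for the agent container, split into lines, gives a per-category error count that is easy to compare between a failing restart and a successful one.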

alantang888 avatar Nov 30 '18 03:11 alantang888

I found that it doesn't happen 100% of the time, but if you retry a few more times, the same error shows up again.

alantang888 avatar Dec 10 '18 01:12 alantang888

One more finding: when this happens, the mysql container logs the following error: [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Invalid hostname or IP address (mysql-dummy-0.mysql-dummy:33061) assigned to the parameter local_node!'
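
One possible reading of this message is that the pod's own hostname was not yet resolvable when group replication started. That is only a guess and has not been confirmed. A quick way to test it from inside the pod is to retry name resolution for the local address, as in the sketch below. The function name, retry count, and delay are arbitrary choices for illustration.

```python
import socket
import time
from typing import Callable


def wait_for_hostname(host: str, port: int, attempts: int = 5, delay: float = 2.0,
                      resolver: Callable = socket.getaddrinfo,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once host resolves, or False after `attempts` failed lookups.

    `resolver` and `sleep` are injectable so the retry logic can be exercised
    without real DNS or real waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            resolver(host, port)
            return True
        except socket.gaierror:
            # Name not resolvable yet; wait before the next try unless this was the last one.
            if attempt < attempts:
                sleep(delay)
    return False


if __name__ == "__main__":
    # Host and port copied from the error message above.
    ok = wait_for_hostname("mysql-dummy-0.mysql-dummy", 33061)
    print("resolvable" if ok else "not resolvable")
```

If the name only becomes resolvable after a few attempts, that would be consistent with the error appearing only on some restarts.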

alantang888 avatar Dec 11 '18 06:12 alantang888