mysql-operator

Orchestrator "Unable to determine cluster name" after redeploying a cluster that was deleted

Open DeamonMV opened this issue 5 years ago • 3 comments

What the problem looks like

Right after the problem was discovered, Orchestrator showed this error:

ERROR Unable to determine cluster name. clusterHint=%clusterName%

After a few days, Orchestrator showed only this message:

[martini] Started GET /api/cluster/%clusterName% for 127.0.0.1:53046
[martini] Completed 500 Internal Server Error in 154.335294ms
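The two signatures above can be picked out of a longer Orchestrator log with a small filter. This is only a sketch: the patterns are copied from the excerpts in this report, and the function name `find_failures` is made up for illustration. It also assumes each "Started" line is followed by its own "Completed" line, which may not hold when requests are interleaved.

```python
import re

# Patterns copied from the two log excerpts in this report.
CLUSTER_NAME_ERROR = re.compile(
    r"ERROR Unable to determine cluster name\. clusterHint=(?P<hint>\S+)"
)
MARTINI_STARTED = re.compile(r"\[martini\] Started (?P<method>[A-Z]+) (?P<path>\S+) for ")
MARTINI_COMPLETED = re.compile(r"\[martini\] Completed (?P<status>\d{3}) ")


def find_failures(lines):
    """Return (kind, detail) tuples for the failure signatures found in lines."""
    failures = []
    last_path = None  # path of the most recent "Started" line (assumes no interleaving)
    for line in lines:
        match = CLUSTER_NAME_ERROR.search(line)
        if match:
            failures.append(("cluster-name", match.group("hint")))
            continue
        match = MARTINI_STARTED.search(line)
        if match:
            last_path = match.group("path")
            continue
        match = MARTINI_COMPLETED.search(line)
        if match and match.group("status").startswith("5"):
            failures.append(("http-5xx", last_path))
    return failures
```

Feeding it the lines of a saved log file returns one tuple per occurrence, which makes it easier to see whether the explicit error has really stopped appearing or is just buried among the 500 responses.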

How we got the problem

Draining a node with certain options led to problems terminating the MySQL node pods, which in turn led to a Ready: False status on the MySQL cluster.

For example, with the following drain command, the drain fails and the MySQL pod is terminated right after the drain has failed:

kubectl drain --ignore-daemonsets --grace-period 300 --timeout 300s --delete-local-data test-k8s-worker-3

This behavior can cause problems for the MySQL cluster and leave it with a Ready: False status.
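When a drain like this is scripted, building the argument list one token at a time avoids flags and values running together. A minimal sketch, assuming the same flags and node name as in the command from this report; the helper name `build_drain_command` is hypothetical.

```python
import shlex


def build_drain_command(node, grace_period=300, timeout="300s"):
    """Build the kubectl drain argv from this report, one list element per token."""
    return [
        "kubectl", "drain",
        "--ignore-daemonsets",
        "--grace-period", str(grace_period),
        "--timeout", timeout,
        "--delete-local-data",
        node,
    ]
```

`shlex.join(build_drain_command("test-k8s-worker-3"))` gives the command as a single shell-safe string for logging, and the list itself can be passed to `subprocess.run` without going through a shell.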

What we did to make it work again

To make it work, I tried a few things:

  1. I scaled the MySQL cluster from 2 nodes to 1. This didn't help.
  2. I scaled the MySQL cluster from 1 node to 0, then back to 1 node. This didn't help.
  3. I deleted the whole MySQL cluster and redeployed it. This didn't help either.

After all of these attempts, the MySQL cluster still did not accept connections.

Most importantly, after I redeployed the MySQL cluster I got the error described in the first section.

My workaround

I deployed a cluster with a different name and got my service working again. The MySQL cluster with the different name was able to become Ready, and I was able to restore the DB.

Environment

Kubernetes: 1.14.6
Orchestrator: mysql-operator-orchestrator:latest
Operator: quay.io/presslabs/mysql-operator:0.3.8
Mysql: percona@sha256:713c1817615b333b17d0fbd252b0ccc53c48a665d4cfcb42178167435a957322

Question

How can I resolve the problem and get a working MySQL cluster with the old name?

FYI

The Secrets with passwords were not deleted while I was redeploying the MySQL cluster.

DeamonMV avatar Aug 03 '20 07:08 DeamonMV

Hi @DeamonMV thank you for reporting this. Can you please estimate what was the time between deleting the cluster and recreating it again? Because Orchestrator deletes nodes when they are inactive for more than 1 hour, see this option.

We will try to reproduce it and find the root cause. Thank you!

AMecea avatar Aug 03 '20 15:08 AMecea

Hello

Can you please estimate what was the time between deleting the cluster and recreating it again?

About one minute.

I have already done this procedure a couple of times, and everything was fine.
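The timing question can be written down as a simple comparison. This is a sketch of the reasoning only, not a confirmed root cause: it assumes the one-hour inactivity window mentioned in the reply above (the real value comes from Orchestrator's configuration), and the function name and timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: the one-hour inactivity window mentioned in the reply above.
FORGET_AFTER = timedelta(hours=1)


def old_records_may_remain(deleted_at, recreated_at, forget_after=FORGET_AFTER):
    """True if the cluster was recreated before the inactivity window elapsed."""
    return (recreated_at - deleted_at) < forget_after


# Illustrative timestamps: recreated about one minute after deletion.
deleted = datetime(2020, 8, 3, 7, 0)
recreated = deleted + timedelta(minutes=1)
within_window = old_records_may_remain(deleted, recreated)
```

With a gap of about one minute the check is true, so under this assumption Orchestrator would not yet have dropped the instances of the deleted cluster when the new one appeared under the same name.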

DeamonMV avatar Aug 04 '20 07:08 DeamonMV

Hi @AMecea

I faced a similar problem:

I sporadically see the error “Unable to determine cluster name. clusterHint=XXXX”. It usually goes away on its own after a couple of minutes, but until it does, switchovers fail for the chain because we use Orchestrator as the source of truth for the chain's primary.

Could you please help me to find the root cause and fix it?

Version: 3.2.6-11
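Until the root cause is found, one possible stopgap for the switchover path is to retry the primary lookup for a few minutes instead of failing on the first error. A rough sketch, assuming the error is transient as described; `fetch_primary` is a hypothetical caller-supplied function that asks Orchestrator for the chain's primary and raises on any error, and the attempt count and delay are placeholder values.

```python
import time


def wait_for_primary(fetch_primary, attempts=5, delay_seconds=30.0, sleep=time.sleep):
    """Call fetch_primary until it succeeds or the attempts run out.

    fetch_primary is a hypothetical callable that returns the primary
    and raises an exception when Orchestrator cannot resolve the cluster.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch_primary()
        except Exception as exc:  # the transient lookup error surfaces here
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay_seconds)  # wait before the next attempt
    raise RuntimeError(f"primary not resolved after {attempts} attempts") from last_error
```

This only masks the symptom: if the error lasts longer than the total wait, the switchover still fails, now with the original exception attached as the cause.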

aaditya-dubey14 avatar Sep 25 '24 10:09 aaditya-dubey14