Operator Pods don't get ready after replication

Open SF2311 opened this issue 4 years ago • 8 comments

Current Setup

We are running an operator deployment defined by this template in namespace mysql-operator:

helm template presslabs presslabs/mysql-operator -n mysql-operator \
     --version 0.4.0 \
     --include-crds \
     --set antiAffinity=hard \
     --set orchestrator.persistence.storageClass=local-path \
     > cluster01/mysql-operator/mysql-operator.yaml

This works without problems. Now we want to scale the deployment up by using the following template:

helm template presslabs presslabs/mysql-operator -n mysql-operator \
     --version 0.4.0 \
     --include-crds \
     --set antiAffinity=hard \
     --set orchestrator.persistence.storageClass=local-path \
     --set orchestrator.topologyPassword=<REDACTED> \
     --set replicas=3 \
     > cluster01/mysql-operator/mysql-operator.yaml

Problem

After applying the new template, the operator is scaled to three replicas as expected, but the new pods never become ready:

$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
presslabs-mysql-operator-0   2/2     Running   0          7d
presslabs-mysql-operator-1   1/2     Running   0          21m
presslabs-mysql-operator-2   1/2     Running   0          21m

Looking at the logs, I found that the problem is in the orchestrator container. The output of kubectl logs presslabs-mysql-operator-1 -c orchestrator is attached: log.txt
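
For anyone trying to confirm which container in a 1/2 pod is failing its readiness check, here is a minimal sketch. It assumes the mysql-operator namespace and the pod names from the report above. The function name container_readiness and the KUBECTL override variable are hypothetical helpers added for illustration, not part of the operator or chart.

```shell
# Sketch: print each container's name and its ready flag for one pod.
# Assumption: the pod lives in the mysql-operator namespace, as in the report.
# KUBECTL is a hypothetical override (e.g. KUBECTL=echo for a dry run);
# it defaults to the real kubectl binary.
container_readiness() {
  pod="$1"
  "${KUBECTL:-kubectl}" get pod "$pod" -n mysql-operator \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\n"}{end}'
}
```

Calling container_readiness presslabs-mysql-operator-1 should list one line per container, which makes it easy to see whether orchestrator is the one reporting not ready.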

SF2311 commented on Jun 18 '21 19:06

Is it possible to report this to the orchestrator project to get their help?

cndoit18 commented on Aug 11 '21 16:08

@SF2311 does this still happen with 0.5.0?

calind commented on Oct 11 '21 10:10

Is there any documentation I can refer to regarding the upgrade process from version 0.4.0 to 0.5.0?

SF2311 commented on Oct 13 '21 19:10

Hi, you can refer to the v0.3.x upgrade instructions in https://github.com/bitpoke/mysql-operator/blob/master/docs/operator-upgrades.md

cndoit18 commented on Oct 14 '21 04:10

I reproduced the issue. Is your Kubernetes version 1.19?

link: #744

cndoit18 commented on Oct 28 '21 08:10

Yes, we are running Kubernetes v1.19. After upgrading the operator to v0.5.1, the problem persists.

SF2311 commented on Nov 06 '21 13:11

My solution was to actively delete all the MySQL operator pods after the upgrade.
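
As a rough sketch of that workaround, assuming the namespace and pod names from the original report (they will differ for other release names), deleting the pods lets their owning controller recreate them. The function name delete_operator_pods and the KUBECTL override variable are hypothetical and only used here for illustration.

```shell
# Sketch of the workaround: delete every operator pod so that the
# controller owning them recreates fresh ones.
# Assumptions: namespace mysql-operator and the three pod names shown in
# the original report; adjust both to your own release.
# KUBECTL is a hypothetical override (e.g. KUBECTL=echo for a dry run).
delete_operator_pods() {
  "${KUBECTL:-kubectl}" delete pod -n mysql-operator \
    presslabs-mysql-operator-0 \
    presslabs-mysql-operator-1 \
    presslabs-mysql-operator-2
}
```

Running it with KUBECTL=echo first prints the command that would be executed without touching the cluster.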

cndoit18 commented on Nov 12 '21 08:11

Did this solve the problem in the long term? For the first four days after the upgrade the operator worked fine, but then replication failed spontaneously, so I'm not convinced that deleting the pods will fix this for me in the long term.

SF2311 commented on Nov 12 '21 16:11