mysql-operator

New cluster never ends up in Orchestrator

Open ynnt opened this issue 3 years ago • 10 comments

Cluster is stuck in Ready: False phase because mysql pod never gets Ready.

2021-04-07T14:52:54.553072021Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"Lease","namespace":"default","name":"mysql-operator-leader-election","uid":"c61dd4bc-4706-4b5e-9c46-b2267447f087","apiVersion":"coordination.k8s.io/v1","resourceVersion":"78295"}, "reason": "LeaderElection", "message": "cm-mysql-operator-0_750f6e52-877d-473c-9935-69d30acfd952 became leader"}
2021-04-07T14:52:54.553275205Z	INFO	controller-runtime.manager.controller.mysqlbackup-controller	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.553360761Z	INFO	controller-runtime.manager.controller.mysqlbackup-controller	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.553824764Z	INFO	controller-runtime.manager.controller.mysqlbackupcron-controller	Starting EventSource	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.554015469Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.554121207Z	INFO	controller-runtime.manager.controller.mysql-database	Starting EventSource	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.554335196Z	INFO	controller-runtime.manager.controller.mysql-user	Starting EventSource	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.554562241Z	INFO	controller-runtime.manager.controller.controller.mysqlNode	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.554786339Z	INFO	controller-runtime.manager.controller.controller.orchestrator	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.654175743Z	INFO	controller-runtime.manager.controller.mysqlbackup-controller	Starting Controller
2021-04-07T14:52:54.65433261Z	INFO	controller-runtime.manager.controller.mysql-database	Starting Controller
2021-04-07T14:52:54.654463104Z	INFO	controller-runtime.manager.controller.mysqlbackupcron-controller	Starting Controller
2021-04-07T14:52:54.654492149Z	INFO	controller-runtime.manager.controller.mysqlbackupcron-controller	Starting workers	{"worker count": 1}
2021-04-07T14:52:54.654561787Z	INFO	controller-runtime.manager.controller.mysql-user	Starting Controller
2021-04-07T14:52:54.654585036Z	INFO	controller-runtime.manager.controller.mysql-user	Starting workers	{"worker count": 1}
2021-04-07T14:52:54.654638173Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.654938043Z	INFO	controller-runtime.manager.controller.controller.orchestrator	Starting EventSource  	{"source": "channel source: 0xc00022c2d0"}
2021-04-07T14:52:54.655058696Z	INFO	controller-runtime.manager.controller.controller.orchestrator	Starting Controller
2021-04-07T14:52:54.655927767Z	INFO	controller-runtime.manager.controller.controller.mysqlNode	Starting Controller
2021-04-07T14:52:54.656018901Z	DEBUG	controller.orchestrator	register cluster in clusters list	{"obj": {"kind":"MysqlCluster","apiVersion":"mysql.presslabs.org/v1alpha1","metadata":{"name":"kl-my","namespace":"default","uid":"5a09c95d-a977-4a4b-94e3-a97209938043","resourceVersion":"74547","generation":1,"creationTimestamp":"2021-04-07T14:38:02Z","annotations":{"mysql.presslabs.org/version":"300"},"ownerReferences":[{"apiVersion":"kuberlogic.com/v1","kind":"KuberLogicService","name":"kl-my","uid":"9db08315-5aed-4f29-8c18-aa3e95ceb053","controller":true,"blockOwnerDeletion":true}],"finalizers":["mysql.presslabs.org/registered-in-orchestrator"],"managedFields":[{"manager":"operator","operation":"Update","apiVersion":"mysql.presslabs.org/v1alpha1","time":"2021-04-07T14:38:02Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{}},"f:spec":{".":{},"f:image":{},"f:podSpec":{".":{},"f:annotations":{".":{},"f:monitoring.cloudlinux.com/port":{},"f:monitoring.cloudlinux.com/scrape":{}},"f:containers":{},"f:imagePullSecrets":{},"f:initContainers":{},"f:metricsExporterResources":{".":{},"f:limits":{".":{},"f:cpu":{},"f:memory":{}},"f:requests":{".":{},"f:cpu":{},"f:memory":{}}},"f:mysqlOperatorSidecarResources":{".":{},"f:requests":{".":{},"f:cpu":{},"f:memory":{}}},"f:resources":{".":{},"f:limits":{".":{},"f:cpu":{},"f:memory":{}},"f:requests":{".":{},"f:cpu":{},"f:memory":{}}}},"f:replicas":{},"f:secretName":{},"f:volumeSpec":{".":{},"f:persistentVolumeClaim":{".":{},"f:resources":{".":{},"f:requests":{".":{},"f:storage":{}}}}}}}},{"manager":"mysql-operator","operation":"Update","apiVersion":"mysql.presslabs.org/v1alpha1","time":"2021-04-07T14:38:06Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:mysql.presslabs.org/version":{}},"f:finalizers":{}},"f:spec":{"f:minAvailable":{},"f:podSpec":{"f:mysqlOperatorSidecarResources":{"f:limits":{".":{},"f:cpu":{},"f:memory":{}}}},"f:volumeSpec":{"f:persistentVolumeClaim":{"f:accessMode
s":{}}}},"f:status":{".":{},"f:conditions":{}}}}]},"spec":{"replicas":2,"secretName":"kl-my-cred","image":"quay.io/kuberlogic/mysql:5.7.26","podSpec":{"imagePullSecrets":[{"name":"kuberlogic-registry"}],"annotations":{"monitoring.cloudlinux.com/port":"9999","monitoring.cloudlinux.com/scrape":"true"},"resources":{"limits":{"cpu":"100m","memory":"512Mi"},"requests":{"cpu":"10m","memory":"256Mi"}},"initContainers":[{"name":"myisam-repair","image":"quay.io/kuberlogic/mysql:5.7.26","command":["/bin/sh","-c","for f in $(ls /var/lib/mysql/mysql/*MYI); do myisamchk -r --update-state $(echo $f | tr -d .MYI); done"],"resources":{},"volumeMounts":[{"name":"data","mountPath":"/var/lib/mysql"}]}],"containers":[{"name":"kuberlogic-exporter","image":"quay.io/kuberlogic/mysql-exporter-deprecated:v2","ports":[{"name":"metrics","containerPort":9999,"protocol":"TCP"}],"resources":{},"volumeMounts":[{"name":"data","mountPath":"/var/lib/mysql"}]}],"metricsExporterResources":{"limits":{"cpu":"100m","memory":"128Mi"},"requests":{"cpu":"10m","memory":"32Mi"}},"mysqlOperatorSidecarResources":{"requests":{"cpu":"10m","memory":"64Mi"}}},"volumeSpec":{"persistentVolumeClaim":{"resources":{"requests":{"storage":"1Gi"}}}}},"status":{"conditions":[{"type":"ReadOnly","status":"True","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"ClusterReadOnlyTrue","message":"read-only nodes: "},{"type":"Ready","status":"False","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"StatefulSetNotReady","message":"StatefulSet is not ready"},{"type":"PendingFailoverAck","status":"False","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"NoPendingFailoverAckExists","message":"no pending ack"}]}}}
2021-04-07T14:52:54.65837839Z	INFO	controller-runtime.manager.controller.mysql-database	Starting workers	{"worker count": 1}
2021-04-07T14:52:54.755666407Z	INFO	controller-runtime.manager.controller.mysqlbackup-controller	Starting workers      	{"worker count": 1}
2021-04-07T14:52:54.755720705Z	INFO	controller-runtime.manager.controller.controller.orchestrator	Starting workers      	{"worker count": 10}
2021-04-07T14:52:54.757668027Z	INFO	controller-runtime.manager.controller.controller.mysqlNode	Starting workers      	{"worker count": 1}
2021-04-07T14:52:54.757705105Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.858307211Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:54.959827321Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:55.060318519Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting EventSource  	{"source": "kind source: /, Kind="}
2021-04-07T14:52:55.161471864Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting Controller
2021-04-07T14:52:55.161604803Z	INFO	controller-runtime.manager.controller.controller.mysqlcluster	Starting workers      	{"worker count": 1}
2021-04-07T14:52:55.161879663Z	DEBUG	controller.mysqlcluster	reconcile cluster	{"key": "default/kl-my"}
2021-04-07T14:52:55.163217074Z	DEBUG	unchanged	{"syncer": "ConfigMap", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "/v1, Kind=ConfigMap", "diff": []}
2021-04-07T14:52:55.163743132Z	DEBUG	unchanged	{"syncer": "OperatedSecret", "key": {"namespace": "default", "name": "kl-my-mysql-operated"}, "kind": "/v1, Kind=Secret", "diff": []}
2021-04-07T14:52:55.164085532Z	DEBUG	unchanged	{"syncer": "Secret", "key": {"namespace": "default", "name": "kl-my-cred"}, "kind": "/v1, Kind=Secret", "diff": []}
2021-04-07T14:52:55.16461333Z	DEBUG	unchanged	{"syncer": "HeadlessSVC", "key": {"namespace": "default", "name": "mysql"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.166243362Z	DEBUG	unchanged	{"syncer": "MasterSVC", "key": {"namespace": "default", "name": "kl-my-mysql-master"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.1668702Z	DEBUG	unchanged	{"syncer": "HealthySVC", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.167596425Z	DEBUG	unchanged	{"syncer": "HealthyReplicasSVC", "key": {"namespace": "default", "name": "kl-my-mysql-replicas"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.208066905Z	DEBUG	updated	{"syncer": "StatefulSet", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "apps/v1, Kind=StatefulSet", "diff": []}
2021-04-07T14:52:55.20854835Z	DEBUG	unchanged	{"syncer": "PDB", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "policy/v1beta1, Kind=PodDisruptionBudget", "diff": []}
2021-04-07T14:52:55.2085749Z	DEBUG	controller.mysqlcluster	cluster status	{"key": "default/kl-my", "status": {"conditions":[{"type":"ReadOnly","status":"True","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"ClusterReadOnlyTrue","message":"read-only nodes: "},{"type":"Ready","status":"False","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"StatefulSetNotReady","message":"StatefulSet is not ready"},{"type":"PendingFailoverAck","status":"False","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"NoPendingFailoverAckExists","message":"no pending ack"}]}}
2021-04-07T14:52:55.208888335Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"MysqlCluster","namespace":"default","name":"kl-my","uid":"5a09c95d-a977-4a4b-94e3-a97209938043","apiVersion":"mysql.presslabs.org/v1alpha1","resourceVersion":"74547"}, "reason": "StatefulSetSyncSuccessfull", "message": "apps/v1, Kind=StatefulSet default/kl-my-mysql updated successfully"}
2021-04-07T14:52:55.310250803Z	DEBUG	controller.mysqlcluster	reconcile cluster	{"key": "default/kl-my"}
2021-04-07T14:52:55.311133499Z	DEBUG	unchanged	{"syncer": "ConfigMap", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "/v1, Kind=ConfigMap", "diff": []}
2021-04-07T14:52:55.311607029Z	DEBUG	unchanged	{"syncer": "OperatedSecret", "key": {"namespace": "default", "name": "kl-my-mysql-operated"}, "kind": "/v1, Kind=Secret", "diff": []}
2021-04-07T14:52:55.311900251Z	DEBUG	unchanged	{"syncer": "Secret", "key": {"namespace": "default", "name": "kl-my-cred"}, "kind": "/v1, Kind=Secret", "diff": []}
2021-04-07T14:52:55.312375685Z	DEBUG	unchanged	{"syncer": "HeadlessSVC", "key": {"namespace": "default", "name": "mysql"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.313117698Z	DEBUG	unchanged	{"syncer": "MasterSVC", "key": {"namespace": "default", "name": "kl-my-mysql-master"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.313745106Z	DEBUG	unchanged	{"syncer": "HealthySVC", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.314438017Z	DEBUG	unchanged	{"syncer": "HealthyReplicasSVC", "key": {"namespace": "default", "name": "kl-my-mysql-replicas"}, "kind": "/v1, Kind=Service", "diff": []}
2021-04-07T14:52:55.34431649Z	DEBUG	updated	{"syncer": "StatefulSet", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "apps/v1, Kind=StatefulSet", "diff": []}
2021-04-07T14:52:55.344757091Z	DEBUG	unchanged	{"syncer": "PDB", "key": {"namespace": "default", "name": "kl-my-mysql"}, "kind": "policy/v1beta1, Kind=PodDisruptionBudget", "diff": []}
2021-04-07T14:52:55.344790801Z	DEBUG	controller.mysqlcluster	cluster status	{"key": "default/kl-my", "status": {"conditions":[{"type":"ReadOnly","status":"True","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"ClusterReadOnlyTrue","message":"read-only nodes: "},{"type":"Ready","status":"False","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"StatefulSetNotReady","message":"StatefulSet is not ready"},{"type":"PendingFailoverAck","status":"False","lastTransitionTime":"2021-04-07T14:38:06Z","reason":"NoPendingFailoverAckExists","message":"no pending ack"}]}}
2021-04-07T14:52:55.345053401Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"MysqlCluster","namespace":"default","name":"kl-my","uid":"5a09c95d-a977-4a4b-94e3-a97209938043","apiVersion":"mysql.presslabs.org/v1alpha1","resourceVersion":"74547"}, "reason": "StatefulSetSyncSuccessfull", "message": "apps/v1, Kind=StatefulSet default/kl-my-mysql updated successfully"}
2021-04-07T14:52:59.553012175Z	DEBUG	controller.orchestrator	Schedule new cluster for reconciliation	{"key": "default/kl-my"}
2021-04-07T14:52:59.553225885Z	DEBUG	controller.orchestrator	reconciling cluster	{"key": "default/kl-my"}
2021-04-07T14:52:59.554547195Z	DEBUG	unchanged	{"syncer": "OrchestratorFinalizerSyncer", "key": {"namespace": "default", "name": "kl-my"}, "kind": "mysql.presslabs.org/v1alpha1, Kind=MysqlCluster", "diff": []}
2021-04-07T14:52:59.56895656Z	WARNING	orchestrator-reconciler	cluster not found in Orchestrator	{"key": "default/kl-my", "error": "not found"}
github.com/go-logr/zapr.(*zapLogger).Info
	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:126
github.com/presslabs/mysql-operator/pkg/controller/orchestrator.(*orcUpdater).getFromOrchestrator
	/go/src/github.com/presslabs/mysql-operator/pkg/controller/orchestrator/orchestrator_reconcile.go:133
github.com/presslabs/mysql-operator/pkg/controller/orchestrator.(*orcUpdater).Sync
	/go/src/github.com/presslabs/mysql-operator/pkg/controller/orchestrator/orchestrator_reconcile.go:83
github.com/presslabs/controller-util/syncer.Sync
	/go/pkg/mod/github.com/presslabs/[email protected]/syncer/syncer.go:82
github.com/presslabs/mysql-operator/pkg/controller/orchestrator.(*ReconcileMysqlCluster).Reconcile
	/go/src/github.com/presslabs/mysql-operator/pkg/controller/orchestrator/orchestrator_controller.go:216
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:216
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99
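
For anyone comparing their own operator logs against the ones above: the "cluster status" entries carry the MysqlCluster conditions as a JSON payload. The following is a minimal sketch, not part of the operator, that pulls the conditions out of one such line. It assumes the fields are tab-separated with the JSON payload last, as in the paste above; `summarize_conditions` is a hypothetical helper name and the sample line is abbreviated from the real one.

```python
import json
from typing import List


def summarize_conditions(log_line: str) -> List[str]:
    """Return 'Type=Status (Reason)' for each condition in a 'cluster status' line."""
    # Assumption: the JSON payload is the last tab-separated field of the line.
    payload = json.loads(log_line.rsplit("\t", 1)[-1])
    conditions = payload.get("status", {}).get("conditions", [])
    return [f"{c['type']}={c['status']} ({c['reason']})" for c in conditions]


# Abbreviated copy of a "cluster status" line from the log above.
line = (
    "2021-04-07T14:52:55.2085749Z\tDEBUG\tcontroller.mysqlcluster\tcluster status\t"
    '{"key": "default/kl-my", "status": {"conditions": ['
    '{"type": "Ready", "status": "False", "reason": "StatefulSetNotReady", '
    '"message": "StatefulSet is not ready"}]}}'
)

print(summarize_conditions(line))
```

In the logs above, Ready stays False with reason StatefulSetNotReady for the whole capture.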

ynnt • Apr 07 '21 15:04

Same problem

jicki • Apr 09 '21 03:04

Yes, and the Orchestrator logs show: Unable to determine cluster name

This is for a brand new cluster.

nigh8w0lf • Apr 29 '21 21:04

I deployed the cluster in a different namespace and also tried the same namespace as the operator, with the same result. I also tried changing the name of the cluster, but it has the same issues as above.

nigh8w0lf • Apr 29 '21 21:04

I have a similar issue: the cluster starts, but after some time MySQL becomes non-ready and I get the above log message in the operator logs.

sagikazarmark • Jun 23 '21 16:06

I have this problem too

browol • Jul 23 '21 02:07

Same problem

iefc • Sep 29 '21 08:09

Please make sure you are not hitting #170 (see https://www.bitpoke.io/docs/mysql-operator/deploy-mysql-cluster/#note-1).

Also please try with v0.5.0.

calind • Oct 11 '21 11:10

Hello. In my case:

  • I used Kubespray
  • The hosting provider shut down several servers
    • Among them: one etcd node and one worker
  • I tried adding new nodes
  • Something went wrong
  • I DIDN'T NOTICE IT
  • The cluster crumbled
  • In a panic, I barely managed to fix it
  • I DID SOMETHING WRONG

EVERYTHING WORKED, BUT there were errors in MySQL clusters only.

Obviously, I figured the problem was mysql-operator - no changes helped at all. Everything worked, but MySQL clusters gradually stopped working. Horror...

RUN Kubespray upgrade-cluster.yml

An error occurred: the pod of a MySQL cluster was not deleted. The same error had appeared at the very beginning, when I tried to fix the K8S cluster. I ignored it then. This happened at the "Drain node" stage.

fatal: [node1]: FAILED! => {"attempts": 3, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig", "/etc/kubernetes/admin.conf", "drain", "--force", "--ignore-daemonsets", "--grace-period", "300", "--timeout", "360s", "--delete-emptydir-data", "node1"], "delta": "0:06:01.760844", "end": "2022-10-05 02:44:14.018346", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2022-10-05 02:38:12.257502", "stderr": "WARNING: ignoring DaemonSet-managed Pods: default/netchecker-agent-hostnet-xvkjz, default/netchecker-agent-w282k, *** \nerror when evicting pods/"***-mysql-0" -n "***" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.\nerror when evicting pods/"***-mysql-0" -n "***"

Kubespray was unable to upgrade the cluster completely. In my case that was the reason.

Solution (in my case)

  1. RUN Kubespray upgrade-cluster.yml
  2. Follow the process until it reaches the "Drain node" stage on each node
  3. The process will hang at this stage and wait for a long time
  4. Delete all MySQL pods from this node
  5. The process will move forward
  6. The K8S cluster will be updated without errors and everything will work
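
Step 4 can be sketched roughly as below. This is only an illustration under stated assumptions: it expects the parsed output of `kubectl get pods --all-namespaces -o json`, it recognises MySQL pods by a name ending in `-mysql-<n>` (a heuristic taken from the `***-mysql-0` name in the drain error above), and it only prints the `kubectl delete pod` commands instead of running them. The helper names and the sample data are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Heuristic: operator-managed pods in this thread are named "<cluster>-mysql-<n>".
MYSQL_POD = re.compile(r"-mysql-\d+$")


def mysql_pods_on_node(pod_list: Dict, node: str) -> List[Tuple[str, str]]:
    """Return (namespace, name) for every MySQL-looking pod scheduled on `node`."""
    found = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        on_node = pod.get("spec", {}).get("nodeName") == node
        if on_node and MYSQL_POD.search(meta["name"]):
            found.append((meta["namespace"], meta["name"]))
    return found


def delete_commands(pods: List[Tuple[str, str]]) -> List[str]:
    """Build the commands to review and run by hand; nothing is executed here."""
    return [f"kubectl delete pod {name} -n {ns}" for ns, name in pods]


# Made-up sample shaped like `kubectl get pods --all-namespaces -o json` output.
sample = {
    "items": [
        {"metadata": {"namespace": "shop", "name": "shop-db-mysql-0"},
         "spec": {"nodeName": "node1"}},
        {"metadata": {"namespace": "shop", "name": "shop-db-mysql-1"},
         "spec": {"nodeName": "node2"}},
        {"metadata": {"namespace": "default", "name": "netchecker-agent-w282k"},
         "spec": {"nodeName": "node1"}},
    ]
}

for command in delete_commands(mysql_pods_on_node(sample, "node1")):
    print(command)
```

Deleting a pod directly is not subject to the pod disruption budget check that blocked the eviction during the drain, so the affected MySQL instance is unavailable until its pod is recreated. Review the printed commands before running any of them.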

tebaly • Oct 05 '22 11:10

Hello,

Same problem here. 77 clusters deployed without problem, but one of them will not deploy its second node because of "cluster not found in Orchestrator". No other errors at all.

  • Latest operator version
  • k8s v1.24.2
  • The cluster name is just db, and the namespace is shorter than all the others.

oau-dev • Feb 01 '24 16:02

Hello, I found that some data is still in the SQLite db days after the cluster was deleted, in database_instance_last_analysis, database_instance_tls, kv_store, and hostname_ips.

  • Could this prevent the cluster from being reinstalled and trigger the message "cluster not found in Orchestrator"?
  • Is it safe to clean those?
  • Is there a cleanup process that can be used?

thx for your help, I'm really stuck here :(
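
For reference, this is roughly how the leftover rows can be counted without touching them. It is only a sketch: it opens the SQLite file read-only and counts rows in the four tables named above. The location of the Orchestrator database file is an assumption you have to supply yourself, `leftover_row_counts` and `make_demo_db` are hypothetical helper names, and the demo database with its column names is invented purely so the snippet runs on its own. It does not answer whether cleaning those tables is safe.

```python
import os
import sqlite3
import tempfile
from pathlib import Path
from typing import Dict, Optional

# Table names as reported in the comment above.
TABLES = [
    "database_instance_last_analysis",
    "database_instance_tls",
    "kv_store",
    "hostname_ips",
]


def leftover_row_counts(db_path: str) -> Dict[str, Optional[int]]:
    """Count rows per table without modifying the database (None = table missing)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"  # read-only connection
    conn = sqlite3.connect(uri, uri=True)
    try:
        counts: Dict[str, Optional[int]] = {}
        for table in TABLES:
            try:
                row = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
                counts[table] = row[0]
            except sqlite3.OperationalError:
                counts[table] = None  # table does not exist in this file
        return counts
    finally:
        conn.close()


def make_demo_db(path: str) -> None:
    """Build a throwaway database; the kv_store columns are invented for the demo."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE kv_store (store_key TEXT, store_value TEXT)")
    conn.execute("INSERT INTO kv_store VALUES ('demo-key', 'demo-value')")
    conn.commit()
    conn.close()


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "orchestrator-demo.db")
    make_demo_db(demo_path)
    print(leftover_row_counts(demo_path))
```

To use it against a real file, call `leftover_row_counts` with the path to a copy of the Orchestrator database rather than the live file.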

oau-dev • Feb 07 '24 18:02