mysql-operator
Cluster never gets Ready
Hello!
I'm trying to spin up a cluster, but it never gets ready.
I can't identify why. Can you help me?
Some information:
kubectl get MysqlCluster -n transaction
NAME READY REPLICAS AGE
contract-api-cluster False 29m
kubectl describe MysqlCluster contract-api-cluster -n transaction
Name: contract-api-cluster
Namespace: transaction
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"mysql.presslabs.org/v1alpha1","kind":"MysqlCluster","metadata":{"annotations":{},"name":"contract-api-cluster","namespace":...
mysql.presslabs.org/version: 300
API Version: mysql.presslabs.org/v1alpha1
Kind: MysqlCluster
Metadata:
Creation Timestamp: 2020-03-26T21:13:05Z
Finalizers:
mysql.presslabs.org/registered-in-orchestrator
Generation: 2
Resource Version: 458095091
Self Link: /apis/mysql.presslabs.org/v1alpha1/namespaces/transaction/mysqlclusters/contract-api-cluster
UID: 46692ed0-7564-4e0e-ae3a-9418e3ad6f82
Spec:
Pod Spec:
Resources:
Secret Name: contract-api-db
Volume Spec:
Persistent Volume Claim:
Access Modes:
ReadWriteOnce
Resources:
Requests:
Storage: 1Gi
Storage Class Name: us-east-1a
Status:
Conditions:
Last Transition Time: 2020-03-26T21:13:07Z
Message: cluster is in read only
Reason: ClusterReadOnlyTrue
Status: True
Type: ReadOnly
Last Transition Time: 2020-03-26T21:13:07Z
Message: Node contract-api-cluster-mysql-0.mysql.transaction is part of topology and not replicating
Reason: NotReplicating
Status: False
Type: Ready
Last Transition Time: 2020-03-26T21:13:07Z
Message: no pending ack
Reason: NoPendingFailoverAckExists
Status: False
Type: PendingFailoverAck
Nodes:
Conditions:
Last Transition Time: 2020-03-26T21:15:27Z
Status: Unknown
Type: Lagged
Last Transition Time: 2020-03-26T21:15:27Z
Status: Unknown
Type: Replicating
Last Transition Time: 2020-03-26T21:15:27Z
Status: Unknown
Type: Master
Last Transition Time: 2020-03-26T21:15:27Z
Status: Unknown
Type: ReadOnly
Name: contract-api-cluster-mysql-0.mysql.transaction
Ready Nodes: 1
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ConfigMapSyncSuccessfull 33m controller.mysqlcluster *v1.ConfigMap transaction/contract-api-cluster-mysql created successfully
Normal OperatedSecretSyncSuccessfull 33m controller.mysqlcluster *v1.Secret transaction/contract-api-cluster-mysql-operated created successfully
Normal MasterSVCSyncSuccessfull 33m controller.mysqlcluster *v1.Service transaction/contract-api-cluster-mysql-master created successfully
Normal HealthySVCSyncSuccessfull 33m controller.mysqlcluster *v1.Service transaction/contract-api-cluster-mysql created successfully
Normal StatefulSetSyncSuccessfull 33m controller.mysqlcluster *v1.StatefulSet transaction/contract-api-cluster-mysql created successfully
Normal StatefulSetSyncSuccessfull 31m (x13 over 33m) controller.mysqlcluster *v1.StatefulSet transaction/contract-api-cluster-mysql updated successfully
kubectl logs mysql-operator-0 -c orchestrator
2020/03/26 20:20:51 [INFO] raft: Restored from snapshot 49-178780-1585251793307
2020/03/26 20:20:51 [INFO] raft: Node at 172.20.8.4:10008 [Follower] entering Follower state (Leader: "")
[martini] Started GET /api/raft-health for 10.160.105.250:58832
[martini] Completed 200 OK in 28.29883ms
2020/03/26 20:20:53 [WARN] raft: Heartbeat timeout from "" reached, starting election
2020/03/26 20:20:53 [INFO] raft: Node at 172.20.8.4:10008 [Candidate] entering Candidate state
2020/03/26 20:20:53 [DEBUG] raft: Votes needed: 1
2020/03/26 20:20:53 [DEBUG] raft: Vote granted from 172.20.8.4:10008. Tally: 1
2020/03/26 20:20:53 [INFO] raft: Election won. Tally: 1
2020/03/26 20:20:53 [INFO] raft: Node at 172.20.8.4:10008 [Leader] entering Leader state
[martini] Started GET /api/raft-follower-health-report/d5a03335/mysql-operator-0/mysql-operator-0-svc for 10.160.105.250:46200
2020-03-26 20:20:53 ERROR Raft health report: unknown token d5a03335
[martini] Completed 500 Internal Server Error in 2.924568ms
2020-03-26 20:20:53 ERROR HttpGetLeader: got 500 status on http://mysql-operator-0-svc:80/api/raft-follower-health-report/d5a03335/mysql-operator-0/mysql-operator-0-svc
2020/03/26 20:20:53 [DEBUG] raft: Node 172.20.8.4:10008 updated peer set (2): [172.20.8.4:10008]
[martini] Started GET /api/raft-follower-health-report/7887f81f/mysql-operator-0/mysql-operator-0-svc for 10.160.105.250:46200
[martini] Completed 200 OK in 1.018483ms
[martini] Started GET /api/raft-health for 10.160.105.250:58968
[martini] Completed 200 OK in 1.525997ms
[martini] Started GET /api/raft-follower-health-report/17394bc5/mysql-operator-0/mysql-operator-0-svc for 10.160.105.250:46200
[martini] Completed 200 OK in 1.687689ms
[martini] Started GET /api/cluster/contract-api-cluster.transaction for 127.0.0.1:45908
2020-03-26 20:21:12 ERROR Unable to determine cluster name. clusterHint=contract-api-cluster.transaction
[martini] Completed 500 Internal Server Error in 6.671548ms
[martini] Started GET /api/discover/contract-api-cluster-mysql-0.mysql.transaction/3306 for 127.0.0.1:45908
2020-03-26 20:21:12 ERROR x509: certificate is valid for MySQL_Server_5.7.26-29_Auto_Generated_Server_Certificate, not contract-api-cluster-mysql-0.mysql.transaction
2020-03-26 20:21:12 ERROR ReadTopologyInstance(contract-api-cluster-mysql-0.mysql.transaction:3306) show variables like 'maxscale%': x509: certificate is valid for MySQL_Server_5.7.26-29_Auto_Generated_Server_Certificate, not contract-api-cluster-mysql-0.mysql.transaction
[martini] Completed 500 Internal Server Error in 95.750111ms
[martini] Started GET /api/audit-recovery/contract-api-cluster.transaction for 127.0.0.1:45908
[martini] Completed 200 OK in 2.705527ms
[martini] Started GET /api/raft-health for 10.160.105.250:59128
[martini] Completed 200 OK in 3.488913ms
[martini] Started GET /api/cluster/contract-api-cluster.transaction for 127.0.0.1:45908
2020-03-26 20:21:17 ERROR Unable to determine cluster name. clusterHint=contract-api-cluster.transaction
[martini] Completed 500 Internal Server Error in 6.019458ms
[martini] Started GET /api/discover/contract-api-cluster-mysql-0.mysql.transaction/3306 for 127.0.0.1:45908
2020-03-26 20:21:17 ERROR x509: certificate is valid for MySQL_Server_5.7.26-29_Auto_Generated_Server_Certificate, not contract-api-cluster-mysql-0.mysql.transaction
[martini] Completed 500 Internal Server Error in 19.872942ms
[martini] Started GET /api/audit-recovery/contract-api-cluster.transaction for 127.0.0.1:45908
[martini] Completed 200 OK in 2.83317ms
[martini] Started GET /api/raft-follower-health-report/1096224b/mysql-operator-0/mysql-operator-0-svc for 10.160.105.250:46200
[martini] Completed 200 OK in 590.396µs
[martini] Started GET /api/cluster/contract-api-cluster.transaction for 127.0.0.1:45908
2020-03-26 20:21:22 ERROR Unable to determine cluster name. clusterHint=contract-api-cluster.transaction
[martini] Completed 500 Internal Server Error in 5.708816ms
[martini] Started GET /api/discover/contract-api-cluster-mysql-0.mysql.transaction/3306 for 127.0.0.1:45908
2020-03-26 20:21:22 ERROR x509: certificate is valid for MySQL_Server_5.7.26-29_Auto_Generated_Server_Certificate, not contract-api-cluster-mysql-0.mysql.transaction
[martini] Completed 500 Internal Server Error in 18.506516ms
[martini] Started GET /api/audit-recovery/contract-api-cluster.transaction for 127.0.0.1:45908
[martini] Completed 200 OK in 1.456536ms
[martini] Started GET /api/raft-health for 10.160.105.250:59298
[martini] Completed 200 OK in 1.268885ms
[martini] Started GET /api/cluster/contract-api-cluster.transaction for 127.0.0.1:45908
2020-03-26 20:21:27 ERROR Unable to determine cluster name. clusterHint=contract-api-cluster.transaction
[martini] Completed 500 Internal Server Error in 8.430831ms
Hi @gnoguchi !
I need more details. Please share the logs from the mysql container:
kubectl -n transaction logs contract-api-cluster-mysql-0 -c mysql
I suspect that node contract-api-cluster-mysql-0.mysql.transaction
is misconfigured; perhaps the orchestrator user was not created.
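For example, a quick way to check whether that user exists (just a sketch; the 'orc%' user-name pattern and the root password variable are assumptions, adjust to your setup):
# hypothetical check: list MySQL accounts whose name starts with 'orc'
kubectl -n transaction exec contract-api-cluster-mysql-0 -c mysql -- \
  mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT user, host FROM mysql.user WHERE user LIKE 'orc%';"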
Thank you for reporting this and helping to debug! Best regards!
Hello @AMecea ! Thanks for your support!
kubectl -n transaction logs contract-api-cluster-mysql-0 -c mysql
2020-03-27T15:34:34.300577Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2020-03-27T15:34:34.303072Z 0 [Note] mysqld (mysqld 5.7.26-29-log) starting as process 1 ...
2020-03-27T15:34:34.309942Z 0 [Note] InnoDB: PUNCH HOLE support available
2020-03-27T15:34:34.309980Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-03-27T15:34:34.309989Z 0 [Note] InnoDB: Uses event mutexes
2020-03-27T15:34:34.309996Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2020-03-27T15:34:34.310004Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2020-03-27T15:34:34.310011Z 0 [Note] InnoDB: Using Linux native AIO
2020-03-27T15:34:34.310639Z 0 [Note] InnoDB: Number of pools: 1
2020-03-27T15:34:34.310813Z 0 [Note] InnoDB: Using CPU crc32 instructions
2020-03-27T15:34:34.319797Z 0 [Note] InnoDB: Initializing buffer pool, total size = 512M, instances = 1, chunk size = 128M
2020-03-27T15:34:34.344040Z 0 [Note] InnoDB: Completed initialization of buffer pool
2020-03-27T15:34:34.351542Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-03-27T15:34:34.374202Z 0 [Note] InnoDB: Crash recovery did not find the parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite
2020-03-27T15:34:34.379578Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2020-03-27T15:34:34.423917Z 0 [Note] InnoDB: Created parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite, size 3932160 bytes
2020-03-27T15:34:34.722985Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2020-03-27T15:34:34.723110Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2020-03-27T15:34:34.769249Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2020-03-27T15:34:34.770217Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2020-03-27T15:34:34.770228Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2020-03-27T15:34:34.771619Z 0 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.7.26-29 started; log sequence number 11908284
2020-03-27T15:34:34.771827Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2020-03-27T15:34:34.772126Z 0 [Note] Plugin 'FEDERATED' is disabled.
2020-03-27T15:34:34.812097Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2020-03-27T15:34:34.812121Z 0 [Note] Skipping generation of SSL certificates as certificate files are present in data directory.
2020-03-27T15:34:34.818902Z 0 [Warning] CA certificate ca.pem is self signed.
2020-03-27T15:34:34.818960Z 0 [Note] Skipping generation of RSA key pair as key files are present in data directory.
2020-03-27T15:34:34.824909Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2020-03-27T15:34:34.825054Z 0 [Note] IPv6 is available.
2020-03-27T15:34:34.825074Z 0 [Note] - '::' resolves to '::';
2020-03-27T15:34:34.825107Z 0 [Note] Server socket created on IP: '::'.
2020-03-27T15:34:34.969725Z 0 [Note] InnoDB: Buffer pool(s) load completed at 200327 15:34:34
2020-03-27T15:34:35.152793Z 0 [Note] Event Scheduler: Loaded 0 events
2020-03-27T15:34:35.152958Z 0 [Note] Execution of init_file '/etc/mysql/conf.d/operator-init.sql' started.
2020-03-27T15:34:35.172012Z 0 [Note] Execution of init_file '/etc/mysql/conf.d/operator-init.sql' ended.
2020-03-27T15:34:35.172156Z 0 [Note] mysqld: ready for connections.
Version: '5.7.26-29-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona Server (GPL), Release 29, Revision 11ad961
Hey guys! Any news?
Thanks!
Hi, I am observing a similar issue: the cluster status is stuck in False. There was a re-election and the original leader is stuck in a not-replicating state. Is there a way to recover from this situation without losing data? I tried rebooting the MySQL pods and the operator pod, but that didn't help. I'd appreciate any help/fix. Below is a dump of the various logs and commands.
Using operator version 0.3.9.
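For reference, the restarts mentioned above were roughly the following (pod names follow the naming used in this report; the operator pod's namespace is an assumption):
# restart the MySQL pods of the cluster
kubectl -n infra delete pod metadata-cluster-0-mysql-0 metadata-cluster-0-mysql-1 metadata-cluster-0-mysql-2
# restart the operator pod
kubectl delete pod mysql-operator-0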
kubectl describe mysql metadata-cluster-0 -n infra
Name: metadata-cluster-0
Namespace: infra
Labels: active=true
name=metadata-cluster
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"mysql.presslabs.org/v1alpha1","kind":"MysqlCluster","metadata":{"annotations":{},"labels":{"active":"true","name":"metadata...
mysql.presslabs.org/version: 300
API Version: mysql.presslabs.org/v1alpha1
Kind: MysqlCluster
Spec:
Pod Spec:
Resources:
Replicas: 3
Secret Name: mysql-secret
Volume Spec:
Persistent Volume Claim:
Access Modes:
ReadWriteOnce
Resources:
Requests:
Storage: 10Gi
Status:
Conditions:
Last Transition Time: 2020-04-17T09:59:51Z
Message: cluster is writable
Reason: ClusterReadOnlyFalse
Status: False
Type: ReadOnly
Last Transition Time: 2020-04-16T18:47:04Z
Message: Node metadata-cluster-0-mysql-0.mysql.infra is part of topology and not replicating
Reason: NotReplicating
Status: False
Type: Ready
Last Transition Time: 2020-04-17T05:45:50Z
Message: no pending ack
Reason: NoPendingFailoverAckExists
Status: False
Type: PendingFailoverAck
Last Transition Time: 2020-04-17T05:36:30Z
Message: Master is healthy in orchestrator
Reason: ClusterMasterHealthy
Status: False
Type: FailoverInProgress
Nodes:
Conditions:
Last Transition Time: 2020-04-17T05:36:35Z
Status: Unknown
Type: Lagged
Last Transition Time: 2020-04-17T05:35:45Z
Status: False
Type: Replicating
Last Transition Time: 2020-04-17T05:35:45Z
Status: False
Type: Master
Last Transition Time: 2020-04-16T19:13:19Z
Status: True
Type: ReadOnly
Name: metadata-cluster-0-mysql-0.mysql.infra
Logs from MySQL Operator
{"level":"info","ts":1587116536.3045983,"logger":"controller.mysqlcluster","msg":"cluster status","status":{"readyNodes":3,"conditions":[{"type":"ReadOnly","status":"True","lastTransitionTime":"2020-04-17T09:42:16Z","reason":"ClusterReadOnlyTrue","message":"cluster is in read only"},{"type":"Ready","status":"False","lastTransitionTime":"2020-04-16T18:47:04Z","reason":"NotReplicating","message":"Node metadata-cluster-0-mysql-0.mysql.infra is part of topology and not replicating"},
Logs from Leader pod
2020-04-17T05:35:58.208160Z 42 [ERROR] Slave I/O for channel '': error connecting to master '[email protected]:3306' - retry-time: 10 retries: 1, Error_code: 2005
2020-04-17T05:36:08.610914Z 42 [ERROR] Slave I/O for channel '': error connecting to master '[email protected]:3306' - retry-time: 10 retries: 2, Error_code: 2005
2020-04-17T05:36:19.013254Z 42 [ERROR] Slave I/O for channel '': error connecting to master '[email protected]:3306' - retry-time: 10 retries: 3, Error_code: 2005
2020-04-17T05:36:29.348391Z 42 [Note] Slave I/O thread for channel '': connected to master '[email protected]:3306', replication started in log 'FIRST' at position 4
2020-04-17T05:36:30.046037Z 43 [ERROR] Slave SQL for channel '': Error executing row event: 'Table 'sys_operator.heartbeat' doesn't exist', Error_code: 1146
2020-04-17T05:36:30.046060Z 43 [Warning] Slave: Table 'sys_operator.heartbeat' doesn't exist Error_code: 1146
2020-04-17T05:36:30.046065Z 43 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 154.
MySQL dump - Heartbeat table is missing.
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| sys_operator       |
+--------------------+
5 rows in set (0.01 sec)
mysql> use sys_operator
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+------------------------+
| Tables_in_sys_operator |
+------------------------+
| status                 |
+------------------------+
1 row in set (0.00 sec)
mysql>
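A sketch of how the broken replication can be inspected from inside the stuck pod (assuming the cluster secret exposes a ROOT_PASSWORD key, as in the operator's examples; adjust names to your setup):
# read the root password from the cluster secret (key name is an assumption)
ROOT_PW=$(kubectl -n infra get secret mysql-secret -o jsonpath='{.data.ROOT_PASSWORD}' | base64 -d)
# check the replication threads and the last SQL error on the node that is not replicating
kubectl -n infra exec metadata-cluster-0-mysql-0 -c mysql -- \
  mysql -uroot -p"$ROOT_PW" -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Last_SQL_Error'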
[martini] Started GET /api/discover/contract-api-cluster-mysql-0.mysql.transaction/3306 for 127.0.0.1:45908
2020-03-26 20:21:12 ERROR x509: certificate is valid for MySQL_Server_5.7.26-29_Auto_Generated_Server_Certificate, not contract-api-cluster-mysql-0.mysql.transaction
Do you notice this error in the log? And in the mysql error log:
[Note] Bad handshake
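One way to confirm which name the server certificate was actually issued for (a sketch; it assumes openssl is available in the mysql container and uses the data-directory certificate path shown in the log above):
# print the subject of the certificate MySQL is serving
kubectl -n transaction exec contract-api-cluster-mysql-0 -c mysql -- \
  openssl x509 -in /var/lib/mysql/server-cert.pem -noout -subject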
Hi, can you try with the freshly released v0.5.0?
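If the operator was installed from the presslabs Helm chart, the upgrade might look roughly like this (release name, namespace, and chart version are assumptions; check the chart repo for the version matching v0.5.0):
helm repo add presslabs https://presslabs.github.io/charts
helm repo update
# release name and namespace are placeholders for your own install
helm upgrade mysql-operator presslabs/mysql-operator --namespace default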
It was quite scary because I thought my cluster backup was corrupt, but that was not the case. In my case, the first node had everything ready except the mysql pod, which reported Ready as false even though the container itself was ready.
I scaled my cluster's StatefulSet to zero, which triggered the cluster to terminate the first node and recreate it with the existing PVC and the other resources created by the previous run.
The second time, it started the first and second nodes without any problem.
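For anyone landing here in the same state, a rough sketch of those recovery steps (the StatefulSet name follows the operator's <cluster>-mysql convention; <namespace>, <cluster>, and the replica count are placeholders):
# scale the cluster's StatefulSet down so the stuck first node is terminated
kubectl -n <namespace> scale statefulset <cluster>-mysql --replicas=0
# then scale back up (or let the operator reconcile); the existing PVCs are reused
kubectl -n <namespace> scale statefulset <cluster>-mysql --replicas=2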