orchestrator
orchestrator copied to clipboard
orchestrator failure : topology were screwed up with graceful-master-takeover, the CPU core surged to 100%.
The replication topology was messed up by one of the production graceful-master-takeover. The intended primary CPU core spiked to 100% use during the failover, the orchestrator process stalled for a few seconds and it failed with an unexpected fault, leaving the demoted failover candidate node in an inconsistent state to begin replication.
Monitor~ $ sudo orchestrator-client -c graceful-master-takeover -a PROD -s <IP of the node>:3306
parse error: Invalid numeric literal at line 1, column 7
parse error: Invalid numeric literal at line 1, column 7
^C
Monitor~ $
CPU details
Processors | physical = 96, cores = 96, virtual = 96, hyperthreading = no
Speeds | 96x2394.375
Models | 96xIntel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz
Caches | 96x36608 KB
CPU usage details :

Reproduced the similar failure case:
Tried to mimic that by intentionally applying load to the intended primary with the following query.
Note: In the production case, there were no queries inside that were causing the CPU core usage to 100%
Intended primary node:
db2: # for i in {1..1000000}; do mysql -e "select * from mydb.t1;" & done;
Test case Topology:
[root@app-node ~]# orchestrator-client -c topology -a db1-node:3306
db1-node:3306 (db1-node) [0s,ok,5.7.34-37-log,rw,ROW,>>,GTID,P-GTID]
+ db2-node:3306 (db2-node) [0s,ok,5.7.33-36-log,ro,ROW,>>,GTID,P-GTID]
First graceful-master-takeover : Failed with the following error.
[root@app-node ~]# orchestrator-client -c graceful-master-takeover -a db1-node:3306 -d db2-node:3306
dial tcp 192.168.33.11:3306: connect: connection refused
[root@app-node ~]#
Topology is the same :
[root@app-node ~]# orchestrator-client -c topology -a db1-node:3306
db1-node:3306 (db1-node) [0s,ok,5.7.34-37-log,ro,ROW,>>,GTID,P-GTID]
+ db2-node:3306 (db2-node) [0s,ok,5.7.33-36-log,ro,ROW,>>,GTID,P-GTID]
[root@app-node ~]#
The second attempt completed the failover but the replication topology is messed up.
[root@app-node ~]# orchestrator-client -c graceful-master-takeover -a db1-node:3306 -d db2-node:3306
Db2-node:3306
Topology seeing messed up.
[root@app-node ~]# orchestrator-client -c topology -a db1-node:3306
- db1-node:3306 (db1-node) [null,nonreplicating,5.7.34-37-log,ro,ROW,>>,GTID,P-GTID,downtimed]
+ db2-node:3306 (db2-node) [0s,ok,5.7.33-36-log,ro,ROW,>>,GTID,P-GTID]
[root@app-node ~]#
Orchestrator log:
Jun 29 07:20:27 app-node orchestrator: [martini] Started HEAD / for [::1]:48098
Jun 29 07:20:27 app-node orchestrator: [martini] Completed 302 Found in 451.882µs
Jun 29 07:20:27 app-node orchestrator: [martini] Started HEAD / for [::1]:48100
Jun 29 07:20:27 app-node orchestrator: [martini] Completed 302 Found in 194.216µs
Jun 29 07:20:27 app-node orchestrator: [martini] Started GET /api/topology/monitor-node:3306 for [::1]:48102
Jun 29 07:20:27 app-node orchestrator: 2022-06-29 07:20:27 DEBUG instanceKey: db1-node:3306
Jun 29 07:20:27 app-node orchestrator: 2022-06-29 07:20:27 DEBUG instanceKey: db2-node:3306
Jun 29 07:20:27 app-node orchestrator: 2022-06-29 07:20:27 DEBUG instanceKey: monitor-node:3306
Jun 29 07:20:27 app-node orchestrator: [martini] Completed 200 OK in 3.575134ms
Jun 29 07:20:28 app-node orchestrator: 2022-06-29 07:20:28 DEBUG analysis: ClusterName: db1-node:3306, IsMaster: false, LastCheckValid: false, LastCheckPartialSuccess: false, CountReplicas: 0, CountValidReplicas: 0, CountValidReplicatingReplicas: 0, CountLaggingReplicas: 0, CountDelayedReplicas: 0, CountReplicasFailingToConnectToMaster: 0
Jun 29 07:20:37 app-node orchestrator: [martini] Started HEAD / for [::1]:48104
Jun 29 07:20:37 app-node orchestrator: [martini] Completed 302 Found in 224.348µs
Jun 29 07:20:40 app-node orchestrator: [martini] Started HEAD / for [::1]:48114
Jun 29 07:20:40 app-node orchestrator: [martini] Completed 302 Found in 884.981µs
Jun 29 07:20:40 app-node orchestrator: [martini] Started HEAD / for [::1]:48116
Jun 29 07:20:40 app-node orchestrator: [martini] Completed 302 Found in 415.207µs
Jun 29 07:20:40 app-node orchestrator: [martini] Started GET /api/forget/monitor-node/3306 for [::1]:48118
Jun 29 07:20:40 app-node orchestrator: 2022-06-29 07:20:40 INFO auditType:forget instance:monitor-node:3306 cluster: message:
Jun 29 07:20:40 app-node orchestrator: [martini] Completed 200 OK in 17.318975ms
Jun 29 07:20:48 app-node orchestrator: [martini] Started HEAD / for [::1]:48120
Jun 29 07:20:48 app-node orchestrator: [martini] Completed 302 Found in 236.147µs
Jun 29 07:20:48 app-node orchestrator: [martini] Started HEAD / for [::1]:48122
Jun 29 07:20:48 app-node orchestrator: [martini] Completed 302 Found in 264.455µs
Jun 29 07:20:48 app-node orchestrator: [martini] Started GET /api/topology/db1-node:3306 for [::1]:48124
Jun 29 07:20:48 app-node orchestrator: 2022-06-29 07:20:48 DEBUG instanceKey: db1-node:3306
Jun 29 07:20:48 app-node orchestrator: 2022-06-29 07:20:48 DEBUG instanceKey: db2-node:3306
Jun 29 07:20:48 app-node orchestrator: [martini] Completed 200 OK in 2.219462ms
Jun 29 07:21:11 app-node orchestrator: 2022-06-29 07:21:11 DEBUG kv.SubmitMastersToKvStores, clusterName: , force: false: numPairs: 5
Jun 29 07:21:11 app-node orchestrator: 2022-06-29 07:21:11 DEBUG kv.SubmitMastersToKvStores: submitKvPairs: 0
Jun 29 07:21:11 app-node orchestrator: 2022-06-29 07:21:11 INFO auditType:review-unseen-instances instance::0 cluster: message:Operations: 0
Jun 29 07:21:11 app-node orchestrator: 2022-06-29 07:21:11 INFO auditType:inject-unseen-masters instance::0 cluster: message:Operations: 0
Jun 29 07:21:11 app-node orchestrator: 2022-06-29 07:21:11 INFO auditType:resolve-unknown-masters instance::0 cluster: message:Num resolved hostnames: 0
Jun 29 07:21:11 app-node orchestrator: 2022-06-29 07:21:11 INFO auditType:forget-clustr-aliases instance::0 cluster: message:Forgotten aliases: 0
Jun 29 07:21:11 app-node orchestrator: 2022-06-29 07:21:11 INFO auditType:forget-unseen-differently-resolved instance::0 cluster: message:Forgotten instances: 0
Jun 29 07:21:11 app-node orchestrator: 2022-06-29 07:21:11 INFO auditType:forget-unseen instance::0 cluster: message:Forgotten instances: 0
Jun 29 07:21:35 app-node orchestrator: 2022-06-29 07:21:35 DEBUG analysis: ClusterName: db1-node:3306, IsMaster: false, LastCheckValid: false, LastCheckPartialSuccess: true, CountReplicas: 0, CountValidReplicas: 0, CountValidReplicatingReplicas: 0, CountLaggingReplicas: 0, CountDelayedReplicas: 0, CountReplicasFailingToConnectToMaster: 0
Jun 29 07:21:42 app-node orchestrator: [martini] Started HEAD / for [::1]:48144
Jun 29 07:21:42 app-node orchestrator: [martini] Completed 302 Found in 318.193µs
Jun 29 07:21:42 app-node orchestrator: [martini] Started HEAD / for [::1]:48146
Jun 29 07:21:42 app-node orchestrator: [martini] Completed 302 Found in 191.685µs
Jun 29 07:21:42 app-node orchestrator: [martini] Started GET /api/graceful-master-takeover/db1-node:3306/db2-node/3306 for [::1]:48148
Jun 29 07:21:42 app-node orchestrator: 2022-06-29 07:21:42 INFO GracefulMasterTakeover: designated master instructed to be db2-node:3306
Jun 29 07:21:42 app-node orchestrator: 2022-06-29 07:21:42 INFO GracefulMasterTakeover: Will demote db1-node:3306 and promote db2-node:3306 instead
Jun 29 07:21:42 app-node orchestrator: [mysql] 2022/06/29 07:21:42 packets.go:123: closing bad idle connection: EOF
Jun 29 07:21:43 app-node orchestrator: [mysql] 2022/06/29 07:21:43 packets.go:37: read tcp 192.168.33.12:37326->192.168.33.11:3306: i/o timeout
Jun 29 07:21:43 app-node orchestrator: 2022-06-29 07:21:43 ERROR ReadTopologyInstance(db2-node:3306) show global status like 'Uptime': invalid connection
Jun 29 07:21:47 app-node orchestrator: [mysql] 2022/06/29 07:21:47 packets.go:37: read tcp 192.168.33.12:37328->192.168.33.11:3306: i/o timeout
Jun 29 07:21:47 app-node orchestrator: 2022-06-29 07:21:47 ERROR invalid connection
Jun 29 07:21:52 app-node orchestrator: [mysql] 2022/06/29 07:21:52 packets.go:37: read tcp 192.168.33.12:37324->192.168.33.11:3306: i/o timeout
Jun 29 07:21:52 app-node orchestrator: 2022-06-29 07:21:52 ERROR invalid connection
Jun 29 07:22:00 app-node orchestrator: [mysql] 2022/06/29 07:22:00 packets.go:37: unexpected EOF
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 ERROR invalid connection
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 INFO topology_recovery: Running 1 PreGracefulTakeoverProcesses hooks
Jun 29 07:22:00 app-node orchestrator: [mysql] 2022/06/29 07:22:00 packets.go:37: unexpected EOF
Jun 29 07:22:00 app-node orchestrator: [mysql] 2022/06/29 07:22:00 packets.go:37: unexpected EOF
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 ERROR invalid connection
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 INFO topology_recovery: Running PreGracefulTakeoverProcesses hook 1 of 1: echo 'Planned takeover about to take place on db1-node:3306. Master will switch to read_only' >> /tmp/recovery.log
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 WARNING DiscoverInstance(db2-node:3306) instance is nil in 27.054s (Backend: 0.006s, Instance: 27.048s), error=invalid connection
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 WARNING discoverInstance exceeded InstancePollSeconds for db2-node:3306, took 27.0537s
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 INFO CommandRun(echo 'Planned takeover about to take place on db1-node:3306. Master will switch to read_only' >> /tmp/recovery.log,[])
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-969330023
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 INFO CommandRun:
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 INFO CommandRun successful. exit status 0
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 INFO topology_recovery: Completed PreGracefulTakeoverProcesses hook 1 of 1 in 2.605573ms
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 INFO topology_recovery: done running PreGracefulTakeoverProcesses hooks
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 INFO GracefulMasterTakeover: Will set db1-node:3306 as read_only
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 INFO instance db1-node:3306 read_only: true
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 INFO auditType:read-only instance:db1-node:3306 cluster:db1-node:3306 message:set as true
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 INFO GracefulMasterTakeover: Will wait for db2-node:3306 to reach master coordinates mysql-bin.000002:72008
Jun 29 07:22:00 app-node orchestrator: 2022-06-29 07:22:00 ERROR dial tcp 192.168.33.11:3306: connect: connection refused
Jun 29 07:22:00 app-node orchestrator: [martini] Completed 500 Internal Server Error in 18.365893955s
Jun 29 07:22:01 app-node orchestrator: 2022-06-29 07:22:01 DEBUG analysis: ClusterName: db1-node:3306, IsMaster: false, LastCheckValid: false, LastCheckPartialSuccess: false, CountReplicas: 0, CountValidReplicas: 0, CountValidReplicatingReplicas: 0, CountLaggingReplicas: 0, CountDelayedReplicas: 0, CountReplicasFailingToConnectToMaster: 0
Jun 29 07:22:01 app-node orchestrator: 2022-06-29 07:22:01 WARNING executeCheckAndRecoverFunction: ignoring analysisEntry that has no action plan: MasterSingleReplicaDead; key: db1-node:3306
Jun 29 07:22:11 app-node orchestrator: 2022-06-29 07:22:11 DEBUG kv.SubmitMastersToKvStores, clusterName: , force: false: numPairs: 0
Jun 29 07:22:11 app-node orchestrator: 2022-06-29 07:22:11 DEBUG kv.SubmitMastersToKvStores: submitKvPairs: 0
Jun 29 07:22:11 app-node orchestrator: 2022-06-29 07:22:11 INFO auditType:forget-unseen instance::0 cluster: message:Forgotten instances: 0
Jun 29 07:22:11 app-node orchestrator: 2022-06-29 07:22:11 INFO auditType:forget-clustr-aliases instance::0 cluster: message:Forgotten aliases: 0
Jun 29 07:22:11 app-node orchestrator: 2022-06-29 07:22:11 INFO auditType:forget-unseen-differently-resolved instance::0 cluster: message:Forgotten instances: 0
Jun 29 07:22:11 app-node orchestrator: 2022-06-29 07:22:11 INFO auditType:resolve-unknown-masters instance::0 cluster: message:Num resolved hostnames: 0
Jun 29 07:22:11 app-node orchestrator: 2022-06-29 07:22:11 INFO auditType:inject-unseen-masters instance::0 cluster: message:Operations: 0
Jun 29 07:22:11 app-node orchestrator: 2022-06-29 07:22:11 INFO auditType:review-unseen-instances instance::0 cluster: message:Operations: 0
Jun 29 07:22:31 app-node orchestrator: [martini] Started HEAD / for [::1]:48210
Jun 29 07:22:31 app-node orchestrator: [martini] Completed 302 Found in 245.07µs
Jun 29 07:22:31 app-node orchestrator: [martini] Started HEAD / for [::1]:48212
Jun 29 07:22:31 app-node orchestrator: [martini] Completed 302 Found in 451.492µs
Jun 29 07:22:31 app-node orchestrator: [martini] Started GET /api/topology/db1-node:3306 for [::1]:48214
Jun 29 07:22:31 app-node orchestrator: 2022-06-29 07:22:31 DEBUG instanceKey: db1-node:3306
Jun 29 07:22:31 app-node orchestrator: 2022-06-29 07:22:31 DEBUG instanceKey: db2-node:3306
Jun 29 07:22:31 app-node orchestrator: [martini] Completed 200 OK in 1.397641ms
Jun 29 07:22:58 app-node orchestrator: [martini] Started HEAD / for [::1]:48218
Jun 29 07:22:58 app-node orchestrator: [martini] Completed 302 Found in 197.256µs
Jun 29 07:22:58 app-node orchestrator: [martini] Started HEAD / for [::1]:48220
Jun 29 07:22:58 app-node orchestrator: [martini] Completed 302 Found in 203.594µs
Jun 29 07:22:58 app-node orchestrator: [martini] Started GET /api/graceful-master-takeover/db1-node:3306/db2-node/3306 for [::1]:48222
Jun 29 07:22:58 app-node orchestrator: 2022-06-29 07:22:58 INFO GracefulMasterTakeover: designated master instructed to be db2-node:3306
Jun 29 07:22:58 app-node orchestrator: 2022-06-29 07:22:58 INFO GracefulMasterTakeover: Will demote db1-node:3306 and promote db2-node:3306 instead
Jun 29 07:23:02 app-node orchestrator: 2022-06-29 07:23:02 DEBUG analysis: ClusterName: db1-node:3306, IsMaster: false, LastCheckValid: false, LastCheckPartialSuccess: true, CountReplicas: 0, CountValidReplicas: 0, CountValidReplicatingReplicas: 0, CountLaggingReplicas: 0, CountDelayedReplicas: 0, CountReplicasFailingToConnectToMaster: 0
Jun 29 07:23:05 app-node orchestrator: 2022-06-29 07:23:05 INFO topology_recovery: Running 1 PreGracefulTakeoverProcesses hooks
Jun 29 07:23:05 app-node orchestrator: 2022-06-29 07:23:05 INFO topology_recovery: Running PreGracefulTakeoverProcesses hook 1 of 1: echo 'Planned takeover about to take place on db1-node:3306. Master will switch to read_only' >> /tmp/recovery.log
Jun 29 07:23:05 app-node orchestrator: 2022-06-29 07:23:05 INFO CommandRun(echo 'Planned takeover about to take place on db1-node:3306. Master will switch to read_only' >> /tmp/recovery.log,[])
Jun 29 07:23:05 app-node orchestrator: 2022-06-29 07:23:05 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-260253082
Jun 29 07:23:05 app-node orchestrator: 2022-06-29 07:23:05 INFO CommandRun:
Jun 29 07:23:05 app-node orchestrator: 2022-06-29 07:23:05 INFO CommandRun successful. exit status 0
Jun 29 07:23:05 app-node orchestrator: 2022-06-29 07:23:05 INFO topology_recovery: Completed PreGracefulTakeoverProcesses hook 1 of 1 in 2.653304ms
Jun 29 07:23:05 app-node orchestrator: 2022-06-29 07:23:05 INFO topology_recovery: done running PreGracefulTakeoverProcesses hooks
Jun 29 07:23:05 app-node orchestrator: 2022-06-29 07:23:05 INFO GracefulMasterTakeover: Will set db1-node:3306 as read_only
Jun 29 07:23:05 app-node orchestrator: 2022-06-29 07:23:05 INFO instance db1-node:3306 read_only: true
Jun 29 07:23:05 app-node orchestrator: 2022-06-29 07:23:05 INFO auditType:read-only instance:db1-node:3306 cluster:db1-node:3306 message:set as true
Jun 29 07:23:05 app-node orchestrator: 2022-06-29 07:23:05 INFO GracefulMasterTakeover: Will wait for db2-node:3306 to reach master coordinates mysql-bin.000002:72008
Jun 29 07:23:10 app-node orchestrator: [mysql] 2022/06/29 07:23:10 packets.go:37: read tcp 192.168.33.12:37448->192.168.33.11:3306: i/o timeout
Jun 29 07:23:10 app-node orchestrator: 2022-06-29 07:23:10 ERROR ReadTopologyInstance(db2-node:3306) show global status like 'Uptime': invalid connection
Jun 29 07:23:11 app-node orchestrator: 2022-06-29 07:23:11 INFO auditType:review-unseen-instances instance::0 cluster: message:Operations: 0
Jun 29 07:23:11 app-node orchestrator: 2022-06-29 07:23:11 DEBUG kv.SubmitMastersToKvStores, clusterName: , force: false: numPairs: 0
Jun 29 07:23:11 app-node orchestrator: 2022-06-29 07:23:11 DEBUG kv.SubmitMastersToKvStores: submitKvPairs: 0
Jun 29 07:23:11 app-node orchestrator: 2022-06-29 07:23:11 INFO auditType:inject-unseen-masters instance::0 cluster: message:Operations: 0
Jun 29 07:23:11 app-node orchestrator: 2022-06-29 07:23:11 INFO auditType:forget-clustr-aliases instance::0 cluster: message:Forgotten aliases: 0
Jun 29 07:23:11 app-node orchestrator: 2022-06-29 07:23:11 INFO auditType:resolve-unknown-masters instance::0 cluster: message:Num resolved hostnames: 0
Jun 29 07:23:11 app-node orchestrator: 2022-06-29 07:23:11 INFO auditType:forget-unseen instance::0 cluster: message:Forgotten instances: 0
Jun 29 07:23:11 app-node orchestrator: 2022-06-29 07:23:11 INFO auditType:forget-unseen-differently-resolved instance::0 cluster: message:Forgotten instances: 0
Jun 29 07:23:19 app-node orchestrator: [mysql] 2022/06/29 07:23:19 packets.go:37: read tcp 192.168.33.12:37450->192.168.33.11:3306: i/o timeout
Jun 29 07:23:19 app-node orchestrator: 2022-06-29 07:23:19 ERROR invalid connection
Jun 29 07:23:24 app-node orchestrator: [mysql] 2022/06/29 07:23:24 packets.go:37: unexpected EOF
Jun 29 07:23:24 app-node orchestrator: [mysql] 2022/06/29 07:23:24 packets.go:37: unexpected EOF
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR invalid connection
Jun 29 07:23:24 app-node orchestrator: [mysql] 2022/06/29 07:23:24 packets.go:37: unexpected EOF
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR dial tcp 192.168.33.11:3306: connect: connection refused
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR dial tcp 192.168.33.11:3306: connect: connection refused
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR dial tcp 192.168.33.11:3306: connect: connection refused
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR dial tcp 192.168.33.11:3306: connect: connection refused
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG writeInstance: will not update database_instance due to error: invalid connection
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO GracefulMasterTakeover: attempting recovery
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO executeCheckAndRecoverFunction: proceeding with DeadMaster detection on db1-node:3306; isActionable?: true; skipProcesses: false
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR dial tcp 192.168.33.11:3306: connect: connection refused
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: detected DeadMaster failure on db1-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Running 1 OnFailureDetectionProcesses hooks
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Running OnFailureDetectionProcesses hook 1 of 1: echo 'Detected DeadMaster on db1-node:3306. Affected replicas: 1' >> /tmp/recovery.log
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG writeInstance: will not update database_instance due to error: invalid connection
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 WARNING discoverInstance exceeded InstancePollSeconds for db2-node:3306, took 23.4004s
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun(echo 'Detected DeadMaster on db1-node:3306. Affected replicas: 1' >> /tmp/recovery.log,[])
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-618311729
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun:
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun successful. exit status 0
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Completed OnFailureDetectionProcesses hook 1 of 1 in 3.181079ms
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: done running OnFailureDetectionProcesses hooks
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO executeCheckAndRecoverFunction: proceeding with DeadMaster recovery on db1-node:3306; isRecoverable?: true; skipProcesses: false
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: will handle DeadMaster event on db1-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO auditType:recover-dead-master instance:db1-node:3306 cluster:db1-node:3306 message:problem found; will recover
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Running 1 PreFailoverProcesses hooks
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Running PreFailoverProcesses hook 1 of 1: /usr/local/bin/prefailover.sh
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun(/usr/local/bin/prefailover.sh,[])
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-755912668
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun: /usr/local/bin/prefailover.sh: line 11: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: /usr/local/bin/prefailover.sh: line 16: [: too many arguments
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun successful. exit status 0
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Completed PreFailoverProcesses hook 1 of 1 in 7.195828ms
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: done running PreFailoverProcesses hooks
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: RecoverDeadMaster: will recover db1-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: RecoverDeadMaster: masterRecoveryType=MasterRecoveryPseudoGTID
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: RecoverDeadMaster: regrouping replicas via Pseudo-GTID
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG RegroupReplicasIncludingSubReplicasOfBinlogServers: starting on replicas of db1-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG RegroupReplicasIncludingSubReplicasOfBinlogServers: no binlog server replicates from db1-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG GetCandidateReplica: candidate: db2-node:3306, ahead: 0, equal: 0, late: 0, break: 0
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: RecoverDeadMaster: promotedReplicaIsIdeal(db2-node:3306)
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG RegroupReplicas: done
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG RegroupReplicas: working on 0 equals replicas
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG RegroupReplicas: multi matching 0 later replicas
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG RegroupReplicas: starting 1 replicas
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO auditType:begin-downtime instance:db1-node:3306 cluster:db1-node:3306 message:owner: app-node, reason: lost-in-recovery
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR dial tcp 192.168.33.11:3306: connect: connection refused
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: RecoverDeadMaster: 1 postponed functions
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO auditType:regroup-replicas instance:db1-node:3306 cluster:db1-node:3306 message:regrouped 1 replicas below db1-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: checking if should replace promoted replica with a better candidate
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: + sanity check: found our very own server to promote; doing nothing
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: replace-promoted-replica-with-candidate: promoted instance db2-node:3306 requires no further action
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: promoted replica: db2-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO auditType:recover-dead-master instance:db1-node:3306 cluster:db1-node:3306 message:promoted replica: db2-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: RecoverDeadMaster: successfully promoted db2-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: - RecoverDeadMaster: promoted server coordinates: mysql-bin.000016:234
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: - RecoverDeadMaster: will apply MySQL changes to promoted master
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: - RecoverDeadMaster: applying RESET SLAVE ALL on promoted master: success=false
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: - RecoverDeadMaster: NOTE that db2-node:3306 is promoted even though SHOW SLAVE STATUS may still show it has a master
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR dial tcp 192.168.33.11:3306: connect: connection refused
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: - RecoverDeadMaster: applying read-only=0 on promoted master: success=false
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Writing KV [mysql/master/enterprise:db2-node:3306 mysql/master/enterprise/hostname:db2-node mysql/master/enterprise/port:3306 mysql/master/enterprise/ipv4:192.168.33.11 mysql/master/enterprise/ipv6:]
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Distributing KV [mysql/master/enterprise:db2-node:3306 mysql/master/enterprise/hostname:db2-node mysql/master/enterprise/port:3306 mysql/master/enterprise/ipv4:192.168.33.11 mysql/master/enterprise/ipv6:]
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: - RecoverDeadMaster: updating cluster_alias: db1-node:3306 -> db2-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Running 1 PostMasterFailoverProcesses hooks
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Running PostMasterFailoverProcesses hook 1 of 1: /usr/local/bin/postfailover.sh
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun(/usr/local/bin/postfailover.sh,[])
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-316974475
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun: /usr/local/bin/postfailover.sh: line 11: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: /usr/local/bin/postfailover.sh: line 17: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: /usr/local/bin/postfailover.sh: line 25: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR CommandRun: failed. exit status 127
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR (exit status 127) /usr/local/bin/postfailover.sh: line 11: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: /usr/local/bin/postfailover.sh: line 17: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: /usr/local/bin/postfailover.sh: line 25: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR Execution of PostMasterFailoverProcesses hook 1 of 1 failed in 11.565663ms with error: (exit status 127) /usr/local/bin/postfailover.sh: line 11: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: /usr/local/bin/postfailover.sh: line 17: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: /usr/local/bin/postfailover.sh: line 25: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Execution of PostMasterFailoverProcesses hook 1 of 1 failed in 11.565663ms with error: (exit status 127) /usr/local/bin/postfailover.sh: line 11: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: /usr/local/bin/postfailover.sh: line 17: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: /usr/local/bin/postfailover.sh: line 25: mysql: command not found
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: done running PostMasterFailoverProcesses hooks
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO Topology recovery: {"Id":5,"UID":"1656487404036690651:9053828941c63e5ae75f55ab7d8e20556e908b301d87ab5604e79ef6d57d5c2e","AnalysisEntry":{"AnalyzedInstanceKey":{"Hostname":"db1-node","Port":3306},"AnalyzedInstanceMasterKey":{"Hostname":"","Port":0},"ClusterDetails":{"ClusterName":"db1-node:3306","ClusterAlias":"enterprise","ClusterDomain":"starfleet","CountInstances":2,"HeuristicLag":0,"HasAutomatedMasterRecovery":true,"HasAutomatedIntermediateMasterRecovery":true},"AnalyzedInstanceDataCenter":"","AnalyzedInstanceRegion":"","AnalyzedInstancePhysicalEnvironment":"","AnalyzedInstanceBinlogCoordinates":{"LogFile":"mysql-bin.000002","LogPos":72008,"Type":0},"IsMaster":true,"IsReplicationGroupMember":false,"IsCoMaster":false,"LastCheckValid":true,"LastCheckPartialSuccess":true,"CountReplicas":1,"CountValidReplicas":1,"CountValidReplicatingReplicas":1,"CountReplicasFailingToConnectToMaster":0,"CountDowntimedReplicas":0,"ReplicationDepth":0,"Replicas":[{"Hostname":"db2-node","Port":3306}],"SlaveHosts":[{"Hostname":"db2-node","Port":3306}],"IsFailingToConnectToMaster":false,"Analysis":"DeadMaster","Description":"","StructureAnalysis":["NoWriteableMasterStructureWarning"],"IsDowntimed":false,"IsReplicasDowntimed":false,"DowntimeEndTimestamp":"","DowntimeRemainingSeconds":0,"IsBinlogServer":false,"PseudoGTIDImmediateTopology":true,"OracleGTIDImmediateTopology":false,"MariaDBGTIDImmediateTopology":false,"BinlogServerImmediateTopology":false,"SemiSyncMasterEnabled":false,"SemiSyncMasterStatus":false,"SemiSyncMasterWaitForReplicaCount":0,"SemiSyncMasterClients":0,"CountSemiSyncReplicasEnabled":0,"CountLoggingReplicas":1,"CountStatementBasedLoggingReplicas":0,"CountMixedBasedLoggingReplicas":0,"CountRowBasedLoggingReplicas":1,"CountDistinctMajorVersionsLoggingReplicas":1,"CountDelayedReplicas":0,"CountLaggingReplicas":0,"IsActionableRecovery":true,"ProcessingNodeHostname":"app-node","ProcessingNodeToken":"458fa1403f3dccda1d86f273b4b2bb0feb32a2ff434387e86a6452c327307492","CountAdditionalAgreeingNodes":0,"StartActivePeriod":"","SkippableDueToDowntime":false,"GTIDMode":"ON","MinReplicaGTIDMode":"ON","MaxReplicaGTIDMode":"ON","MaxReplicaGTIDErrant":"","CommandHint":"graceful-master-takeover","IsReadOnly":true},"SuccessorKey":{"Hostname":"db2-node","Port":3306},"SuccessorAlias":"db2-node","IsActive":false,"IsSuccessful":true,"LostReplicas":[],"ParticipatingInstanceKeys":[],"AllErrors":[],"RecoveryStartTimestamp":"","RecoveryEndTimestamp":"","ProcessingNodeHostname":"","ProcessingNodeToken":"","Acknowledged":false,"AcknowledgedAt":"","AcknowledgedBy":"","AcknowledgedComment":"","LastDetectionId":0,"RelatedRecoveryId":0,"Type":"MasterRecovery","RecoveryType":"MasterRecoveryPseudoGTID"}
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Running 1 PostFailoverProcesses hooks
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Running PostFailoverProcesses hook 1 of 1: echo '(for all types) Recovered from DeadMaster on db1-node:3306. Failed: db1-node:3306; Successor: db2-node:3306' >> /tmp/recovery.log
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun(echo '(for all types) Recovered from DeadMaster on db1-node:3306. Failed: db1-node:3306; Successor: db2-node:3306' >> /tmp/recovery.log,[])
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-823894382
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun:
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun successful. exit status 0
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Completed PostFailoverProcesses hook 1 of 1 in 3.290905ms
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: done running PostFailoverProcesses hooks
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO instance db1-node:3306 read_only: true
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Waiting for 1 postponed functions
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO auditType:read-only instance:db1-node:3306 cluster:db1-node:3306 message:set as true
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: - RecoverDeadMaster: applying read-only=1 on demoted master: success=true
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG PostponedFunctionsContainer: waiting on 1 postponed functions
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG PostponedFunctionsContainer: done waiting
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Executed 1 postponed functions
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Executed postponed functions: regroup-replicas-pseudo-gtid db2-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG ChangeMasterTo: will attempt changing master on db1-node:3306 to db2-node:3306, mysql-bin.000016:234
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO ChangeMasterTo: Changed master on db1-node:3306 to: db2-node:3306, mysql-bin.000016:234. GTID: false
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG ChangeMasterTo: will attempt changing master credentials on db1-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO ChangeMasterTo: Changed master credentials on db1-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Running 1 PostGracefulTakeoverProcesses hooks
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Running PostGracefulTakeoverProcesses hook 1 of 1: echo 'Planned takeover complete' >> /tmp/recovery.log
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun(echo 'Planned takeover complete' >> /tmp/recovery.log,[])
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-371089909
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun:
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun successful. exit status 0
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Completed PostGracefulTakeoverProcesses hook 1 of 1 in 2.549104ms
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: done running PostGracefulTakeoverProcesses hooks
Jun 29 07:23:24 app-node orchestrator: [martini] Completed 200 OK in 25.486828157s
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 DEBUG analysis: ClusterName: db1-node:3306, IsMaster: false, LastCheckValid: false, LastCheckPartialSuccess: false, CountReplicas: 1, CountValidReplicas: 1, CountValidReplicatingReplicas: 0, CountLaggingReplicas: 0, CountDelayedReplicas: 0, CountReplicasFailingToConnectToMaster: 0
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster detection on db2-node:3306; isActionable?: true; skipProcesses: false
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: detected DeadCoMaster failure on db2-node:3306
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Running 1 OnFailureDetectionProcesses hooks
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Running OnFailureDetectionProcesses hook 1 of 1: echo 'Detected DeadCoMaster on db1-node:3306. Affected replicas: 1' >> /tmp/recovery.log
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun(echo 'Detected DeadCoMaster on db1-node:3306. Affected replicas: 1' >> /tmp/recovery.log,[])
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-341608144
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun:
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO CommandRun successful. exit status 0
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: Completed OnFailureDetectionProcesses hook 1 of 1 in 3.225661ms
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: done running OnFailureDetectionProcesses hooks
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster recovery on db2-node:3306; isRecoverable?: true; skipProcesses: false
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 INFO topology_recovery: found an active or recent recovery on db2-node:3306. Will not issue another RecoverDeadCoMaster.
Jun 29 07:23:24 app-node orchestrator: 2022-06-29 07:23:24 ERROR 2022-06-29 07:23:24 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:25 app-node orchestrator: 2022-06-29 07:23:25 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster detection on db2-node:3306; isActionable?: true; skipProcesses: false
Jun 29 07:23:25 app-node orchestrator: 2022-06-29 07:23:25 INFO checkAndExecuteFailureDetectionProcesses: could not register DeadCoMaster detection on db2-node:3306
Jun 29 07:23:25 app-node orchestrator: 2022-06-29 07:23:25 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster recovery on db2-node:3306; isRecoverable?: true; skipProcesses: false
Jun 29 07:23:25 app-node orchestrator: 2022-06-29 07:23:25 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:25 app-node orchestrator: 2022-06-29 07:23:25 INFO topology_recovery: found an active or recent recovery on db2-node:3306. Will not issue another RecoverDeadCoMaster.
Jun 29 07:23:25 app-node orchestrator: 2022-06-29 07:23:25 ERROR 2022-06-29 07:23:25 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:26 app-node orchestrator: 2022-06-29 07:23:26 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster detection on db2-node:3306; isActionable?: true; skipProcesses: false
Jun 29 07:23:26 app-node orchestrator: 2022-06-29 07:23:26 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster recovery on db2-node:3306; isRecoverable?: true; skipProcesses: false
Jun 29 07:23:26 app-node orchestrator: 2022-06-29 07:23:26 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:26 app-node orchestrator: 2022-06-29 07:23:26 INFO topology_recovery: found an active or recent recovery on db2-node:3306. Will not issue another RecoverDeadCoMaster.
Jun 29 07:23:26 app-node orchestrator: 2022-06-29 07:23:26 ERROR 2022-06-29 07:23:26 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:27 app-node orchestrator: 2022-06-29 07:23:27 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster detection on db2-node:3306; isActionable?: true; skipProcesses: false
Jun 29 07:23:27 app-node orchestrator: 2022-06-29 07:23:27 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster recovery on db2-node:3306; isRecoverable?: true; skipProcesses: false
Jun 29 07:23:27 app-node orchestrator: 2022-06-29 07:23:27 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:27 app-node orchestrator: 2022-06-29 07:23:27 INFO topology_recovery: found an active or recent recovery on db2-node:3306. Will not issue another RecoverDeadCoMaster.
Jun 29 07:23:27 app-node orchestrator: 2022-06-29 07:23:27 ERROR 2022-06-29 07:23:27 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:28 app-node orchestrator: 2022-06-29 07:23:28 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster detection on db2-node:3306; isActionable?: true; skipProcesses: false
Jun 29 07:23:28 app-node orchestrator: 2022-06-29 07:23:28 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster recovery on db2-node:3306; isRecoverable?: true; skipProcesses: false
Jun 29 07:23:28 app-node orchestrator: 2022-06-29 07:23:28 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:28 app-node orchestrator: 2022-06-29 07:23:28 INFO topology_recovery: found an active or recent recovery on db2-node:3306. Will not issue another RecoverDeadCoMaster.
Jun 29 07:23:28 app-node orchestrator: 2022-06-29 07:23:28 ERROR 2022-06-29 07:23:28 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:29 app-node orchestrator: 2022-06-29 07:23:29 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster detection on db2-node:3306; isActionable?: true; skipProcesses: false
Jun 29 07:23:29 app-node orchestrator: 2022-06-29 07:23:29 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster recovery on db2-node:3306; isRecoverable?: true; skipProcesses: false
Jun 29 07:23:29 app-node orchestrator: 2022-06-29 07:23:29 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:29 app-node orchestrator: 2022-06-29 07:23:29 INFO topology_recovery: found an active or recent recovery on db2-node:3306. Will not issue another RecoverDeadCoMaster.
Jun 29 07:23:29 app-node orchestrator: 2022-06-29 07:23:29 ERROR 2022-06-29 07:23:29 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:30 app-node orchestrator: 2022-06-29 07:23:30 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster detection on db2-node:3306; isActionable?: true; skipProcesses: false
Jun 29 07:23:30 app-node orchestrator: 2022-06-29 07:23:30 INFO executeCheckAndRecoverFunction: proceeding with DeadCoMaster recovery on db2-node:3306; isRecoverable?: true; skipProcesses: false
Jun 29 07:23:30 app-node orchestrator: 2022-06-29 07:23:30 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:30 app-node orchestrator: 2022-06-29 07:23:30 INFO topology_recovery: found an active or recent recovery on db2-node:3306. Will not issue another RecoverDeadCoMaster.
Jun 29 07:23:30 app-node orchestrator: 2022-06-29 07:23:30 ERROR 2022-06-29 07:23:30 ERROR AttemptRecoveryRegistration: instance db2-node:3306 has recently been promoted (by failover of db1-node:3306) and is in active period. It will not be failed over. You may acknowledge the failure on db1-node:3306 (-c ack-instance-recoveries) to remove this blockage
Jun 29 07:23:40 app-node orchestrator: [martini] Started HEAD / for [::1]:48364
Jun 29 07:23:40 app-node orchestrator: [martini] Completed 302 Found in 198.129µs
Jun 29 07:23:40 app-node orchestrator: [martini] Started HEAD / for [::1]:48366
Jun 29 07:23:40 app-node orchestrator: [martini] Completed 302 Found in 270.647µs
Jun 29 07:23:40 app-node orchestrator: [martini] Started GET /api/topology/db1-node:3306 for [::1]:48368
Jun 29 07:23:40 app-node orchestrator: 2022-06-29 07:23:40 DEBUG instanceKey: db1-node:3306
Jun 29 07:23:40 app-node orchestrator: 2022-06-29 07:23:40 DEBUG instanceKey: db2-node:3306
Jun 29 07:23:40 app-node orchestrator: [martini] Completed 200 OK in 1.580531ms
Jun 29 07:23:43 app-node orchestrator: [martini] Started HEAD / for [::1]:48370
Jun 29 07:23:43 app-node orchestrator: [martini] Completed 302 Found in 227.577µs
Jun 29 07:23:43 app-node orchestrator: [martini] Started HEAD / for [::1]:48372
Jun 29 07:23:43 app-node orchestrator: [martini] Completed 302 Found in 509.954µs
Jun 29 07:23:43 app-node orchestrator: [martini] Started GET /api/topology/db2-node:3306 for [::1]:48374
Jun 29 07:23:43 app-node orchestrator: 2022-06-29 07:23:43 DEBUG instanceKey: db1-node:3306
Jun 29 07:23:43 app-node orchestrator: 2022-06-29 07:23:43 DEBUG instanceKey: db2-node:3306
Jun 29 07:23:43 app-node orchestrator: [martini] Completed 200 OK in 3.295195ms
Jun 29 07:23:51 app-node orchestrator: [martini] Started HEAD / for [::1]:48376
Jun 29 07:23:51 app-node orchestrator: [martini] Completed 302 Found in 200.838µs
Jun 29 07:23:51 app-node orchestrator: [martini] Started HEAD / for [::1]:48378
Jun 29 07:23:51 app-node orchestrator: [martini] Completed 302 Found in 223.485µs
Jun 29 07:23:51 app-node orchestrator: [martini] Started GET /api/topology/db1-node:3306 for [::1]:48380
Jun 29 07:23:51 app-node orchestrator: 2022-06-29 07:23:51 DEBUG instanceKey: db1-node:3306
Jun 29 07:23:51 app-node orchestrator: 2022-06-29 07:23:51 DEBUG instanceKey: db2-node:3306
Jun 29 07:23:51 app-node orchestrator: [martini] Completed 200 OK in 1.8472ms
Jun 29 07:24:11 app-node orchestrator: 2022-06-29 07:24:11 DEBUG kv.SubmitMastersToKvStores, clusterName: , force: false: numPairs: 0
Jun 29 07:24:11 app-node orchestrator: 2022-06-29 07:24:11 DEBUG kv.SubmitMastersToKvStores: submitKvPairs: 0
Jun 29 07:24:11 app-node orchestrator: 2022-06-29 07:24:11 INFO auditType:review-unseen-instances instance::0 cluster: message:Operations: 0
Jun 29 07:24:11 app-node orchestrator: 2022-06-29 07:24:11 INFO auditType:forget-clustr-aliases instance::0 cluster: message:Forgotten aliases: 0
Jun 29 07:24:11 app-node orchestrator: 2022-06-29 07:24:11 INFO auditType:resolve-unknown-masters instance::0 cluster: message:Num resolved hostnames: 0
Jun 29 07:24:11 app-node orchestrator: 2022-06-29 07:24:11 INFO auditType:forget-unseen-differently-resolved instance::0 cluster: message:Forgotten instances: 0
Jun 29 07:24:11 app-node orchestrator: 2022-06-29 07:24:11 INFO auditType:inject-unseen-masters instance::0 cluster: message:Operations: 0
Jun 29 07:24:11 app-node orchestrator: 2022-06-29 07:24:11 INFO auditType:forget-unseen instance::0 cluster: message:Forgotten instances: 0
Jun 29 07:24:26 app-node orchestrator: 2022-06-29 07:24:26 INFO auditType:end-downtime instance:db1-node:3306 cluster:db1-node:3306 message:
Jun 29 07:24:27 app-node orchestrator: 2022-06-29 07:24:27 WARNING executeCheckAndRecoverFunction: ignoring analysisEntry that has no action plan: AllCoMasterReplicasNotReplicating; key: db2-node:3306
Jun 29 07:25:11 app-node orchestrator: 2022-06-29 07:25:11 DEBUG kv.SubmitMastersToKvStores, clusterName: , force: false: numPairs: 0
Jun 29 07:25:11 app-node orchestrator: 2022-06-29 07:25:11 DEBUG kv.SubmitMastersToKvStores: submitKvPairs: 0
Jun 29 07:25:11 app-node orchestrator: 2022-06-29 07:25:11 INFO auditType:inject-unseen-masters instance::0 cluster: message:Operations: 0
Jun 29 07:25:11 app-node orchestrator: 2022-06-29 07:25:11 INFO auditType:forget-unseen instance::0 cluster: message:Forgotten instances: 0
Jun 29 07:25:11 app-node orchestrator: 2022-06-29 07:25:11 INFO auditType:review-unseen-instances instance::0 cluster: message:Operations: 0
Jun 29 07:25:11 app-node orchestrator: 2022-06-29 07:25:11 INFO auditType:resolve-unknown-masters instance::0 cluster: message:Num resolved hostnames: 0
Jun 29 07:25:11 app-node orchestrator: 2022-06-29 07:25:11 INFO auditType:forget-unseen-differently-resolved instance::0 cluster: message:Forgotten instances: 0
Jun 29 07:25:11 app-node orchestrator: 2022-06-29 07:25:11 INFO auditType:forget-clustr-aliases instance::0 cluster: message:Forgotten aliases: 0
Jun 29 07:25:28 app-node orchestrator: 2022-06-29 07:25:28 WARNING executeCheckAndRecoverFunction: ignoring analysisEntry that has no action plan: AllCoMasterReplicasNotReplicating; key: db2-node:3306
Hi Team, Any update on this? so as to ensure that it won't occur again. Let us know if you require any other information.
Does no one read? Fork the project, it's not maintained anymore.