incubator-horaedb-meta

Basic failover capability of CeresDB cluster

Open ZuLiangWang opened this issue 2 years ago • 0 comments

Description We implemented the cluster management capability of CeresDB with Procedure, but Procedure only provides shard scheduling and does not actively check the state of the CeresDB cluster. Failover capability is still missing.

Proposal Implement the simplest form of failover for CeresDB cluster mode. After a CeresDB node crashes, the faulty node is automatically removed and the routing relationship is adjusted. This should include the following functions:

  • Check whether a node has crashed based on its heartbeat.
  • When a node is confirmed to have crashed, remove it from the metadata and transfer the leader by invoking TransferLeaderProcedure.

Additional context

ZuLiangWang, Oct 17 '22 11:10