incubator-horaedb-meta

Refactor shard version related logic

Open ZuLiangWang opened this issue 1 year ago • 0 comments

Description The current shard version verification implementation has the following problems:

  • The shard versions of CeresMeta and CeresDB are independent of each other. When they become inconsistent, the only way to restore them is to restart the CeresDB node.
  • Shard version synchronization is disorganized and prone to unexpected version inconsistencies.
  • The shard version verification logic limits concurrent DDL: only one DDL can succeed on a shard at a time.
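
To illustrate the third problem, here is a minimal sketch of a strict version check. The `Shard` class, `VersionMismatch`, and `run_concurrent_ddl` are hypothetical names made up for illustration, not the actual CeresMeta or CeresDB code. It assumes every DDL carries the version it was planned against and that each successful DDL bumps the version by 1.

```python
class VersionMismatch(Exception):
    """Raised when a request carries a stale shard version."""


class Shard:
    """Hypothetical in-memory shard; not the CeresDB implementation."""

    def __init__(self, version=0):
        self.version = version
        self.tables = set()

    def create_table(self, name, expected_version):
        # Strict check: the request must carry exactly the current version.
        if expected_version != self.version:
            raise VersionMismatch(
                f"expected {expected_version}, shard is at {self.version}"
            )
        self.tables.add(name)
        self.version += 1
        return self.version


def run_concurrent_ddl(shard, table_names):
    """Simulate DDLs that were all planned against the same version snapshot."""
    snapshot = shard.version
    results = {}
    for name in table_names:
        try:
            shard.create_table(name, snapshot)
            results[name] = "ok"
        except VersionMismatch:
            results[name] = "rejected"
    return results
```

Under these assumptions, the first DDL in a group planned against the same version succeeds and every later one is rejected, which is the behavior described above.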

Proposal Redesign and implement shard version related logic.

Additional context Some current thoughts:

  1. How to synchronize meta version with ceresdb?
    1. Return the latest version in the response of creating and deleting tables (I prefer this solution)
    2. Synchronize the latest version through heartbeat
    3. meta pulls the latest version through the interface provided by ceresdb
  2. Who will persist the shard version information?
    1. Keep it as is, persisted by meta, and ceresdb synchronizes version from meta when opening shard (I prefer this solution)
    2. Version persistence is maintained by ceresdb. When opening shard, ceresdb synchronizes it to meta through response.
  3. How to handle version when operating shards concurrently?
    1. Leave it as is, only one operation will succeed and the others will fail.
    2. Support batching. When create-table and delete-table operations are batched, we must decide how to increment the version.
      1. The whole batch increments the version by 1
      2. Each operation in the batch increments the version by 1
  4. Are version inconsistencies allowed within a certain range?
    1. Not allowed, must be completely consistent (current method)
    2. Allowed within a range: record the operations applied to the shard, and skip the version check for operations that can tolerate concurrent changes, or accept a bounded range of version inconsistency for them.
  5. How to recover when versions are inconsistent?
    1. Manually restart the node (current method, not acceptable)
    2. Automatic error correction and recovery
      1. Meta regularly inspects all shard versions. For inconsistent versions, meta initiates repair operations to ceresdb.
      2. ceresdb is responsible for error correction. When receiving a request with an inconsistent version, ceresdb initiates a repair operation to ceresmeta.
      3. How exactly should the error be corrected, and what needs to be done before synchronizing to a consistent version?
        1. Try to rebuild the table or delete the table so that the failed procedure can be executed successfully.
        2. Ignore it directly and force version synchronization.
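
A rough sketch of how options 1.1, 2.1 and 3.2 could fit together. `DataNodeShard`, `MetaShardRecord`, and the `per_operation_bump` flag are hypothetical and exist only for this illustration. The rule that meta never moves its version backwards is an extra assumption, not something decided in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class DataNodeShard:
    """Hypothetical view of a shard on the ceresdb side."""

    version: int = 0
    tables: set = field(default_factory=set)

    def apply_batch(self, ops, per_operation_bump):
        """Apply (kind, table) ops and return the latest version as the response."""
        for kind, table in ops:
            if kind == "create":
                self.tables.add(table)
            elif kind == "drop":
                self.tables.discard(table)
            else:
                raise ValueError(f"unknown operation: {kind}")
            if per_operation_bump:
                self.version += 1  # option 3.2.2: version +1 per operation
        if not per_operation_bump and ops:
            self.version += 1  # option 3.2.1: version +1 per batch
        return self.version


@dataclass
class MetaShardRecord:
    """Hypothetical record persisted by meta (option 2.1)."""

    version: int = 0

    def sync_from_response(self, returned_version):
        # Option 1.1: adopt the version carried in the DDL response.
        # Assumption: never move backwards, so a delayed response
        # cannot overwrite a newer version.
        self.version = max(self.version, returned_version)
        return self.version
```

For example, `meta.sync_from_response(node.apply_batch(ops, per_operation_bump=False))` advances both sides by one version per batch, while `per_operation_bump=True` advances them by the number of operations in the batch.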

ZuLiangWang, Oct 26 '23 05:10