incubator-horaedb-meta
Refactor shard version related logic
Description The current shard version verification implementation has the following problems:
- The shard versions of CeresMeta and CeresDB are maintained independently of each other. When they become inconsistent, they can only be restored by restarting the CeresDB node.
- Shard version synchronization is chaotic and prone to unexpected version inconsistencies.
- The shard version verification logic limits concurrent DDL. Only one DDL can succeed on a shard at a time.
Proposal Redesign and implement shard version related logic.
Additional context Some current thoughts:
- How should the meta version be synchronized with ceresdb?
  - Return the latest version in the response to table creation and deletion requests (I prefer this solution)
  - Synchronize the latest version through heartbeats
  - meta pulls the latest version through an interface provided by ceresdb
- Who persists the shard version information?
  - Keep it as is: meta persists the version, and ceresdb synchronizes it from meta when opening a shard (I prefer this solution)
  - ceresdb persists the version and, when opening a shard, synchronizes it to meta through the response
- How should the version be handled when a shard is operated on concurrently?
  - Leave it as is: only one operation succeeds and the others fail
  - Batch the operations, for example table creations and deletions. In that case, how the version is incremented must be decided:
    - The whole batch increments the version by 1
    - Each operation in the batch increments the version by 1
- Are version inconsistencies allowed within a certain range?
  - Not allowed; the versions must be completely consistent (current method)
  - Record the operations on the shard, and skip the version check for operations that can tolerate changes, so that inconsistencies within a certain range are allowed
- How to recover when versions are inconsistent?
  - Manually restart the node (current method, not acceptable)
  - Automatic error correction and recovery
    - meta periodically inspects all shard versions and, for any inconsistent version, initiates a repair operation on ceresdb
    - ceresdb is responsible for error correction: on receiving a request with an inconsistent version, it initiates a repair operation with ceresmeta
- How exactly should the error be corrected, and what needs to be done before synchronizing to a consistent version?
  - Try to rebuild or delete the table so that the failed procedure can be executed successfully
  - Ignore the error and force version synchronization
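
A few of the options above can be sketched together: returning the latest version in the DDL response, the two batch increment strategies, and forced synchronization from the version persisted by meta. This is only an illustration under stated assumptions. Every type and function name below (`DBShard`, `MetaShard`, `DDLResponse`, `ApplyBatch`, `Observe`, `ForceSync`) is hypothetical and does not mirror the real CeresMeta or CeresDB interfaces.

```go
package main

import "fmt"

// IncrementPolicy selects how a batch of DDL operations bumps the shard version.
type IncrementPolicy int

const (
	// PerBatch: the whole batch increments the version by 1.
	PerBatch IncrementPolicy = iota
	// PerOperation: each operation in the batch increments the version by 1.
	PerOperation
)

// DBShard models the shard version held by a ceresdb node (hypothetical).
type DBShard struct {
	Version uint64
}

// DDLResponse carries the latest shard version back to meta (hypothetical).
type DDLResponse struct {
	LatestVersion uint64
}

// ApplyBatch pretends to apply numOps DDL operations and returns the
// resulting version in the response. An empty batch leaves the version unchanged.
func (s *DBShard) ApplyBatch(numOps int, policy IncrementPolicy) DDLResponse {
	if numOps > 0 {
		if policy == PerBatch {
			s.Version++
		} else {
			s.Version += uint64(numOps)
		}
	}
	return DDLResponse{LatestVersion: s.Version}
}

// MetaShard models the shard version persisted on the meta side (hypothetical).
type MetaShard struct {
	Version uint64
}

// Observe records the version returned in a DDL response.
// Versions older than the persisted one are ignored.
func (m *MetaShard) Observe(resp DDLResponse) {
	if resp.LatestVersion > m.Version {
		m.Version = resp.LatestVersion
	}
}

// ForceSync overwrites the ceresdb-side version with the one persisted by
// meta, i.e. the "ignore the error and force synchronization" option.
func (m *MetaShard) ForceSync(s *DBShard) {
	s.Version = m.Version
}

func main() {
	db := &DBShard{Version: 3}
	meta := &MetaShard{Version: 3}

	meta.Observe(db.ApplyBatch(2, PerBatch))
	fmt.Println(db.Version, meta.Version) // prints "4 4"

	meta.Observe(db.ApplyBatch(2, PerOperation))
	fmt.Println(db.Version, meta.Version) // prints "6 6"

	db.Version = 9 // simulate a ceresdb-side version that drifted
	meta.ForceSync(db)
	fmt.Println(db.Version) // prints "6"
}
```

The sketch treats meta as the source of truth during recovery, which matches the "persisted by meta" preference above; if ceresdb were the persisting side instead, `ForceSync` would copy the version in the opposite direction.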