Multiple writers for the same sst caused by `close shard`

Open Rachelint opened this issue 1 year ago • 2 comments

Describe this problem

A shard may be moved off a node when the process panics or for any other reason. Before the move, all operations on that shard (especially writes) should be stopped. However, background work (flush and compaction, both of which are writes) is currently not stopped properly. This causes a serious bug: multiple writers for one sst.
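The requirement above (reject new background work and wait for running work before the shard is moved) can be sketched as a small gate. This is only an illustration using the standard library, not the actual HoraeDB code: `ShardGate`, `JobGuard`, `try_begin_job` and `close` are hypothetical names.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical per-shard state; not a real HoraeDB type.
#[derive(Default)]
struct GateState {
    closing: bool,
    in_flight: usize,
}

/// Hypothetical gate that every flush/compaction job must pass through.
#[derive(Default)]
pub struct ShardGate {
    state: Mutex<GateState>,
    idle: Condvar,
}

/// Held by a background job for as long as it may write ssts.
pub struct JobGuard {
    gate: Arc<ShardGate>,
}

impl ShardGate {
    pub fn new() -> Arc<Self> {
        Arc::new(Self::default())
    }

    /// Called by flush/compaction before writing anything.
    /// Returns `None` once the shard has started closing.
    pub fn try_begin_job(self: &Arc<Self>) -> Option<JobGuard> {
        let mut st = self.state.lock().unwrap();
        if st.closing {
            return None;
        }
        st.in_flight += 1;
        Some(JobGuard { gate: Arc::clone(self) })
    }

    /// Called before the shard is moved: rejects new jobs,
    /// then blocks until every running job has finished.
    pub fn close(&self) {
        let mut st = self.state.lock().unwrap();
        st.closing = true;
        while st.in_flight > 0 {
            st = self.idle.wait(st).unwrap();
        }
    }

    /// Number of background jobs currently holding a guard.
    pub fn in_flight(&self) -> usize {
        self.state.lock().unwrap().in_flight
    }
}

impl Drop for JobGuard {
    fn drop(&mut self) {
        let mut st = self.gate.state.lock().unwrap();
        st.in_flight -= 1;
        if st.in_flight == 0 {
            // Wake up a `close()` call that is waiting for the shard to go idle.
            self.gate.idle.notify_all();
        }
    }
}
```

With such a gate, `close()` would return only after the last `JobGuard` is dropped, so no flush or compaction on the old node would still be writing when the shard is opened elsewhere.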

Server version

CeresDB Server Version: 1.2.2
Git commit: 2e206650
Git branch: main
Opt level: 3
Rustc version: 1.69.0-nightly
Target: aarch64-apple-darwin
Build date: 2023-06-12T13:01:03.592984000Z

Steps to reproduce

This is hard to reproduce. If it must be reproduced, the following steps may work:

  • Set up a ceresdb cluster with ceresmeta.
  • Manually trigger compaction/flush for a specific table of a shard on one node.
  • Manually move the shard to another node via ceresmeta before the compaction/flush finishes.
  • Manually trigger compaction/flush for the same table of the shard on the new node.

Expected behavior

No response

Additional Information

No response

Rachelint avatar Jun 12 '23 13:06 Rachelint

After #998, updates issued after a shard is closed are forbidden. However, some ssts may still be in the middle of being written when the shard is closed, and these ssts may share the same ids as new ssts created by the new node, leading to multiple writers on the same sst.
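A toy model of how such an id collision could arise is sketched below. It assumes, purely for illustration, that each node resumes sst id allocation from the last id recorded in shared metadata; the thread does not say how ids are actually allocated, and `SharedStore`, `FileIdAllocator` and `simulate_shard_move` are hypothetical names.

```rust
use std::collections::HashMap;

/// Toy stand-in for shared object storage: sst id -> nodes that wrote it.
#[derive(Default)]
pub struct SharedStore {
    writers: HashMap<u64, Vec<String>>,
}

impl SharedStore {
    /// Record that `node` wrote (part of) the sst with `file_id`.
    pub fn write_sst(&mut self, file_id: u64, node: &str) {
        self.writers.entry(file_id).or_default().push(node.to_string());
    }

    /// Ids of ssts that were written by more than one node, in ascending order.
    pub fn conflicting_ids(&self) -> Vec<u64> {
        let mut ids: Vec<u64> = self
            .writers
            .iter()
            .filter(|(_, nodes)| nodes.len() > 1)
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable();
        ids
    }
}

/// Hypothetical allocator (an assumption of this sketch) that resumes
/// from the last id persisted in shared metadata.
pub struct FileIdAllocator {
    next: u64,
}

impl FileIdAllocator {
    pub fn recover(last_persisted_id: u64) -> Self {
        Self { next: last_persisted_id + 1 }
    }

    pub fn alloc(&mut self) -> u64 {
        let id = self.next;
        self.next += 1;
        id
    }
}

/// Replays the scenario from the comment above and returns the conflicting ids.
pub fn simulate_shard_move() -> Vec<u64> {
    let last_persisted_id = 10;
    let mut store = SharedStore::default();

    // Old node: a compaction allocates an id and is still writing
    // when the shard is closed.
    let mut old_alloc = FileIdAllocator::recover(last_persisted_id);
    let old_id = old_alloc.alloc();

    // New node: opens the shard from the same persisted state
    // and starts its own flush.
    let mut new_alloc = FileIdAllocator::recover(last_persisted_id);
    let new_id = new_alloc.alloc();

    store.write_sst(new_id, "new-node");
    // The old node's in-flight write lands after the move.
    store.write_sst(old_id, "old-node");

    store.conflicting_ids()
}
```

Under these assumptions both nodes allocate the same id, so forbidding new updates after close is not sufficient on its own; writes already in flight on the old node also have to finish or be cancelled before the new node starts writing.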

Let's fix this problem in another PR. @baojinri

ShiKaiWi avatar Jun 16 '23 10:06 ShiKaiWi

> After #998, updates issued after a shard is closed are forbidden. However, some ssts may still be in the middle of being written when the shard is closed, and these ssts may share the same ids as new ssts created by the new node, leading to multiple writers on the same sst.
>
> Let's fix this problem in another PR. @baojinri

#1009 has fixed the problem. However, #998 did not actually achieve its goal of preventing updates after a table is closed, and it has been reverted. I guess I'll submit another change set to make everything work.

ShiKaiWi avatar Jul 07 '23 09:07 ShiKaiWi