kvrocks

feat(cluster): add a new command SLOTSIZE

Open greatsharp opened this issue 11 months ago • 5 comments

Background

  1. Issues #2715 and #2723 reported duplicated data during cluster expansion. This PR introduces a way to find dirty slots, that is, slots that have been migrated away but whose data remains on the node, and then clear that dirty data from the node.
  2. Users may want to know how many keys are in each slot.
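
To make "keys in each slot" concrete, here is a minimal sketch of how keys map to slots and how a per-slot count could be tallied on the client side. It assumes Kvrocks follows the Redis Cluster mapping (CRC-16/XMODEM of the key, or of its hash tag, modulo 16384); the helper names `keySlot` and `countKeysPerSlot` are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// hashSlots is the Redis Cluster slot count; assumed to apply to Kvrocks too.
const hashSlots = 16384

// crc16 computes CRC-16/XMODEM (polynomial 0x1021, initial value 0).
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// keySlot returns the slot of a key. If the key contains a non-empty
// "{...}" hash tag, only the tag content is hashed.
func keySlot(key string) int {
	if open := strings.IndexByte(key, '{'); open >= 0 {
		if end := strings.IndexByte(key[open+1:], '}'); end > 0 {
			key = key[open+1 : open+1+end]
		}
	}
	return int(crc16([]byte(key))) % hashSlots
}

// countKeysPerSlot tallies how many of the given keys fall into each slot.
func countKeysPerSlot(keys []string) map[int]int {
	counts := make(map[int]int)
	for _, k := range keys {
		counts[keySlot(k)]++
	}
	return counts
}

func main() {
	keys := []string{"user:1", "user:2", "{user:1}.profile"}
	for slot, n := range countKeysPerSlot(keys) {
		fmt.Printf("slot %d: %d key(s)\n", slot, n)
	}
}
```

This only illustrates the mapping; the command proposed here gathers the stats on the server by scanning the slot range.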

Usage

Use the command "clusterx slotsize {SlotRange} scan" to scan the keys of a slot range on the node.

Use the command "clusterx clearslot {SlotRange}" to clear the keys of a slot range whose slots have been migrated off the node.

Use the command "clusterx slotsize {SlotRange}" to print the stats of each slot.

[Screenshot: Snipaste_2025-05-29_22-00-23]
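
The description does not spell out the {SlotRange} syntax. As an illustration only, the sketch below assumes it is either a single slot ("5") or an inclusive "start-end" pair ("0-100") within 0..16383; the type `slotRange` and the function `parseSlotRange` are hypothetical and do not reflect the parser in this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxSlot is the highest valid slot id, assuming 16384 hash slots.
const maxSlot = 16383

// slotRange is an inclusive range of slot ids.
type slotRange struct{ start, end int }

// parseSlotRange parses "N" or "N-M" (assumed formats) into a slotRange.
func parseSlotRange(s string) (slotRange, error) {
	first, last, isRange := strings.Cut(s, "-")
	start, err := strconv.Atoi(first)
	if err != nil {
		return slotRange{}, fmt.Errorf("invalid start slot %q", first)
	}
	end := start // a single slot is a range of length one
	if isRange {
		end, err = strconv.Atoi(last)
		if err != nil {
			return slotRange{}, fmt.Errorf("invalid end slot %q", last)
		}
	}
	if start < 0 || end > maxSlot || start > end {
		return slotRange{}, fmt.Errorf("slot range %q out of bounds", s)
	}
	return slotRange{start, end}, nil
}

func main() {
	for _, in := range []string{"0-100", "5", "100-5"} {
		r, err := parseSlotRange(in)
		fmt.Println(in, r, err)
	}
}
```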

Test Report

Env: Tencent SA5.2XLARGE16 CVM instance (8C16G), 4 CLOUD_TSSD disks of 100 GB each, using 4 stripes.

There are 13.3 million keys on the node, and each slot holds 2400 keys on average. Executing the command "slotsize {SlotRange} scan" costs 2-3 ms per slot, ranging from a minimum of 1 ms to a maximum of 4 ms. No significant variation in CPU utilization was observed while scanning slots.

greatsharp avatar May 10 '25 13:05 greatsharp

@greatsharp Apart from the above comments, I believe we should NOT support the clear subcommand in SLOTSIZE; it makes no sense there (also, this command is marked as read-only).

git-hulk avatar May 23 '25 10:05 git-hulk

Removed the clear subcommand. We may add the dirty-data clearing logic to the 'clusterx migrate' command, or add a new subcommand to the clusterx command.

greatsharp avatar May 25 '25 02:05 greatsharp

@greatsharp I'm generally good with the current implementation, with a few comments. Could you please add Go test cases for those commands? cc @PragmaTwice

git-hulk avatar Jun 16 '25 05:06 git-hulk

Sure, give me a moment.

greatsharp avatar Jun 17 '25 00:06 greatsharp

LGTM, one more comment: GetSlotStats should also only allow getting/scanning slots that belong to the node, and it can clean up the old stats if a slot has been moved. cc @PragmaTwice

git-hulk avatar Jun 21 '25 15:06 git-hulk

How do you know whether there is duplicate dirty data on the node if you cannot scan the slots that were migrated out?

greatsharp avatar Jun 22 '25 01:06 greatsharp

In my opinion, each command should be kept clear and accurate.

greatsharp avatar Jun 22 '25 01:06 greatsharp

You can check this when calling GetSlotStats; it would be easy to know which slots no longer belong to this node.
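
A rough sketch of that check, assuming the node has a per-slot key count and a set of slots it currently owns. The function `dirtySlots` and both map shapes are hypothetical illustrations, not the GetSlotStats implementation from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// dirtySlots returns, in ascending order, the slots that still hold keys
// on this node although the node no longer owns them.
func dirtySlots(keyCounts map[int]uint64, owned map[int]bool) []int {
	var dirty []int
	for slot, n := range keyCounts {
		if n > 0 && !owned[slot] {
			dirty = append(dirty, slot)
		}
	}
	sort.Ints(dirty) // map iteration order is random, so sort for stable output
	return dirty
}

func main() {
	// Hypothetical stats: slot id -> number of keys found on this node.
	keyCounts := map[int]uint64{1: 10, 2: 0, 3: 5, 4: 7}
	// Hypothetical ownership: slots 1 and 4 belong to this node.
	owned := map[int]bool{1: true, 4: true}

	fmt.Println(dirtySlots(keyCounts, owned))
}
```

Slots reported this way would be the candidates for cleanup; whether the cleanup itself belongs in the same code path is the open question in this thread.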

git-hulk avatar Jun 22 '25 03:06 git-hulk

Do you mean we can put the scan and clear process all in one GetSlotStats method? Could you give more details or the steps for it?

greatsharp avatar Jun 23 '25 01:06 greatsharp