valkey icon indicating copy to clipboard operation
valkey copied to clipboard

add cluster flushslot command.

Open wuranxx opened this issue 1 year ago • 18 comments

Add cluster flushslot command. Closes #1133.

background

We want to implement the ability to SYNC or ASYNC free all data from a single slot. This is useful in two cases:

  1. When a primary loses ownership of a slot, it will temporarily freeze while it synchronously frees the slot.
  2. When doing resharding, you may prefer to forcibly delete all the data from a slot and assign it to another node.

behavior

  1. Add CLUSTER FLUSHSLOT <slot> [ASYNC|SYNC] command.

  2. Modify the function signature of delKeysInSlot, adding a lazy parameter to decide whether to delete keys SYNC or ASYNC.

wuranxx avatar Dec 03 '24 13:12 wuranxx

Codecov Report

Attention: Patch coverage is 88.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 71.11%. Comparing base (374718b) to head (9a2839e). Report is 18 commits behind head on unstable.

Files with missing lines Patch % Lines
src/cluster.c 81.81% 2 Missing :warning:
src/cluster_legacy.c 92.85% 1 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1384      +/-   ##
============================================
- Coverage     71.21%   71.11%   -0.11%     
============================================
  Files           122      122              
  Lines         66049    66067      +18     
============================================
- Hits          47038    46984      -54     
- Misses        19011    19083      +72     
Files with missing lines Coverage Δ
src/commands.def 100.00% <ø> (ø)
src/cluster_legacy.c 86.51% <92.85%> (+0.17%) :arrow_up:
src/cluster.c 90.12% <81.81%> (-0.13%) :arrow_down:

... and 12 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Dec 04 '24 22:12 codecov[bot]

Please help review this PR. Thank you. @hwware @madolson @PingXie @zuiderkwast

wuranxx avatar Dec 05 '24 02:12 wuranxx

There are too much information in issue https://github.com/valkey-io/valkey/issues/1133. To save some of the reviewer energy, could you pls update the top comment to describe the background why we want to this new command? You could reference PR https://github.com/valkey-io/valkey/pull/1392.

Thanks a lot

hwware avatar Dec 05 '24 19:12 hwware

Before I rebased to the latest unstable (old commit is 32f7541fe34e5e0520a5917d09661756d330bd11) , the Tcl tests and manual command tests ran fine locally.

After I rebased to the latest unstable 6df376d68a97e9c0da4549f57db96742b5482202, the command failed when executed locally.

It is possible that changes in the code during this period caused the inconsistent behavior. Until I confirm the cause, I have marked this PR as WIP (Work In Progress).

wuranxx avatar Dec 06 '24 09:12 wuranxx

Before I rebased to the latest unstable (old commit is 32f7541) , the Tcl tests and manual command tests ran fine locally.

After I rebased to the latest unstable 6df376d, the command failed when executed locally.

It is possible that changes in the code during this period caused the inconsistent behavior. Until I confirm the cause, I have marked this PR as WIP (Work In Progress).

Fixed. For details, see #1399.

wuranxx avatar Jan 22 '25 03:01 wuranxx

@enjoy-binbin @murphyjacob4 Do you want this command for atomic slot migration, for example when a slot migration is cancelled, the importing node can send CLUSTER FLUSHSLOT to its replicas to clean up the slot?

zuiderkwast avatar Jan 22 '25 11:01 zuiderkwast

I think we will have a protocol for cancelling the slot migration, and this will trigger the drop for all slot ranges being migrated locally. But we should be able to use the async delKeysInSlot functionality to implement this

murphyjacob4 avatar Jan 22 '25 20:01 murphyjacob4

I think we will have a protocol for cancelling the slot migration, and this will trigger the drop for all slot ranges being migrated locally. But we should be able to use the async delKeysInSlot functionality to implement this

@murphyjacob4 The replicas are not involved in the slot migration protocol IIUC. So when a primary cleans up a cancelled slot import, do you just replicate a lot of DEL commands to the replicas? Replicating a single FLUSHSLOT may save some work and bandwidth...

zuiderkwast avatar Jan 23 '25 07:01 zuiderkwast

do you just replicate a lot of DEL commands to the replicas? Replicating a single FLUSHSLOT may save some work and bandwidth...

Yeah that's a good idea. Currently, this is what delKeysInSlot does, and it could result in a lot of bandwidth for big slots. Disabling the replication of the individual keys in delKeysInSlot and replacing with a single FLUSHSLOT makes sense to me. This path is also executed in the dirty slot case, so it won't just be an improvement for atomic slot migration (although I anticipate it will be executed more frequently there).

murphyjacob4 avatar Jan 23 '25 18:01 murphyjacob4

I add command scene to delKeysInSlot. In command sence, delKeysInSlot will not propagate del command, but will notify del event.

flushslot now just propagate flushslot command to aof and replica.

please help review this PR, thank you! @zuiderkwast @PingXie

wuranxx avatar Apr 12 '25 09:04 wuranxx

I noticed there is unfortunately already a command with a very similar name: CLUSTER FLUSHSLOTS:

The CLUSTER FLUSHSLOTS deletes all information about slots from the connected node. It can only be called when the database is empty.

The new command in this PR CLUSTER FLUSHSLOT is maybe too similar to that command?

Maybe we should actually make it a top-level command FLUSHSLOT without CLUSTER? It will match FLUSHDB and FLUSHALL. We need to discuss it again in the core team meeting. No need to change the PR before we have discussed it again.

zuiderkwast avatar May 11 '25 10:05 zuiderkwast

Weekly meeting notes. We will keep the command as is, but we will document this as primarily an internal and debugging command.

madolson avatar May 12 '25 15:05 madolson

Weekly meeting notes. We will keep the command as is, but we will document this as primarily an internal and debugging command.

I think we discussed two decisions

  1. whether this should be a top level command?

based on the use case described in the issue, this is clearly not a "data access" command meant for the application developers, hence the guidance of keeping it under cluster.

  1. whether it is a good idea to introduce a new command whose name differs from that of an existing command by just one character? cluster flushshot vs cluster flushshots.

I don't remember if we reached a conclusion?

PingXie avatar May 13 '25 03:05 PingXie

I don't remember if we reached a conclusion?

Thanks Ping for summarizing the discussion. We did reach the conclusion to stick to cluster flushslot, as Madelyn wrote.

Neither of these commands are frequently used so the similarity between the two commands (cluster flushshot vs cluster flushshots) is not a major concern.

zuiderkwast avatar May 13 '25 08:05 zuiderkwast

Weekly meeting notes. We will keep the command as is, but we will document this as primarily an internal and debugging command.

I think we discussed two decisions

  1. whether this should be a top level command?

based on the use case described in the issue, this is clearly not a "data access" command meant for the application developers, hence the guidance of keeping it under cluster.

  1. whether it is a good idea to introduce a new command whose name differs from that of an existing command by just one character? cluster flushshot vs cluster flushshots.

I don't remember if we reached a conclusion?

I forgot if you drop off meeting earlier, we have one conclusion that the new command "cluster flushslot" as an internal command.

hwware avatar May 13 '25 13:05 hwware

Weekly meeting notes. We will keep the command as is, but we will document this as primarily an internal and debugging command.

I think we discussed two decisions

  1. whether this should be a top level command?

based on the use case described in the issue, this is clearly not a "data access" command meant for the application developers, hence the guidance of keeping it under cluster.

  1. whether it is a good idea to introduce a new command whose name differs from that of an existing command by just one character? cluster flushshot vs cluster flushshots.

I don't remember if we reached a conclusion?

I forgot if you drop off meeting earlier, we have one conclusion that the new command "cluster flushslot" as an internal command, like cluster-SYNCSLOTS (a new internal command too) image

hwware avatar May 13 '25 15:05 hwware

@murphyjacob4 Flush one slot at a time, is that enough for cancelled atomic slot migration? If you have 1000 ongoing slot migrations that are cancelled, will you need to send 1000 flushslot commands to the replicas at the same time?

I think it is fine. ASM actually has a few scenarios for FLUSHSLOT:

  1. If a slot migration is cancelled, each slot within this migration will be removed by the target primary and FLUSHSLOT will be propagated to the target replica(s)
  2. If FLUSHSLOT is run on the source primary, it will be sent to the target primary (via slot migration link) foreach ongoing slot export on that node (if any). Target primary will forward this to the target replicas.
  3. If FLUSHSLOT is run on the target primary (by user), it will be denied if the slot is being imported (should be covered by this CL I think, will return -MOVED)
  4. If FLUSHDB is run on the source primary, if there is a slot export occurring, it will replicate as a FLUSHSLOT command for each owned slot. For all target primaries and their replicas, the FLUSHSLOT commands that match the ongoing slot migration will be propagated via the path in 2
  5. If FLUSHDB is run on the target primary, if there is a slot import occurring, it will replicate as a FLUSHSLOT command for each owned slot. Locally, we will leave the importing data. We will not replicate a FLUSHSLOT for the ongoing slot import (since we want to keep that data

In the worst case, it would be something like ~700KiB of replication data to send 16384 FLUSHSLOT commands (each would be <= 43 bytes, *3\r\n$7\r\nCLUSTER\r\n$9\r\nFLUSHSLOT\r\n$5\r\n16383). But this seems like a very rare case. We have a path to optimizing with something like CLUSTER FLUSHSLOTSRANGE a la CLUSTER ADDSLOTSRANGE, but I don't think it is required, and just FLUSHSLOT should better than propagating DEL for each key (at least for 99% of use cases)

murphyjacob4 avatar May 20 '25 18:05 murphyjacob4

  • If FLUSHDB is run on the source primary, if there is a slot export occurring, it will replicate as a FLUSHSLOT command for each owned slot. For all target primaries and their replicas, the FLUSHSLOT commands that match the ongoing slot migration will be propagated via the path in 2
  • If FLUSHDB is run on the target primary, if there is a slot import occurring, it will replicate as a FLUSHSLOT command for each owned slot. Locally, we will leave the importing data. We will not replicate a FLUSHSLOT for the ongoing slot import (since we want to keep that data

@murphyjacob4 Now that we have multi-db in cluster mode, FLUSHSLOT will affects all databases. FLUSHDB should only flush the currently SELECT'ed database.

zuiderkwast avatar May 21 '25 08:05 zuiderkwast

@murphyjacob4 Flush one slot at a time, is that enough for cancelled atomic slot migration? If you have 1000 ongoing slot migrations that are cancelled, will you need to send 1000 flushslot commands to the replicas at the same time?

I think it is fine. ASM actually has a few scenarios for FLUSHSLOT:

  1. If a slot migration is cancelled, each slot within this migration will be removed by the target primary and FLUSHSLOT will be propagated to the target replica(s)
  2. If FLUSHSLOT is run on the source primary, it will be sent to the target primary (via slot migration link) foreach ongoing slot export on that node (if any). Target primary will forward this to the target replicas.
  3. If FLUSHSLOT is run on the target primary (by user), it will be denied if the slot is being imported (should be covered by this CL I think, will return -MOVED)
  4. If FLUSHDB is run on the source primary, if there is a slot export occurring, it will replicate as a FLUSHSLOT command for each owned slot. For all target primaries and their replicas, the FLUSHSLOT commands that match the ongoing slot migration will be propagated via the path in 2
  5. If FLUSHDB is run on the target primary, if there is a slot import occurring, it will replicate as a FLUSHSLOT command for each owned slot. Locally, we will leave the importing data. We will not replicate a FLUSHSLOT for the ongoing slot import (since we want to keep that data

In the worst case, it would be something like ~700KiB of replication data to send 16384 FLUSHSLOT commands (each would be <= 43 bytes, *3\r\n$7\r\nCLUSTER\r\n$9\r\nFLUSHSLOT\r\n$5\r\n16383). But this seems like a very rare case. We have a path to optimizing with something like CLUSTER FLUSHSLOTSRANGE a la CLUSTER ADDSLOTSRANGE, but I don't think it is required, and just FLUSHSLOT should better than propagating DEL for each key (at least for 99% of use cases)

We have consensus that flushslot is an internal command only, I think case 2 and 3 will not happen. How do you think?

hwware avatar May 23 '25 18:05 hwware

We have consensus that flushslot is an internal command only, I think case 2 and 3 will not happen. How do you think?

Agree, but I'm not sure about case 4 and 5. Maybe we need CLUSTER FLUSHSLOT 2333 DB 2 to handle FLUSHDB on one db during atomic slot migration. It can be extended later if we need it. Otherwise, we could change flushdb so it only affects owned slots?

zuiderkwast avatar May 23 '25 18:05 zuiderkwast

We have consensus that flushslot is an internal command only, I think case 2 and 3 will not happen. How do you think?

Agree, but I'm not sure about case 4 and 5. Maybe we need CLUSTER FLUSHSLOT 2333 DB 2 to handle FLUSHDB on one db during atomic slot migration. It can be extended later if we need it. Otherwise, we could change flushdb so it only affects owned slots?

I think case 4 is a valid scenario. However, for case 5, it depends on how the operator does. One worst case is causing data inconsistent. (source primary is exporting data, and FLUSHDB command runs target primary, then some data will lost)

hwware avatar May 26 '25 16:05 hwware

For the rollback of atomic slot migration, using FLUSHSLOT to affect all databases is more appropriate. However, considering the replication of FLUSHDB during migration and the possibility that users might need to clear a specific slot in a particular database, I think CLUSTER FLUSHSLOT <slotid> [DB id] would be a good option.

soloestoy avatar May 28 '25 01:05 soloestoy

For the rollback of atomic slot migration, using FLUSHSLOT to affect all databases is more appropriate. However, considering the replication of FLUSHDB during migration and the possibility that users might need to clear a specific slot in a particular database, I think CLUSTER FLUSHSLOT <slotid> [DB id] would be a good option.

I think the proper command format as below:

CLUSTER FLUSHSLOT [[DB id] | all] [ASYNC|SYNC]

if there is no any argument after , this command will only apply current db, it is aligned with FLUSHDB if the command as CLUSTER FLUSHSLOT [DB id], it will only remove specific db keys if command as CLUSTER FLUSHSLOT all, it will remove keys from all dbs

I add this PR in our next Monday meeting agenda.

hwware avatar May 28 '25 15:05 hwware

@valkey-io/core-team Shouldn't this have had a major decision tag? I remember we decided this was an internal command, but it doesn't look like it was really restricted to just be internal.

madolson avatar Jun 04 '25 20:06 madolson

The issue has the major decision tag. Seemed enough...(?)

it doesn't look like it was really restricted to just be internal.

If we don't add a markdown file for it in the docs, it will not show up in the docs, but maybe it will show up somewhere else? How can we make it completely internal?

zuiderkwast avatar Jun 05 '25 00:06 zuiderkwast