dragonfly
Atomic Cluster Config Set
Today, there are various races that could lead to write losses when setting the cluster config.
The Problem
Upon setting a new slot-ownership config, master nodes may “lose” ownership over certain slots. In such a case, all requests (both read and write) for keys that belong to unowned slots are supposed to receive `MOVED` replies.
A naive (read: current) implementation will simply set the configuration, and any future requests will access the configuration to see the updated slot mapping.
However, there could be in-progress requests that already saw the previous configuration (i.e. they moved past the point of replying with `MOVED`). If we reply with OK to a write/modify command, that write will be lost, as the key will be unretrievable from this node. Furthermore, it will also be undeletable, which amounts to a memory leak.
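The race can be illustrated with a simulated interleaving. This is a minimal sketch, not Dragonfly's actual code; all names are hypothetical, and the config change is invoked inline to make the interleaving deterministic:

```cpp
#include <string>
#include <unordered_map>

// Naive scheme: ownership check and key access are separate steps, so a
// config update landing between them goes unnoticed by the request.
int owner_of_slot = 0;  // current config: this node (node 0) owns the slot
std::unordered_map<std::string, std::string> db;

bool OwnedByMe(int my_node) { return owner_of_slot == my_node; }
void ApplyNewConfig() { owner_of_slot = 1; }  // slot moves to node 1

void LostWriteInterleaving() {
  bool owned = OwnedByMe(0);   // request sees the old config
  ApplyNewConfig();            // set-cluster-config runs concurrently
  if (owned) db["key"] = "v";  // write lands here; the client gets OK,
                               // but the new owner never sees the key
}
```

After `LostWriteInterleaving()` runs, node 1 owns the slot, yet the key is stranded on node 0: unretrievable through the cluster and undeletable.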
Proposed Solution
- We’ll set the config immediately upon receiving it
- Add 2 per-slot counters: how many `requests_started`, and how many `requests_finished`
- Before handling any key, we’ll check the cluster config:
  a. If the key does not belong to this node, we’ll reply with `MOVED`
  b. Otherwise, we’ll increment `requests_started`
  c. When the request finishes, we’ll increment `requests_finished`
- When receiving a set-cluster-config request, we’ll save each thread’s `requests_started`
- Then we’ll wait for all threads’ `requests_finished` to be at least as big as their saved `requests_started`
At this point in time, all requests that saw the previous config will have already finished running.
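The counter scheme above can be sketched as follows. This is an illustrative sketch assuming atomic per-slot counters; the identifiers are hypothetical, not Dragonfly's actual ones:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kSlotCount = 16384;  // Redis-cluster slot range

struct SlotCounters {
  std::atomic<uint64_t> requests_started{0};
  std::atomic<uint64_t> requests_finished{0};
};

std::vector<SlotCounters> counters(kSlotCount);

// Steps b/c: bracket every key access with the two increments.
void OnRequestStart(size_t slot) {
  counters[slot].requests_started.fetch_add(1, std::memory_order_relaxed);
}
void OnRequestFinish(size_t slot) {
  counters[slot].requests_finished.fetch_add(1, std::memory_order_release);
}

// On set-cluster-config: snapshot requests_started per slot, then wait until
// requests_finished catches up -- every request that could have seen the old
// config has finished.
uint64_t Snapshot(size_t slot) {
  return counters[slot].requests_started.load(std::memory_order_acquire);
}
bool OldConfigDrained(size_t slot, uint64_t saved_started) {
  return counters[slot].requests_finished.load(std::memory_order_acquire) >=
         saved_started;
}
```

Note that only the snapshot taken at config-set time matters: requests that start after the new config is installed may keep incrementing `requests_started`, but they already see the new slot mapping, so the wait does not depend on them.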
Future Expansion
Our planned slot-migration feature should hook into this solution to determine when the last possible writes to slots are done, end stable sync, and complete taking over ownership of the slot.
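One way the hook could look, as a small state machine. This is a hypothetical sketch of the sequencing only (the names and states are illustrative, not a committed design):

```cpp
// Migration finalization order built on the drain mechanism: install the
// final config, wait for the last old-config writes to drain, then end
// stable sync and complete the ownership takeover.
enum class MigrationState { kStableSync, kDraining, kOwned };

struct SlotMigration {
  MigrationState state = MigrationState::kStableSync;

  // The final config is set; in-flight writes under the old config may
  // still be running, so we cannot take ownership yet.
  void OnFinalConfigSet() { state = MigrationState::kDraining; }

  // Fired once requests_finished has caught up with the saved
  // requests_started: the last possible writes to the slot are done.
  void OnSlotDrained() {
    // end stable sync, then complete the takeover
    state = MigrationState::kOwned;
  }
};
```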