Proposal: add `Cluster` method to split keys by nodes
Description
Add a `Cluster.SplitByNode(keys ...string) [][]string` method that takes a list of Redis keys and returns them grouped by node. It is similar to `SplitBySlot`, but while `SplitBySlot` is guaranteed to return groupings of keys that definitely belong together (as they live in the same hash slot), `SplitByNode` takes a more optimistic approach and returns groupings of keys that are likely to live on the same node, even if they don't necessarily share the same hash slot.

It does so by using the `Cluster`'s current known slot-to-node mapping (which is why the new function must be a method on the `Cluster` type: it needs access to that mapping). If the cluster has remained stable since the mapping was last updated, and remains stable until the commands for those keys are executed, this can significantly improve performance, as more keys can be grouped together and sent over the same connection. If the cluster has not remained stable, the commands will result in `MOVED` errors, which can be handled as per the usual mechanisms of the redisc package, either manually or automatically via a `RetryConn`.

This is a new API and as such could be released as part of a minor version.
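To make the benefit concrete, here is a toy sketch of node-level grouping. All names (`groupByNode`, `slotOf`, `nodeOf`) and the fixed maps are illustrative assumptions, not redisc's implementation:

```go
package main

import "fmt"

// groupByNode sketches the proposed behavior: keys in distinct hash
// slots are still grouped together when their slots map to the same
// node, so fewer connections are needed to execute their commands.
func groupByNode(slotOf map[string]int, nodeOf map[int]string, keys []string) map[string][]string {
	groups := make(map[string][]string)
	for _, k := range keys {
		node := nodeOf[slotOf[k]]
		groups[node] = append(groups[node], k)
	}
	return groups
}

func main() {
	slotOf := map[string]int{"k1": 100, "k2": 200, "k3": 300} // stand-in for the CRC16 slot computation
	nodeOf := map[int]string{100: "nodeA", 200: "nodeA", 300: "nodeB"}
	groups := groupByNode(slotOf, nodeOf, []string{"k1", "k2", "k3"})
	// SplitBySlot would yield 3 groups (one per slot); node-level
	// grouping collapses them into 2.
	fmt.Println(len(groups))
}
```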
Use Case
While in general care should be taken to design keys so that related data lives in the same hash slot (e.g. `{user:123}:foo` and `{user:123}:bar`, so that both the `foo` and `bar` data for user `123` live in the same slot), it is sometimes useful to run a number of commands that cannot fit this restriction (e.g. an admin user may want to retrieve all users' `foo`s). In such cases, the execution of commands can be optimized with `SplitByNode`.
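To illustrate the hash-tag rule, here is a self-contained sketch of Redis Cluster's key-to-slot computation (CRC16/XModem modulo 16384, hashing only the first non-empty `{...}` tag when one is present). This mirrors the algorithm in the Redis Cluster specification, not redisc's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 implements CRC16/XModem (polynomial 0x1021, init 0), the
// checksum Redis Cluster uses for key-to-slot assignment.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashSlot returns the cluster slot for a key. If the key contains a
// non-empty {tag}, only the tag is hashed, which is why {user:123}:foo
// and {user:123}:bar always land in the same slot.
func hashSlot(key string) int {
	if i := strings.IndexByte(key, '{'); i >= 0 {
		if j := strings.IndexByte(key[i+1:], '}'); j > 0 {
			key = key[i+1 : i+1+j]
		}
	}
	return int(crc16([]byte(key)) % 16384)
}

func main() {
	fmt.Println(hashSlot("{user:123}:foo") == hashSlot("{user:123}:bar")) // true: same tag, same slot
}
```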
Implementation Concerns
What if, at the moment this is called, not all slots are mapped to a node? It could return an error, but that would make the API troublesome to use: the calling code would always have to prepare for the error case and have a fallback scenario (which would likely be to use `SplitBySlot`). Another option would be to return a random node for the slots that are not covered by a node, but that is suboptimal, as those keys are almost certain to result in a `MOVED` error.
A fair option, I think, would be for `SplitByNode` to return the same groupings as `SplitBySlot` for the keys that are not covered by any node. That way it always returns valid groupings of keys, the calling code is the same whether the internal mappings are complete or partial, and the groupings are as optimal as they can be: the keys with a valid mapping are properly grouped, and those without a mapping are grouped by their slot, so that when executed they will trigger at most a single `MOVED` error (for the first command), and after following that redirection all keys in that grouping will be on the right node.
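The fallback described above could be sketched like this (hypothetical names throughout; redisc's real data structures differ, and a toy slot function stands in for the CRC16 computation):

```go
package main

import "fmt"

// splitByNodeWithFallback sketches the proposed behavior: keys whose
// slot has a known owner are grouped per node, while keys whose slot
// is unmapped are grouped per slot (as SplitBySlot would group them),
// so each unmapped group triggers at most one MOVED redirection.
func splitByNodeWithFallback(nodeForSlot map[int]string, slotFor func(string) int, keys ...string) [][]string {
	byGroup := make(map[string][]string)
	var order []string // preserve first-seen group order
	for _, k := range keys {
		slot := slotFor(k)
		group, ok := nodeForSlot[slot]
		if !ok {
			group = fmt.Sprintf("unmapped-slot-%d", slot) // fall back to per-slot grouping
		}
		if _, seen := byGroup[group]; !seen {
			order = append(order, group)
		}
		byGroup[group] = append(byGroup[group], k)
	}
	out := make([][]string, 0, len(order))
	for _, g := range order {
		out = append(out, byGroup[g])
	}
	return out
}

func main() {
	slotFor := func(k string) int { return int(k[0]) } // toy slot function for the demo
	nodeForSlot := map[int]string{'a': "10.0.0.1:6379", 'b': "10.0.0.1:6379"}
	// "a1" and "b1" share a mapped node; "x1" and "y1" hit unmapped
	// slots, so each falls back to its own slot-level group.
	fmt.Println(splitByNodeWithFallback(nodeForSlot, slotFor, "a1", "b1", "x1", "y1"))
}
```

The calling code handles both cases identically: it iterates over the returned groups, whatever mix of node-level and slot-level groupings they contain.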
If the `Cluster` is closed, instead of returning an error, simply fall back to `SplitBySlot`. The "closed cluster" error will be returned when attempting to run any command on that cluster.
(Note that this proposal is not a promise of implementation, nor a promise of merging a PR that would implement this feature. It is first and foremost a way for me to give the idea some thought, and a reminder for when I have time to work on the package, published for visibility.)