work
work copied to clipboard
Pass at least one key to script executions
This allows Redis to route our execution to the correct node when running against a Redis Cluster.
I encountered this issue with an on premises deployment of our application, in which our customer is using Redis Cluster. They were seeing errors with cross-slot key access in a Lua script. This surprised me, since I knew that work
provides a mechanism for namespacing keys to avoid this very problem.
And then I read the following in the EVAL
docs:
All Redis commands must be analyzed before execution to determine which keys the command will operate on. In order for this to be true for EVAL, keys must be passed explicitly. This is useful in many ways, but especially to make sure Redis Cluster can forward your request to the appropriate cluster node.
Note this rule is not enforced in order to provide the user with opportunities to abuse the Redis single instance configuration, at the cost of writing scripts not compatible with Redis Cluster
When I noticed that work doesn't pass any keys to the script evaluation, I surmised that this allowed Redis to route the script to any node in the cluster.
This PR is the minimal update to fix this issue, passing a single namespaced key to the script evaluation.
A more thorough, albeit slightly more invasive, fix would be to perform all key composition outside the script. That would also have the nice advantage of not requiring the Lua & Go key creation to be kept in sync.
If you're open to that, I'll happily update my fork and this PR.
We run sentinel but getting this to work sounds good to me.
I will try to add a circle ci redis "cluster" environment for this here: https://github.com/taylorchu/work/blob/78a4fef1197337a450c6cdf03029517636f088cc/.circleci/config.yml#L35
the previous idea is to use this "hash" in either name space or queue name:
Hash tags are documented in the Redis Cluster specification, but the gist is that if there is a substring between {} brackets in a key, only what is inside the string is hashed, so for example this{foo}key and another{foo}key are guaranteed to be in the same hash slot, and can be used together in a command with multiple keys as arguments.
@nyergler check this out: https://github.com/taylorchu/work/pull/50
This current master already works by adding {} "hash" to either namespace / queue name.
- if you add {} namespace, all kvs created will be on the same node
- furthermore, adding {} to queue name will be able to shard queues in the same namespace.
We were previously specifying the namespace using {}
in order to get Redis Cluster to treat it as a hash tag. However, it seems that even then, unless you specify the key(s) as EVAL
arguments, you're basically just getting lucky if it works. My understanding is that Redis Cluster routes the script to a single node, choosing that node based on the KEYS
passed in. If no KEYS
are passed, it gets routed to any node, since it doesn't know what needs to be accessed until the script is actually executed.
For what it's worth, we deployed this initially just specifying the hash tag as the namespace (in {}
), and things seemed to work fine. It was only when utilization grew and Redis rebalanced the nodes that we started seeing this issue. Passing the key in as part of the call to EVAL
immediately fixed the problem.
I need more info to produce this locally.
- the redis version that you are running.
- how redis cluster master/slave is added in rebalancing (the steps/commands that you run). there might be an issue of moving slots. I think there is still a missing piece from how redis works from your description because I can change the hash {} to any value and the tests still pass under the same cluster. If what you described is true that it might send to any node, the tests will fail 67% of the time after I change hash to another value.
Hey @taylorchu, makes sense; I'll come up with a crisp set of repro steps and update here.
@jpalawaga @nyergler could you check if 016e03b985aa066ce1a64e192eb367d8c6a278bd fixes it?
backstory: it turns out go-redis 9.0.3 fixes a bug where it might take the first non key (normal args) as routing key, so this problem is not surfaced. after upgrading to 9.0.5, I finally can reproduce this problem and fix it.