
Proposal: relax the constraints on pipeline for the `RetryConn`

mna opened this issue 1 year ago

Description

When the package was initially created, as a precaution against tricky and potentially problematic semantics, the automatic following of redirections (available via the connection returned by the RetryConn API function) disallowed pipelined commands: Send, Flush and Receive always return an error.

This proposal is to reconsider this restriction and see if it would make sense for a retry connection to support the full redis.Conn API.
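Supporting the full redis.Conn API concretely means RetryConn would have to honor all of redigo's Conn methods, not just Do. The sketch below reproduces the redis.Conn interface as defined by redigo (check the redigo documentation for the authoritative version) along with a hypothetical no-op stub, only to show which methods a full implementation must provide:

```go
package main

import "fmt"

// Conn mirrors redigo's redis.Conn interface, reproduced here for
// illustration; consult the redigo documentation for the
// authoritative definition.
type Conn interface {
	Close() error
	Err() error
	Do(commandName string, args ...interface{}) (reply interface{}, err error)
	Send(commandName string, args ...interface{}) error
	Flush() error
	Receive() (reply interface{}, err error)
}

// retryConnStub is a hypothetical no-op implementation (not redisc
// code). Today, redisc's RetryConn returns an error from Send, Flush
// and Receive; this proposal would make those methods functional.
type retryConnStub struct{}

func (retryConnStub) Close() error { return nil }
func (retryConnStub) Err() error   { return nil }
func (retryConnStub) Do(cmd string, args ...interface{}) (interface{}, error) {
	return nil, nil
}
func (retryConnStub) Send(cmd string, args ...interface{}) error { return nil }
func (retryConnStub) Flush() error                               { return nil }
func (retryConnStub) Receive() (interface{}, error)              { return nil, nil }

func main() {
	var c Conn = retryConnStub{} // compile-time check that the stub satisfies Conn
	fmt.Println(c.Err() == nil)
}
```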

This is technically an internal implementation detail (no changes to the public API), but it enables new behavior, which makes it valid for a minor release. That said, it would likely add significant complexity to the RetryConn implementation, so it would be best released in its own minor version, separate from the other proposals, since it is riskier (package users can then decide whether they want those changes or not).

Use Case

The main use case is executing a pipeline of commands grouped by key, with the groups produced either by the existing SplitBySlot API function or by the proposed SplitByNode function. When commands are known (or very likely) to live on the same node, sending them as a pipeline makes execution more efficient. If a MOVED error is encountered, either all commands are guaranteed to work on the new target (if SplitBySlot was used), or they are very likely to (if SplitByNode was used: when one of the slots has moved, the primary has probably failed and a replica is taking over, in which case the replica takes over all the slots the primary served). So in the most likely cases, the pipeline would require at most one redirection.
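The grouping step above can be sketched with the cluster key-slot algorithm (CRC16-XModem of the key's hash tag, modulo 16384). This is a simplified stand-in mimicking what SplitBySlot does, not redisc's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 implements CRC16-XModem, the checksum Redis Cluster uses for
// key slots (polynomial 0x1021, no reflection, initial value 0x0000).
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// slot returns the cluster slot for a key, honoring {hash tags}: if
// the key contains a non-empty "{...}" section, only that section is
// hashed, which lets related keys share a slot.
func slot(key string) uint16 {
	if i := strings.Index(key, "{"); i >= 0 {
		if j := strings.Index(key[i+1:], "}"); j > 0 {
			key = key[i+1 : i+1+j]
		}
	}
	return crc16([]byte(key)) % 16384
}

// splitBySlot groups keys that hash to the same slot (a sketch of the
// behavior of redisc's SplitBySlot, not the real code).
func splitBySlot(keys ...string) [][]string {
	groups := make(map[uint16][]string)
	for _, k := range keys {
		groups[slot(k)] = append(groups[slot(k)], k)
	}
	out := make([][]string, 0, len(groups))
	for _, g := range groups {
		out = append(out, g)
	}
	return out
}

func main() {
	fmt.Println(slot("{user1}.name") == slot("{user1}.email")) // same hash tag, same slot
	fmt.Println(len(splitBySlot("{user1}.name", "{user1}.email")))
}
```

Each group returned this way can then be sent as one pipeline, since all of its keys are guaranteed to live in the same slot.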

Note that a RetryConn can already run arbitrary commands via conn.Do with no concern for where the keys live (e.g. it may follow a redirection for every single conn.Do call if the key design is careless). Doing so is neither efficient nor good application design, and it is not something redisc wants to encourage or support explicitly; it is supported implicitly only as a side effect of the main feature, which is following redirections after a cluster failover. For the same reasons, pipeline support in RetryConn would not attempt to optimize execution of the redirections it encounters (e.g. by running them in parallel with goroutines). It assumes that if a redirection happens, the full pipeline is guaranteed (or very likely) to be redirected to the same node, and it would retry the rest of the pipeline accordingly.

Implementation Concerns

There are a number of concerns to test and investigate before going forward with this proposal, some of which are very unlikely to be an issue, while others may be trickier:

  • Sending the pipeline GET a, INCR b, GET c: if it fails after the 2nd command, the retry should not repeat the start of the pipeline.
  • Calling conn.Do("") should return the last received value (or the first error encountered).
  • conn.Receive should work properly, but that means maintaining the connections after following redirections (or, most likely, storing the results to be returned).
  • How would conn.Flush work? It would likely flush to the initial connection. Can more Send calls be made after a Flush? This needs to be tested with redigo.
  • If used improperly (e.g. with no concern for key locations), it could well be less efficient than not using a pipeline at all (e.g. send the full pipeline to node 1, the first reply is a MOVED, send the full pipeline to node 2, the second reply is a MOVED, and so on).
  • Does executing a pipeline stop at the first error, or does it keep running the subsequent commands? If it keeps running, the pipelined commands could execute in a different order (e.g. cmd 1 runs, cmd 2 returns a MOVED error, cmd 3 runs, then cmd 2 is retried on the other node). Could this break the expected results, e.g. if cmd 3 depends on cmd 2? (This may not really be possible, given that they would be distinct keys.)
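One way to address the first concern (not repeating already-executed commands) is to buffer the pipelined commands and, on a MOVED reply at position i, replay only the suffix starting at i against the new node. The toy simulation below uses hypothetical node and command types (not redisc code) and assumes, as a simplification, that a node stops executing the pipeline at the first MOVED:

```go
package main

import "fmt"

// cmd is a buffered pipelined command (a simplification; a real
// implementation would buffer the name and args passed to Send).
type cmd struct {
	name, key string
}

// node is a fake cluster node that records the commands it executes
// and signals MOVED for keys it does not own.
type node struct {
	owns map[string]bool
	got  []cmd
}

// exec runs cmds[from:] on n and returns the index of the first
// command that got a MOVED reply (or len(cmds) if all succeeded).
func (n *node) exec(cmds []cmd, from int) int {
	for i := from; i < len(cmds); i++ {
		if !n.owns[cmds[i].key] {
			return i // MOVED: stop here, do not consume the rest
		}
		n.got = append(n.got, cmds[i])
	}
	return len(cmds)
}

// runPipeline sends the whole pipeline to the first node and, on a
// MOVED at index i, replays only cmds[i:] on the next node, so the
// start of the pipeline is never repeated.
func runPipeline(cmds []cmd, nodes []*node) {
	from := 0
	for _, n := range nodes {
		from = n.exec(cmds, from)
		if from == len(cmds) {
			return
		}
	}
}

func main() {
	cmds := []cmd{{"GET", "a"}, {"INCR", "b"}, {"GET", "c"}}
	n1 := &node{owns: map[string]bool{"a": true, "b": true}}
	n2 := &node{owns: map[string]bool{"a": true, "b": true, "c": true}}
	runPipeline(cmds, []*node{n1, n2})
	// Node 1 ran GET a and INCR b; node 2 ran only the failed GET c.
	fmt.Println(len(n1.got), len(n2.got))
}
```

This bookkeeping is where most of the added complexity would live; the last bullet above (out-of-order execution when the pipeline does not stop at the first error) would complicate it further.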

(Note that this proposal is not a promise of implementation, nor a promise to merge a PR that would implement this feature. It exists first and foremost for me to give the idea some thought and as a reminder for when I have time to work on the package, and it is published for visibility.)

mna, Oct 29 '22 19:10