Proposal: add `Cluster` method to wait for stable Redis cluster

Open · mna opened this issue 1 year ago · 1 comment

Description

Add a Cluster.WaitForCluster(ctx context.Context) error method that calls the CLUSTER INFO Redis command at intervals until the reply contains cluster_state:ok or the context expires. The call blocks for that whole time: it returns nil as soon as a CLUSTER INFO reply reports cluster_state:ok, and otherwise returns the context's error (ctx.Err()) once the context expires.

It is a new API and as such could be released as part of a minor version.

Use Case

It is recommended to call Cluster.Refresh() at the start of an application, so that the first Redis connections already benefit from smart routing. However, if the Redis cluster is still being set up when the application starts, the CLUSTER SLOTS Redis command (called by Cluster.Refresh()) may return only partial information because the cluster is not yet created or stable. This can make the first few connections slower, or even make them fail if the cluster is still not stable by the time they are made.

A typical use for this new API would be to call Cluster.WaitForCluster(ctx) at the start of the application, before calling Cluster.Refresh(), so that the full slots-to-node mapping is known before the first use. The timeout is defined by the caller (typically via context.WithDeadline() or context.WithTimeout()) as there's no single good value that the package could use for this.

Of course, the method can also be called in other contexts, but the benefit is less obvious. It would still work as expected, in the sense that it would always block until the context expires or Redis replies with cluster_state:ok.

Implementation Concerns

How should the polling interval work? I don't want to complicate the API by making it configurable: in the main use case (at the start of the application), the interval should not be critical (there shouldn't be much load on the Redis servers, there's no "thundering herd" concern, etc.). My guess would be to use something similar to what Go does in net/http/server.go (Server.Serve method), which retries failed calls to Accept with a delay that starts small (5ms, though in the case of redisc something like 100ms might be a better initial retry interval) and doubles until it reaches 1s.


(note that this proposal is not a promise of implementation, nor is it a promise of merging a PR that would implement this feature - it is first and foremost for me to give it some thought and as a reminder when I have some time to work on the package, and published for visibility)

mna, Oct 29 '22 17:10