[NEW] Send cluster topology changes as push messages.
The problem/use-case that the feature addresses
Today clients find out that a topology change happened only after the fact - either by periodically querying CLUSTER NODES/SLOTS/SHARDS or by receiving MOVED/ASK errors. When a client finds out that a topology change happened by receiving an error, the client needs to call CLUSTER NODES/SLOTS/SHARDS in order to get the new cluster topology, which might be slow on large, fragmented clusters.
Description of the feature
Using RESP3 push messages, nodes could send clients updates about topology changes or slot migrations, including all the relevant information. This means that:
- clients are updated during the change, not after the fact
- clients could receive only the relevant info (slot X was moved from A to B) instead of having to query the whole topology. This is both more economical with regard to network traffic and doesn't block the server with slow calls.
Alternatives you've considered
I believe this can be easily implemented for slot migration, but topology changes will probably be harder: which node should inform the client about the new nodes? An alternative might be to just inform the client about epoch changes and let the client query the current topology in the usual way.
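A client could handle both kinds of notification with one callback: detailed slot-migration messages patch its slot map in place, and coarse ones (such as an epoch change) trigger the usual full topology query. The sketch below is a minimal illustration under stated assumptions. It assumes slot-migration notifications use the same text form as a MOVED redirect (`MOVED <slot> <host>:<port>`), and it treats any other payload as a coarse "something changed" signal. `make_notification_handler`, `refresh_topology` and the `"epoch changed"` payload are hypothetical names, not part of any existing client or server.

```python
from typing import Callable, Dict


def make_notification_handler(
    slot_owners: Dict[int, str], refresh_topology: Callable[[], None]
) -> Callable[[str], None]:
    """Return a callback that patches or refreshes the client's topology view.

    Hypothetical: assumes detailed notifications look like "MOVED <slot> <host>:<port>".
    """

    def on_notification(payload: str) -> None:
        parts = payload.split()
        if len(parts) == 3 and parts[0] == "MOVED":
            # Detailed notification: update just the one slot that moved.
            slot_owners[int(parts[1])] = parts[2]
        else:
            # Coarse notification (e.g. a hypothetical epoch-change message):
            # it doesn't say what changed, so fall back to a full query
            # (CLUSTER SLOTS/SHARDS in a real client).
            refresh_topology()

    return on_notification


refreshes = []
owners = {0: "10.0.0.1:6379"}
handle = make_notification_handler(owners, lambda: refreshes.append("refresh"))
handle("MOVED 0 10.0.0.2:6379")
handle("epoch changed")
print(owners[0], len(refreshes))  # prints: 10.0.0.2:6379 1
```

The design keeps the expensive full query as a fallback only, so the common case (one slot migrating at a time) never blocks the server with a slow topology call.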
@nihohit How about having a reserved channel for pubsub notification for cluster topology changes generated on each node? This would be similar to keyspace notifications.
A client needs to be connected to all of the nodes to receive the message(s) which I presume is fine.
I implemented this for a single moved slot some years ago. A single moved slot is what you get when you scale, i.e. migrate slots between nodes, because migration is done one slot at a time. It is practical to send a notification with just this change in this case. It's basically the same information as a MOVED redirect. When moving N slots, clients need to update the slot mapping N times (if they do it on MOVED), multiplying the work clients have to do to stay up to date. That's why I think it makes the most sense for this scenario. I can revive the PR if there is interest.
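To illustrate why a per-slot notification is cheap for the client, here is a minimal sketch of a client-side slot map. `SlotMap` and `apply_moved` are hypothetical names, not taken from any existing client; the only Redis Cluster fact used is that the keyspace is divided into 16384 hash slots. Each notification (like each MOVED redirect) overwrites a single entry, so migrating N slots costs N constant-time updates rather than N full topology queries.

```python
# Hypothetical client-side slot map; Redis Cluster has 16384 hash slots.
NUM_SLOTS = 16384


class SlotMap:
    """Maps each hash slot to the "host:port" address of the node that owns it."""

    def __init__(self, default_node: str) -> None:
        # Start with every slot on one node; a real client would
        # populate this from CLUSTER SLOTS or CLUSTER SHARDS instead.
        self._owners = [default_node] * NUM_SLOTS
        self.updates = 0

    def owner(self, slot: int) -> str:
        return self._owners[slot]

    def apply_moved(self, slot: int, node: str) -> None:
        """Apply a single-slot move: the same data a MOVED redirect carries."""
        if not 0 <= slot < NUM_SLOTS:
            raise ValueError(f"slot out of range: {slot}")
        self._owners[slot] = node
        self.updates += 1


# Migrating slots 100..109 yields ten single-slot updates.
slots = SlotMap("10.0.0.1:6379")
for s in range(100, 110):
    slots.apply_moved(s, "10.0.0.2:6379")
print(slots.owner(105), slots.owner(110), slots.updates)
# prints: 10.0.0.2:6379 10.0.0.1:6379 10
```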
For other changes, such as many slots moved (failover) or a replica added or deleted, notifications can't include all the relevant information, so I excluded them. A very small notification would still be useful, like "topology changed" or "replicas changed".
> How about having a reserved channel for pubsub notification for cluster topology changes generated on each node? This would be similar to keyspace notifications.
That sounds reasonable, although I'm not sure what the difference is between a pubsub notification and just a push notification.
> For other changes, such as many slots moved (failover) or new replica added or deleted, notifications can't include all the relevant information, so I excluded it. A very small notification would be useful, like "topology changed" or "replicas changed".
I'm not sure what "all of the relevant information" covers, but as a client maintainer, I'd appreciate it if the message contained as much of the relevant information as possible.
Pubsub channels are already used for other things, such as client-side caching and keyspace notifications. They also work over RESP2, and the syntax to subscribe already exists, so this can be done without adding a new command.
A pubsub message has a fixed layout, in pseudo-JSON ["message", "channel", "payload"], or ["pmessage", "pattern", "channel", "payload"] if psubscribe is used, so the payload is just a string. That can be regarded as a limitation, but it's also why I formatted it exactly like a redirect, in the form "MOVED slot host:port", so that clients may be able to reuse their redirect-parsing code.
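Because the payload mirrors a MOVED redirect, client-side handling reduces to picking the payload out of the two fixed pubsub layouts and parsing it like a redirect. The sketch below assumes exactly the layouts and payload form described above; `SlotMove`, `parse_moved_payload` and `handle_pubsub_message` are hypothetical names, and the channel names in the examples are placeholders, since the proposal doesn't fix a channel name.

```python
from typing import NamedTuple, Optional, Sequence


class SlotMove(NamedTuple):
    slot: int
    host: str
    port: int


def parse_moved_payload(payload: str) -> SlotMove:
    """Parse a redirect-style payload: "MOVED <slot> <host>:<port>"."""
    kind, slot, address = payload.split()  # raises ValueError on a wrong field count
    if kind != "MOVED":
        raise ValueError(f"not a MOVED payload: {payload!r}")
    # Split on the last colon so a host containing colons (e.g. IPv6) stays intact.
    host, sep, port = address.rpartition(":")
    if not sep:
        raise ValueError(f"missing port in address: {address!r}")
    return SlotMove(int(slot), host, int(port))


def handle_pubsub_message(message: Sequence[str]) -> Optional[SlotMove]:
    """Extract a slot move from ["message", channel, payload] or
    ["pmessage", pattern, channel, payload]; return None for other frames."""
    if len(message) == 3 and message[0] == "message":
        payload = message[2]
    elif len(message) == 4 and message[0] == "pmessage":
        payload = message[3]
    else:
        return None
    return parse_moved_payload(payload)


print(handle_pubsub_message(["message", "topology-channel", "MOVED 3999 127.0.0.1:6381"]))
# prints: SlotMove(slot=3999, host='127.0.0.1', port=6381)
```

A client that already parses `MOVED 3999 127.0.0.1:6381` errors could call its existing redirect parser in place of `parse_moved_payload`.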
> A pubsub channel is used for other things like client-side caching and keyspace notifications
Ah, that explains it. I assume that it's the same mechanism as pubsub channels on the server side. On our client side these are handled differently from "proper" pubsub messages :)