
[NEW] Send cluster topology changes as push messages.

Open nihohit opened this issue 1 year ago • 6 comments

The problem/use-case that the feature addresses

Today clients find out that a topology change happened only after the fact - either by periodically querying CLUSTER NODES/SLOTS/SHARDS or by receiving MOVED/ASK errors. When a client finds out that a topology change happened by receiving an error, the client needs to call CLUSTER NODES/SLOTS/SHARDS in order to get the new cluster topology, which might be slow on large, fragmented clusters.

Description of the feature

Using RESP3 push messages, nodes could send clients updates on topology changes or slot migrations, with all the relevant information. This means that:

  1. clients are updated during the change, not after the fact
  2. clients could receive only the relevant info (slot X was moved from A to B), instead of having to query the whole topology. This is both more economical in terms of network traffic and avoids blocking the server with slow calls.
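To illustrate the second point, here is a minimal sketch of a client-side slot map that supports both a full refresh and a single-slot update. The `SlotMap` class and its method names are hypothetical and not taken from any existing client; node addresses are plain strings for simplicity.

```python
from dataclasses import dataclass, field

NUM_SLOTS = 16384  # number of hash slots in a cluster


@dataclass
class SlotMap:
    """Hypothetical client-side view of which node owns each hash slot."""

    owners: list = field(default_factory=lambda: [None] * NUM_SLOTS)

    def replace_all(self, ranges):
        """Full refresh: rebuild the map from (start, end, node) ranges,
        as a client does today after querying the whole topology."""
        owners = [None] * NUM_SLOTS
        for start, end, node in ranges:
            for slot in range(start, end + 1):
                owners[slot] = node
        self.owners = owners

    def apply_slot_moved(self, slot, new_owner):
        """Incremental update: a single slot changed owner."""
        if not 0 <= slot < NUM_SLOTS:
            raise ValueError(f"slot out of range: {slot}")
        self.owners[slot] = new_owner


slot_map = SlotMap()
slot_map.replace_all([(0, 8191, "A:6379"), (8192, 16383, "B:6379")])
# A push message saying "slot 100 moved from A to B" touches one entry only.
slot_map.apply_slot_moved(100, "B:6379")
```

With a push message carrying the slot and its new owner, the client would only need the `apply_slot_moved` path and no round trip to the server.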

Alternatives you've considered

I believe this can be easily implemented for slot migration, but topology changes will probably be harder: which node should inform the client about the new nodes? An alternative might be to just inform the client on epoch changes, and let the client query the current topology in the usual way.
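A rough sketch of how a client might handle the epoch-only alternative, assuming the notification carries a single integer epoch. The `EpochTracker` class and the `refresh` callback are hypothetical; `refresh` stands in for whatever the client already does to query the topology.

```python
class EpochTracker:
    """Hypothetical helper: refresh the topology once per new epoch."""

    def __init__(self, refresh):
        self._refresh = refresh  # callable that re-queries the topology
        self.last_epoch = -1

    def on_epoch_notification(self, epoch):
        """Return True if the notification triggered a refresh."""
        # Several nodes may report the same epoch; ignore repeats and
        # stale values so the client refreshes at most once per epoch.
        if epoch <= self.last_epoch:
            return False
        self.last_epoch = epoch
        self._refresh()
        return True


calls = []
tracker = EpochTracker(refresh=lambda: calls.append("refresh"))
tracker.on_epoch_notification(7)  # new epoch: refresh
tracker.on_epoch_notification(7)  # same epoch from another node: ignored
```

The deduplication matters because a client connected to many nodes could otherwise issue one full topology query per node for a single change.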

nihohit avatar Mar 28 '24 05:03 nihohit

@nihohit How about having a reserved channel for pubsub notification for cluster topology changes generated on each node? This would be similar to keyspace notifications.

A client needs to be connected to all of the nodes to receive the message(s), which I presume is fine.

hpatro avatar Mar 28 '24 20:03 hpatro

I implemented this for a single moved slot some years ago. A single moved slot is what you get when you scale, i.e. migrate slots between nodes, because migration is done one slot at a time. It is practical to send a notification with just this change in this case. It's basically the same information as a MOVED redirect. When moving N slots, clients need to update the slot mapping N times (if they do it on MOVED), multiplying the work clients have to do to keep updated. That's why I think it makes the most sense for this scenario. I can revive the PR if there is interest.

For other changes, such as many slots moved (failover) or new replica added or deleted, notifications can't include all the relevant information, so I excluded it. A very small notification would be useful, like "topology changed" or "replicas changed".

zuiderkwast avatar Apr 10 '24 11:04 zuiderkwast

> How about having a reserved channel for pubsub notification for cluster topology changes generated on each node? This would be similar to keyspace notifications.

That sounds reasonable, although I'm not sure what the difference is between a pubsub notification and a plain push notification.

> For other changes, such as many slots moved (failover) or new replica added or deleted, notifications can't include all the relevant information, so I excluded it. A very small notification would be useful, like "topology changed" or "replicas changed".

I'm not sure what "all of the relevant information" is, but as a client maintainer, I'd appreciate it if the message contained as much of the relevant information as possible.

nihohit avatar Apr 10 '24 12:04 nihohit

A pubsub channel is used for other things like client-side caching and keyspace notifications. It's possible to use RESP2. There is already the syntax to subscribe, so it can be done without adding a new command.
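As a sketch of why no new command is needed: subscribing is an ordinary SUBSCRIBE request, encoded as a RESP array of bulk strings. The channel name `__cluster__:slots` below is purely an assumption for illustration; no such reserved channel is defined in this thread.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode() if isinstance(arg, str) else arg
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


# Hypothetical reserved channel name, chosen only for this example.
request = encode_command("SUBSCRIBE", "__cluster__:slots")
```

The same bytes work on a RESP2 connection, which is the point made above about not requiring RESP3.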

zuiderkwast avatar Apr 10 '24 12:04 zuiderkwast

A pubsub message has a fixed layout (pseudo-JSON): ["message", "channel", "payload"] or, if psubscribe is used, ["pmessage", "pattern", "channel", "payload"], so the payload is just a string. It can be regarded as a limitation, but that's why I formatted it just like a redirect, in the form "MOVED slot host:port", so maybe clients can reuse code for redirect parsing...
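A minimal sketch of what that reuse could look like on the client side, assuming the pubsub frame has already been decoded into a list of strings. The function names and the channel name in the usage line are hypothetical.

```python
def parse_moved_payload(payload):
    """Parse 'MOVED <slot> <host>:<port>' into (slot, host, port)."""
    parts = payload.split(" ")
    if len(parts) != 3 or parts[0] != "MOVED":
        raise ValueError(f"not a MOVED payload: {payload!r}")
    # Split on the last colon so hosts that contain colons still parse.
    host, sep, port = parts[2].rpartition(":")
    if not sep:
        raise ValueError(f"missing port in: {payload!r}")
    return int(parts[1]), host, int(port)


def handle_pubsub_frame(frame):
    """Return (channel, (slot, host, port)) for message/pmessage frames,
    or None for any other frame kind (e.g. a subscribe confirmation)."""
    if frame[0] == "message" and len(frame) == 3:
        _, channel, payload = frame
    elif frame[0] == "pmessage" and len(frame) == 4:
        _, _pattern, channel, payload = frame
    else:
        return None
    return channel, parse_moved_payload(payload)


# Hypothetical channel name, used only for this example.
update = handle_pubsub_frame(
    ["message", "__cluster__:slots", "MOVED 3999 127.0.0.1:6381"]
)
```

Because `parse_moved_payload` takes only the payload string, the same function could also be fed the text of a MOVED error reply.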

zuiderkwast avatar Apr 10 '24 13:04 zuiderkwast

> A pubsub channel is used for other things like client-side caching and keyspace notifications

Ah, that explains it. I assume that it's the same mechanism as pubsub channels on the server side. On our client side these are handled differently from "proper" pubsub messages :)

nihohit avatar Apr 10 '24 13:04 nihohit