StackExchange.Redis
Cluster: support host names when available, via CLUSTER SLOTS
Situation:
- especially from v7, cluster shards may be routed via hosts instead of IPs
- thus, the advertised IPs may sometimes (with known examples) not be routable
- SLOTS and SHARDS (v7+) provide the hosts; NODES does not, as far as known
Proposal:
- prefer SLOTS to NODES (but not SHARDS due to v7 dependency)
- route via hostname when available, IP otherwise
- possibly with option to use fallback old behaviour?
Was: "Lookup by endpoint should check IP and host"
Currently, only exact endpoint matches are considered. However, especially in the case of CLUSTER, a node may have both a host and an IP. Further, -MOVED may return the unexpected option, leading to additional connections.
We should:
- record both identities as declared, for example from the CLUSTER NODES response
- if no equality endpoint match is found, check also using the best data available (after type-testing the endpoint)
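To make the two steps above concrete, here is a minimal sketch in Python (purely illustrative; this is not StackExchange.Redis code, and `Node`, `NodeTable`, and `is_ip` are hypothetical names). It assumes each node is registered under the endpoint the client actually connected with, and also records whatever IP and host the server declared for it. Lookup tries the exact endpoint first, then type-tests the requested address and compares it against the matching declared identity.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical record of a cluster node and its declared identities."""
    node_id: str
    port: int
    ip: Optional[str] = None    # IP as declared by the server, if any
    host: Optional[str] = None  # host name as declared by the server, if any


def is_ip(address: str) -> bool:
    """Type-test an address: True for an IPv4/IPv6 literal, False otherwise."""
    try:
        ipaddress.ip_address(address)
        return True
    except ValueError:
        return False


class NodeTable:
    def __init__(self) -> None:
        # Exact (address, port) pairs that the client connected with.
        self._by_endpoint: Dict[Tuple[str, int], Node] = {}
        self._nodes: List[Node] = []

    def add(self, address: str, port: int, node: Node) -> None:
        self._by_endpoint[(address, port)] = node
        self._nodes.append(node)

    def find(self, address: str, port: int) -> Optional[Node]:
        # Step 1: exact endpoint match (the current behaviour described above).
        exact = self._by_endpoint.get((address, port))
        if exact is not None:
            return exact
        # Step 2: fall back to the declared identities, after type-testing.
        if is_ip(address):
            wanted = ipaddress.ip_address(address)
            for node in self._nodes:
                if node.port == port and node.ip and is_ip(node.ip):
                    if ipaddress.ip_address(node.ip) == wanted:
                        return node
        else:
            for node in self._nodes:
                if node.port == port and node.host:
                    if node.host.lower() == address.lower():
                        return node
        return None
```

With a table like this, a -MOVED redirect naming the host of a node that was originally connected by IP (or the reverse) would resolve to the existing node instead of triggering an additional connection.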
Context: https://github.com/dotnet/aspnetcore/issues/59211
Auxiliary consideration: should cluster routing be limited to IP, as current? Should there be a "use host routing" option? Or should we try IP and use host if that fails?
Additional complexity for bonus points: (part of) the cluster might be hidden behind a proxy / load balancer requiring SNI, i.e. all (some) hostnames resolve to the same IP.
Side note: some of this may already be shimmable by individual callers, if they already know how they want things to connect: the Tunnel API allows callers to override how connections are established. This would, in theory, allow custom client-side code to take a requested endpoint and do whatever it needs, as long as it can create a Stream. I've seen this use proxy servers, custom DNS-like things, you name it.
I got a response from the team providing "redis as a service"... the summary is as follows:
The advertised hosts when receiving a MOVED response are correct [...] the library should not use IPs from “CLUSTER NODES” responses for its own routing. We explicitly configure internal IPs for the Redis routing so that the sharding and replication does not go out the OpenShift network. We then expose different routing for client libraries as these need to be routable from outside the network. [...] The documentation [...] states “Note that normally clients willing to fetch the map between Cluster hash slots and node addresses should use CLUSTER SLOTS instead.” The output of that command provides the hostname as first routing priority, and returns correct hostnames [...] We will not be able to make the IPs routable from outside the OpenShift cluster
The output of CLUSTER SHARDS (7.x successor of CLUSTER SLOTS) is:
slots: 0 5460
nodes:
  id: c9ef1500c2acc37273e5b04275f7e5620b796fff
  tls-port: 443
  ip: 10.42.0.1
  endpoint: redis-7c2b3133-0bf4-0.redis.example
  hostname: redis-7c2b3133-0bf4-0.redis.example
  role: master
  replication-offset: 0
  health: online
slots: 10923 16383
nodes:
  id: 753343875ffe163a64284b36d7743861f2ef1506
  tls-port: 443
  ip: 10.42.0.3
  endpoint: redis-7c2b3133-0bf4-2.redis.example
  hostname: redis-7c2b3133-0bf4-2.redis.example
  role: master
  replication-offset: 0
  health: online
slots: 5461 10922
nodes:
  id: 8992b30e1499e2cf0bc86609d752c4f3f49c2199
  tls-port: 443
  ip: 10.42.0.2
  endpoint: redis-7c2b3133-0bf4-1.redis.example
  hostname: redis-7c2b3133-0bf4-1.redis.example
  role: master
  replication-offset: 0
  health: online
Could you see a possible future for SE.R supporting such a topology / configuration?
Well, we're making progress, at least. Yes, there is potential for us to migrate to CLUSTER SLOTS, but it'll take a bit of testing and effort. Basically, CLUSTER NODES was the original and therefore most widely available version of topology discovery. Moving to a newer version gets tricky because of server Vs client level, but CLUSTER SLOTS is almost as old as CLUSTER NODES so is almost certainly fine (IIRC both were pre-GA of 3.0). Migrating to CLUSTER SHARDS - probably not an option yet, and it looks like CLUSTER SLOTS is documented as supporting host names from v7, so: we're probably fine there.
But: this is at least actionable and makes reasonable sense. I don't think I can have much influence on the provider, but I suspect they may have similar problems with several client libraries, at least the cluster-aware ones, but: this seems pragmatic and I'm not going to fight making such a change. We might need to think about the possible matrix of server versions, host/IP configs, etc - the law of unintended consequences means that any such change that fixes some people: may also break others.
/Cc @philon-msft in case using hosts instead of IP routing might be useful, if/when we do this.
See edit for my proposal
Yes, CLUSTER SLOTS is returning the host names (together with optional, non-routable IP addresses), so seems to be the better choice:
0  0  0
   1  5460
   2  0  redis-7c2b3133-0bf4-0.redis.example
      1  443
      2  c9ef1500c2acc37273e5b04275f7e5620b796fff
      3  ip: 10.42.0.1
1  0  5461
   1  10922
   2  0  redis-7c2b3133-0bf4-1.redis.example
      1  443
      2  8992b30e1499e2cf0bc86609d752c4f3f49c2199
      3  ip: 10.42.0.2
2  0  10923
   1  16383
   2  0  redis-7c2b3133-0bf4-2.redis.example
      1  443
      2  753343875ffe163a64284b36d7743861f2ef1506
      3  ip: 10.42.0.3
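For anyone following along, a rough Python sketch of reading a reply shaped like the one above (illustrative only; not how StackExchange.Redis parses it). It assumes the reply has already been decoded into nested lists of strings and integers: each entry is start slot, end slot, then one list per node holding the preferred endpoint, the port, the node id, and optional metadata (either a map or a flat key/value list, depending on the protocol version). `SlotNode`, `SlotRange`, and `parse_cluster_slots` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SlotNode:
    endpoint: Optional[str]  # the "preferred endpoint" (element 0); may be absent
    port: int
    node_id: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class SlotRange:
    start: int
    end: int
    nodes: List[SlotNode]  # in reply order; the first node is the primary


def _metadata(raw: Any) -> Dict[str, str]:
    """Accept metadata either as a map or as a flat [key, value, ...] list."""
    if isinstance(raw, dict):
        return {str(k): str(v) for k, v in raw.items()}
    if isinstance(raw, (list, tuple)):
        return {str(raw[i]): str(raw[i + 1]) for i in range(0, len(raw) - 1, 2)}
    return {}


def parse_cluster_slots(reply: List[List[Any]]) -> List[SlotRange]:
    ranges: List[SlotRange] = []
    for entry in reply:
        start, end, *raw_nodes = entry
        nodes: List[SlotNode] = []
        for raw in raw_nodes:
            endpoint = None if raw[0] is None else str(raw[0])
            # Older servers may omit the node id and metadata elements.
            node_id = str(raw[2]) if len(raw) > 2 else ""
            meta = _metadata(raw[3]) if len(raw) > 3 else {}
            nodes.append(SlotNode(endpoint, int(raw[1]), node_id, meta))
        ranges.append(SlotRange(int(start), int(end), nodes))
    return ranges
```

Fed the first shard above, this would yield a range of 0 to 5460 whose primary has the host name as `endpoint` and the non-routable address under `metadata["ip"]`.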
Side note: in my case, all hostnames resolve to the same routable IP and require SNI.
Update: had a chat with the Redis Ltd folks today; the current consensus seems to be that this scenario is not a valid configuration, and that the host and advertised IP should map to the same endpoint. Still asking for more input from the Redis Ltd folks.
Reading the CLUSTER SLOTS documentation, and comparing the possible outputs, I am of the opinion that the "preferred endpoint" (the 0th element entry above) is sufficiently meaningful for our purposes. Given that, I think I disagree with the Redis Ltd folks from yesterday. I think I'm going to try to advance this on the above basis, but probably with a new option along the lines of:
ClusterRouting: {Default,IP,Host}
where Default (the default) would use the "preferred endpoint", IP prefers IP (using host only if no IP is available), and Host prefers host (using IP only if no host is available). Since CLUSTER SLOTS usually returns the IP as the preferred endpoint, this represents no functional change in any relevant scenarios.
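As a sketch of how the three modes could behave (Python, illustrative only; the option does not exist yet, and `ClusterRouting`, `choose_address`, and `is_ip` here are hypothetical names). It assumes the inputs are the preferred endpoint plus the per-node metadata from CLUSTER SLOTS, with the alternative identity appearing under an `ip` or `hostname` key.

```python
import ipaddress
from enum import Enum
from typing import Dict, Optional


class ClusterRouting(Enum):
    DEFAULT = "Default"  # use the server's preferred endpoint as-is
    IP = "IP"            # prefer IP, fall back to host
    HOST = "Host"        # prefer host, fall back to IP


def is_ip(address: str) -> bool:
    """True if the address is an IPv4/IPv6 literal rather than a host name."""
    try:
        ipaddress.ip_address(address)
        return True
    except ValueError:
        return False


def choose_address(
    mode: ClusterRouting,
    preferred: Optional[str],
    metadata: Dict[str, str],
) -> Optional[str]:
    # Start from the metadata, then let the preferred endpoint fill in
    # whichever identity it represents (decided by type-testing it).
    ip = metadata.get("ip")
    host = metadata.get("hostname")
    if preferred:
        if is_ip(preferred):
            ip = preferred
        else:
            host = preferred

    if mode is ClusterRouting.IP:
        return ip or host
    if mode is ClusterRouting.HOST:
        return host or ip
    return preferred  # Default: trust the server's ordering
```

For the reply shown earlier, `Default` and `Host` would both route to `redis-7c2b3133-0bf4-0.redis.example`, while `IP` would pick `10.42.0.1`. On a server that returns the IP as the preferred endpoint and no host name, all three modes give the same answer.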
The above is open to discussion - that's just my current position!
ClusterRouting: {Default,IP,Host} (or even 'only' respecting the preferred order) seems good. If, for any reason, hosts / IPs are not reachable, an option to ForceX or XOnly would prevent polluting the candidate list... I guess this would already be possible using the mentioned Tunnel API?