nats-architecture-and-design

Allow for user customized handling via callback or language specific mechanism

Open scottf opened this issue 3 years ago • 8 comments

Overview

Optionally, provide some mechanism for the user to override the list of URLs used for connecting or reconnecting to servers.

  • The client would pass along the relevant Options such as bootstrap servers.
  • The client would update the user when it has a new list of discovered servers.

The user would provide a server URL list, or some way to iterate over it that matches how the client currently walks its list of candidate servers.

As examples, the Java and .NET clients each refactored their own server list handling into an interface and a default implementation, and then provided a way in the Options for the user to supply their own implementation.
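The interface / default implementation / Options split described above could look roughly like the following Python sketch. All names here (`ServerListProvider`, `DefaultServerListProvider`, `Options.server_list_provider`) are hypothetical and chosen for illustration; they are not the actual Java or .NET API.

```python
from typing import List, Optional, Protocol


class ServerListProvider(Protocol):
    """Hypothetical hook a user could implement; not a real client API."""

    def initialize(self, bootstrap_servers: List[str]) -> None: ...

    def on_discovered_servers(self, urls: List[str]) -> None: ...

    def next_server(self) -> Optional[str]: ...


class DefaultServerListProvider:
    """Assumed default: round-robin over bootstrap plus discovered servers."""

    def __init__(self) -> None:
        self._servers: List[str] = []
        self._index = 0

    def initialize(self, bootstrap_servers: List[str]) -> None:
        # The client passes along the relevant options (bootstrap servers).
        self._servers = list(bootstrap_servers)
        self._index = 0

    def on_discovered_servers(self, urls: List[str]) -> None:
        # The client reports newly discovered servers; keep the list unique.
        for url in urls:
            if url not in self._servers:
                self._servers.append(url)

    def next_server(self) -> Optional[str]:
        if not self._servers:
            return None
        url = self._servers[self._index % len(self._servers)]
        self._index += 1
        return url


class Options:
    """Hypothetical options object with a pluggable provider."""

    def __init__(
        self,
        servers: List[str],
        server_list_provider: Optional[ServerListProvider] = None,
    ) -> None:
        self.servers = list(servers)
        # Fall back to the default implementation when none is supplied.
        self.server_list_provider = server_list_provider or DefaultServerListProvider()
        self.server_list_provider.initialize(self.servers)
```

A user who wants custom behaviour would pass their own object with the same three methods as `server_list_provider`; everyone else gets the default.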

Parity Notes

This is not strictly required for parity. It's a nice-to-have, so it can wait until a customer or user asks for it.

Clients and Tools

  • [ ] Go @piotrpio
  • [x] Java @scottf
  • [ ] JavaScript @aricart
  • [x] .Net @scottf
  • [ ] C @levb
  • [ ] Python @wallyqs
  • [ ] Ruby @wallyqs
  • [ ] Rust @Jarema @caspervonb

Other Tasks

  • [ ] docs.nats.io updated
  • [ ] Update client features spreadsheet

Client authors please update with your progress. If you open issues in your own repositories as a result of this request, please link them to this one by pasting the issue URL in a comment or main issue description.

Original Text

Provide the ability to bootstrap the client connection with multiple lists of servers, each representing a different region. This would be useful when clusters are deployed in multiple regions and clients would prefer to always connect to the closest region (the first list). Only if the client fails on every server in that list / server info would it try the second list, and only if all of those fail would it go on to the third list.

For example, consider 3 regions: east, central and west.

  • E East server list: [a.b.x.1, a.b.x.2, a.b.x.3]
  • C Central: [a.b.y.1, a.b.y.2, a.b.y.3]
  • W West: [a.b.z.1, a.b.z.2, a.b.z.3]

The east clients would be configured with these 3 lists in E, C, W order, but a west client would be configured in W, C, E order.

When connecting, the client would exhaust the first list before trying any in the second list.
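As a rough sketch of that ordering rule, the snippet below walks the region lists in priority order and only moves to the next list once every server in the current one has failed. The function name `connect_by_priority` and the `try_connect` callable are hypothetical stand-ins for whatever a client would actually use to attempt a connection.

```python
from typing import Callable, List, Optional

# Placeholder addresses taken from the example above.
EAST = ["a.b.x.1", "a.b.x.2", "a.b.x.3"]
CENTRAL = ["a.b.y.1", "a.b.y.2", "a.b.y.3"]
WEST = ["a.b.z.1", "a.b.z.2", "a.b.z.3"]


def connect_by_priority(
    tiers: List[List[str]],
    try_connect: Callable[[str], bool],
) -> Optional[str]:
    """Return the first server that accepts a connection, or None.

    Every server in a tier is tried before any server in the next tier.
    """
    for tier in tiers:
        for url in tier:
            if try_connect(url):
                return url
    return None
```

An east client would call `connect_by_priority([EAST, CENTRAL, WEST], try_connect)`, while a west client would pass `[WEST, CENTRAL, EAST]`.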

scottf, Apr 12 '22 18:04

Is this better managed out of band? There are lots of considerations here: upstream health checking, resolving DNS names at a specified interval, failing over, failing back, etc.

It would be fairly straightforward to implement using Envoy with priority levels:

https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/priority

caleblloyd, Apr 13 '22 01:04

IMO, for the NATS clients, simpler is better, and for some clients we could add a callback that's invoked to get the next URL for customized server selection in connect/reconnect. Longer term, we've discussed a high-level service/stream API that could be much more sophisticated, with the features @caleblloyd suggested.

ColinSullivan1, Apr 14 '22 00:04

I am very keen on something like the callback Colin mentions.

For me the problem is that I get the initial list from elsewhere (SRV records, Consul, etc.) and people might want to move my clients to another cluster. So they update, e.g., the SRV records, but there is no way to rerun a query or update a running client.

I need to be able to update the server list periodically or, less ideally, on reconnect, on very long-running clients I do not directly control.
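One way to picture the periodic refresh is a small wrapper that re-runs a lookup once its cached result is older than a chosen age. This is only a sketch: `RefreshingServerList`, the `lookup` callable (which would wrap an SRV or Consul query) and the injectable `clock` are all hypothetical names, and no NATS client is assumed to expose such a hook.

```python
import time
from typing import Callable, List, Optional


class RefreshingServerList:
    """Caches the result of `lookup` and re-runs it once it is too old."""

    def __init__(
        self,
        lookup: Callable[[], List[str]],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._servers: List[str] = []
        self._fetched_at: Optional[float] = None

    def servers(self) -> List[str]:
        """Return the current list, refreshing it first if it has expired."""
        now = self._clock()
        expired = (
            self._fetched_at is None
            or now - self._fetched_at >= self._max_age
        )
        if expired:
            self._servers = list(self._lookup())
            self._fetched_at = now
        return list(self._servers)
```

A client that called `servers()` on every reconnect would get the "on reconnect" variant for free; a timer calling it in the background would give the periodic one.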

ripienaar, Apr 14 '22 05:04

If anything, this should be a callback that replaces the cluster gossip behaviour. That means that if you specify a callback, the expected behaviour is that cluster updates are ignored, as the authoritative server list is now the responsibility of the callback. Obtaining the list could be an expensive operation, and in some cases it could be affected by the same network outage that is requiring the services to use a different cluster.
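The behaviour described here might be sketched as follows: once a callback is supplied, gossiped cluster updates are dropped and the callback's answer replaces the list. Falling back to the last known list when the callback fails is an assumption added for illustration (it is one way to cope with the lookup being hit by the same outage), and `ServerSelector` and its methods are hypothetical names.

```python
from typing import Callable, List, Optional


class ServerSelector:
    """Keeps the candidate server list for connect and reconnect."""

    def __init__(
        self,
        bootstrap: List[str],
        list_callback: Optional[Callable[[], List[str]]] = None,
    ) -> None:
        self._servers = list(bootstrap)
        self._callback = list_callback

    def on_cluster_update(self, discovered: List[str]) -> None:
        """Handle servers learned through cluster gossip."""
        if self._callback is not None:
            return  # the callback is authoritative, so gossip is ignored
        for url in discovered:
            if url not in self._servers:
                self._servers.append(url)

    def candidates(self) -> List[str]:
        """Return the servers to try on the next connect attempt."""
        if self._callback is not None:
            try:
                fresh = list(self._callback())
            except Exception:
                # Assumption: a failed lookup keeps the last known list.
                fresh = []
            if fresh:
                self._servers = fresh
        return list(self._servers)
```

Because the lookup may be expensive, a real implementation would probably want to cache or rate-limit the callback rather than invoke it on every attempt.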

aricart, May 16 '22 13:05

+1 for callback. It will allow custom implementations, including for things like specific DNS resolution.

scottf, May 16 '22 14:05

Not a feature

marthaCP, Oct 17 '22 20:10

@marthaCP why is this reopened?

aricart, Nov 08 '22 13:11

I meant to update the title for the issue. Scott said it was still open. Maybe we should discuss at the call tomorrow (11/9/22).

marthaCP, Nov 08 '22 15:11