DNS Poll - max stable Cluster Size = max DNS Entry Response Count

Open · bmalum opened this issue 1 year ago · 1 comment

Steps to reproduce

  • Configuration Used:
config :libcluster,
  debug: true,
  topologies: [
    dns: [
      strategy: Cluster.Strategy.DNSPoll,
      config: [
        poll_interval: 10_000,
        query: "appname.something",
        node_basename: "some-container"
      ]
    ]
  ]
  • Strategy Used: Cluster.Strategy.DNSPoll
  • Errors/Incorrect Behaviour Encountered: The maximum stable cluster size is the number of DNS results returned.

Description of issue

  • What are the expected results? When polling DNS, I would not expect nodes to be removed just because they are missing from the DNS response. I would expect libcluster to rely on the normal disconnect when a node times out via net_ticktime, rather than actively removing it. For example, if you have 15 nodes and each DNS reply contains 5 random node IPs, the cluster becomes unstable.

  • Is the documentation incorrect? The documentation does not mention that nodes are removed once they no longer appear in DNS. It just says:

this strategy will periodically poll DNS and connect all nodes it finds.

Should we introduce a config flag to turn off removing nodes?

bmalum · Mar 22 '23 08:03

I'd be open to accepting a PR that makes removing nodes in this strategy optional based on a flag, something like prune: false to disable pruning the node list. I believe there was a reason we actively prune nodes when the source of data for the strategy (DNS in this case, but it could be any system providing service discovery) no longer reports a node as part of the cluster. I can't recall the specifics at the moment, but it was a deliberate choice. libcluster largely defers to the source registry to tell us which nodes belong in the cluster. In the case of DNS, it is unusual for a node to disappear unless it is being permanently removed, but I can imagine scenarios where this might happen, such as under k8s or some other orchestrator that uses DNS for service discovery.
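
As a rough sketch only, the suggested flag might sit alongside the existing DNSPoll settings from the report like this. The prune option is hypothetical: it is the name floated above, not an option libcluster is known to support as of this thread, and the default and exact semantics would be up to whoever writes the PR.

```elixir
import Config

config :libcluster,
  topologies: [
    dns: [
      strategy: Cluster.Strategy.DNSPoll,
      config: [
        poll_interval: 10_000,
        query: "appname.something",
        node_basename: "some-container",
        # Hypothetical option proposed in this thread; not an existing
        # libcluster setting. Intended meaning: keep nodes connected even
        # when they are missing from the latest DNS response.
        prune: false
      ]
    ]
  ]
```

With something like this in place, nodes missing from a DNS response would presumably leave the cluster only through the usual distribution timeout (net_ticktime), which is the behaviour the report asks for.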

bitwalker · Jun 22 '23 17:06