Akka.Cluster.Discovery
It takes Cluster Singleton 1 minute to move to another node
- Consul discovery, the settings are:
akka.cluster {
  discovery {
    provider = akka.cluster.discovery.consul
    consul {
      listener-url = "http://127.0.0.1:8500"
      class = "Akka.Cluster.Discovery.Consul.ConsulDiscoveryService, Akka.Cluster.Discovery.Consul"
      dispatcher = "consul-dispatcher"
      alive-interval = 10s
      alive-timeout = 1m
      refresh-interval = 1m
      join-retries = 3
      lock-retry-interval = 250ms
      datacenter = "dc"
      token = ""
      wait-time = 30s
    }
  }
}
- Three-node cluster; a singleton is running on one of the nodes.
- Kill the node on which the singleton is running.
- A new singleton is launched only after a delay of ~1 minute, which is unacceptable; the docs promise that it should take a few seconds at most.
Cluster singleton migration depends on how quickly a down node is detected. If a node is merely unreachable, we cannot assume it's dead: it may be just a temporary network issue, and we don't want to end up with 2 singletons. Therefore we first need to determine whether the node is down:
- In the graceful scenario it's fast, as the node that is going down can announce this to the others.
- In a hard failure it's slow, since the rest of the cluster must work out whether the node is actually dead or has just disconnected for some reason and will come back up shortly. This takes time.
The docs probably refer to the time required to migrate once a down node has been detected. In the case of Consul cluster discovery, you can experiment with the alive-timeout and refresh-interval settings to try to shorten that time frame. However, if I'm right, Consul itself requires at least 30-60s to detect an unhealthy node.
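As a rough starting point for that tuning, here is a sketch that overrides only the timing keys from the config posted above. The values are illustrative assumptions, not tested recommendations, and the comments describe how I understand these keys rather than documented guarantees. I'm also assuming alive-timeout should stay comfortably larger than alive-interval, so that a single late heartbeat does not get a healthy node marked as dead.

```hocon
akka.cluster.discovery.consul {
  # Assumed meaning: how often a node reports itself as alive to Consul.
  alive-interval = 5s

  # Assumed meaning: how long without an alive report before the node
  # is considered unhealthy. Original value: 1m.
  alive-timeout = 15s

  # Assumed meaning: how often the node list is re-read from Consul.
  # Original value: 1m.
  refresh-interval = 10s
}
```

Shorter timeouts make it more likely that a brief network hiccup is treated as a dead node, which is exactly the two-singletons risk described above, so it's worth lowering them gradually and watching how the cluster behaves.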