
NuRaft with Kubernetes

Open kishorekrd opened this issue 3 years ago • 5 comments

Hi, I am creating a NuRaft cluster using Kubernetes. Each container has a local IP address (which will change after a reboot) and a service IP address (persistent across reboots). I initialized the NuRaft cluster using the service IP addresses, but RPC requests come from the local IP addresses of peer nodes. Is there any problem with this configuration? Can I configure a node with multiple IP addresses? Are there any dos and don'ts with Kubernetes?

I see that each node gets a connection request from a peer node's local IP address and the connection is established, but after that it tries to connect to the same peer node via its service IP address and gets connection errors. This causes flapping and leads to election timeouts. Am I missing anything?

kishorekrd avatar Apr 06 '22 21:04 kishorekrd

eBay internally uses NuRaft + Kubernetes and there is no such problem; each container always sees the correct IP address. I guess it is related to your environment settings.

And even though an RPC server sees a local IP address on an incoming request, that should not affect the connection to the peer, as the connection is established based on the endpoint in srv_config. As long as srv_config has the correct endpoint (i.e., the service IP), NuRaft always uses that endpoint regardless of the incoming IP address.
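For context, here is a minimal sketch of registering peers by their service IPs (the node IDs, addresses, port, and the `init_node` helper are placeholders; `srv_config`, `raft_params`, `raft_launcher`, and `add_srv` are from NuRaft's public headers):

```cpp
#include <libnuraft/nuraft.hxx>

using namespace nuraft;

// Sketch: initialize a node and register its peers by their *persistent*
// service IPs. All IDs and addresses below are placeholders.
void init_node(ptr<state_machine> sm, ptr<state_mgr> smgr, ptr<logger> lg) {
    raft_params params;               // default Raft parameters
    asio_service::options asio_opts;  // default network options

    raft_launcher launcher;
    ptr<raft_server> server =
        launcher.init(sm, smgr, lg,
                      12345,          // port this node listens on
                      asio_opts, params);

    // Register peers with their service IPs (run on the leader).
    // NuRaft connects to these endpoints regardless of the source IP
    // it sees on incoming connections.
    server->add_srv(srv_config(2, "10.96.0.2:12345"));
    server->add_srv(srv_config(3, "10.96.0.3:12345"));
}
```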

greensky00 avatar Apr 09 '22 03:04 greensky00

Thanks for the response. What are the ideal values that you recommend for the heart_beat_interval_, election_timeout_lower_bound_, and election_timeout_upper_bound_ parameters while running on Kubernetes on the same network?

kishorekrd avatar Apr 11 '22 23:04 kishorekrd

We use around 1 second, 2 seconds, and 4 seconds for the heartbeat interval, election timeout lower bound, and upper bound, respectively.
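In NuRaft's `raft_params` those values are expressed in milliseconds, so the recommendation above would look roughly like this (a sketch; the field names are from NuRaft's `raft_params.hxx`):

```cpp
#include <libnuraft/nuraft.hxx>

using namespace nuraft;

// Sketch: the intervals suggested above, expressed in raft_params
// (all values are in milliseconds).
raft_params make_params() {
    raft_params params;
    params.heart_beat_interval_          = 1000;  // 1 s heartbeat
    params.election_timeout_lower_bound_ = 2000;  // 2 s lower bound
    params.election_timeout_upper_bound_ = 4000;  // 4 s upper bound
    return params;
}
```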

greensky00 avatar Apr 13 '22 04:04 greensky00

Does that mean an election will be triggered after 2 missed heartbeats? What is the reason for a 1-second heart_beat_interval_? Isn't it too big?

kishorekrd avatar Apr 14 '22 00:04 kishorekrd

It should be determined considering two factors:

  1. How often will the leader node actually die?
  2. How often will there be a short network hiccup?

Usually 2) is much more frequent than 1). If we set the above Raft intervals shorter than the network hiccup duration, that will cause lots of unnecessary leader changes even though the previous leader has no problem, which is not a preferred situation as the system becomes unstable.

Hence, the intervals need to be balanced between the two factors, and the above numbers are our empirical optimum.
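As an illustration of that balance (the `params_for_hiccup` helper and its multipliers are hypothetical, not part of NuRaft): if short network hiccups typically last up to the given duration, the election timeout lower bound should sit comfortably above it, so a hiccup alone never triggers an election while the leader is still alive.

```cpp
#include <libnuraft/nuraft.hxx>

using namespace nuraft;

// Hypothetical helper: derive Raft intervals from the longest network
// hiccup you want to ride out without a leader change. The multipliers
// are illustrative, not from NuRaft.
raft_params params_for_hiccup(int observed_hiccup_ms) {
    raft_params params;
    // Keep the lower bound above the hiccup duration so a transient
    // stall does not start an election while the leader is still alive.
    params.election_timeout_lower_bound_ = observed_hiccup_ms * 2;
    params.election_timeout_upper_bound_ = observed_hiccup_ms * 4;
    // Several heartbeats fit inside one election-timeout window.
    params.heart_beat_interval_          = observed_hiccup_ms;
    return params;
}

// e.g. params_for_hiccup(1000) reproduces the 1 s / 2 s / 4 s values above.
```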

greensky00 avatar Apr 14 '22 16:04 greensky00