docker-vernemq

Helm: headless-service might require publishNotReadyAddresses: true

Open micw opened this issue 4 years ago • 4 comments

Hello,

as far as I understand, the nodes on Kubernetes use the DNS names from the headless-service for communication. The DNS name for a node is only available while the node is healthy. It might be that DNS is required for communication before the node becomes healthy (so that other nodes can sync with it). If so, the headless-service should have "publishNotReadyAddresses: true" set.
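For illustration, a minimal sketch of a headless Service with that option set. The Service name, selector label, and port below are assumptions chosen for the example, not values taken from the chart; only `clusterIP: None` and `publishNotReadyAddresses` are the fields under discussion.

```yaml
# Sketch only: name, label, and port are hypothetical, not copied from the Helm chart.
apiVersion: v1
kind: Service
metadata:
  name: vernemq-headless              # hypothetical name
spec:
  clusterIP: None                     # headless: DNS returns per-pod records
  publishNotReadyAddresses: true      # publish DNS records for pods that are not Ready yet
  selector:
    app.kubernetes.io/name: vernemq   # hypothetical label
  ports:
    - name: mqtt
      port: 1883
```

With this field set, the per-pod DNS records should exist regardless of readiness, so a restarting pod stays resolvable by name while its health check is still failing.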

Kind regards, Michael.

micw avatar Dec 22 '20 14:12 micw

@micw thanks... is this basically the same issue as here https://github.com/vernemq/docker-vernemq/issues/261? Do you have a PR?

ioolkos avatar Dec 28 '20 18:12 ioolkos

@ioolkos I can do one (it's just a one-liner) but I'm not 100% sure if that is one of the issues causing kubernetes vmq-clusters to fail (I'm still on https://github.com/vernemq/vernemq/issues/1698).

Without the publishNotReadyAddresses: true option, the behaviour is the following:

  • Cluster has 3 nodes (node1, node2, node3)
  • each node knows the other nodes by DNS name
  • if node 1 goes down, the DNS name "node1" is removed
  • if node1 comes up again, the DNS name "node1" is not available until node1 reaches the "OK" state (which is checked via its health endpoint)
  • node1 is still able to connect to the other 2 nodes via DNS, but the other nodes can only reach it via its IP address

So whether that option is required depends heavily on the internals of cluster building before the "OK" state is reached. If you have deep insight, you might be able to answer this - I do not ;-)

Kind regards, Michael.

micw avatar Dec 28 '20 19:12 micw

@ioolkos This isn't the same issue as #261

SerialVelocity avatar Feb 19 '21 11:02 SerialVelocity

I have created PR #367 to add the suggested change. I have tested the fix on our cluster and it works for us: the cluster now joins up correctly after forcing restarts of the pods. What we saw before the fix is that the pods take quite a while to become healthy and can't connect to each other during that time, so they each form a cluster on their own.

Ghkuijer avatar Oct 17 '23 06:10 Ghkuijer