
Automatically skip self-join

Open nh2 opened this issue 8 years ago • 22 comments

It is unclear to me from the docs of start_join/retry_join whether this value should contain all consul servers that I want in the cluster, or only the other ones (not the current machine).

Looking at a 3-node cluster bootstrapping from an automatic deployment, it seems to me that it has to be the latter, because otherwise, if retry_join = [machine1, machine2, machine3]:

  • machine1 boots up first, sees only itself, and joins itself
  • machine2 and machine3 boot at the same time a bit later and try to join each other first, which succeeds

Because the behaviour of retry_join is "Takes a list of addresses to attempt joining every retry_interval until at least one join works" instead of "until all joins work", none of the machines tries to join any other machines after its first success.

So due to this race we end up with two split clusters, {machine1} and {machine2, machine3}, as opposed to the desired {machine1, machine2, machine3}. And it stays that way.
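The race described above can be sketched with a small simulation. This is an illustrative model, not Consul's actual code: machine names are placeholders, "reachability" is hardcoded to reproduce the assumed timing, and `skipSelf` previews the fix discussed later in the thread.

```go
package main

import "fmt"

// find/union: a tiny union-find to track which nodes end up in the same cluster.
func find(parent map[string]string, x string) string {
	for parent[x] != x {
		x = parent[x]
	}
	return x
}

func union(parent map[string]string, a, b string) {
	parent[find(parent, a)] = find(parent, b)
}

// attemptJoin models one retry_join pass: every address in the list is
// attempted, the node merges with each reachable peer, and the pass counts
// as a success (ending further retries) if ANY attempt worked -- which
// today includes a trivial join to the node's own address. With skipSelf
// set, the self entry is ignored, so a node that is alone keeps retrying.
func attemptJoin(self string, addrs []string, reachable map[string]bool,
	parent map[string]string, skipSelf bool) bool {
	joined := false
	for _, a := range addrs {
		if a == self {
			if !skipSelf {
				joined = true // self-join "succeeds" but merges nothing
			}
			continue
		}
		if reachable[a] {
			union(parent, self, a)
			joined = true
		}
	}
	return joined
}

func main() {
	servers := []string{"machine1", "machine2", "machine3"}
	parent := map[string]string{
		"machine1": "machine1", "machine2": "machine2", "machine3": "machine3",
	}

	// machine1 boots first: no peer is up yet, so only the self-join
	// "works", and machine1 stops retrying.
	attemptJoin("machine1", servers, map[string]bool{}, parent, false)

	// machine2 and machine3 boot together; assume machine1 is briefly
	// unreachable during their first pass, so they only find each other.
	up := map[string]bool{"machine2": true, "machine3": true}
	attemptJoin("machine2", servers, up, parent, false)
	attemptJoin("machine3", servers, up, parent, false)

	// false: {machine1} vs {machine2, machine3} -- the split cluster.
	fmt.Println(find(parent, "machine1") == find(parent, "machine2"))
}
```

Running the same scenario with `skipSelf = true` makes machine1's first pass fail, so it would keep retrying and eventually merge with the others.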

And then we get lots of "agent: failed to sync remote state: No cluster leader" errors.

Is this analysis correct?

nh2 avatar Apr 04 '17 00:04 nh2

The race is prevented if you use bootstrap_expect as well since then the cluster will wait to bootstrap until it has at least a certain number of candidates.

highlyunavailable avatar Apr 04 '17 15:04 highlyunavailable

The race is prevented if you use bootstrap_expect as well since then the cluster will wait to bootstrap until it has at least a certain number of candidates.

I should have said: I use bootstrap_expect = 3. Should the race really be prevented by using it? The problem does not seem to be that the cluster bootstraps too early, but that, because retry_join stops at the first successful join, I never get to the expected 3 nodes in a set.

nh2 avatar Apr 04 '17 19:04 nh2

We have nodes that we shut down and start up every other day, and we noticed a failure to start up when start_join is used in the configuration, so we removed it and stuck to retry_join. A couple of days ago I spun up a new 3-node cluster (using retry_join only) and the nodes refused to cluster up; I ended up having to add start_join manually for the initial bootstrapping and remove it afterwards. The usage of these configuration options is rather unclear, and there doesn't seem to be an ideal combination of them for our use case (as I said initially, start_join & retry_join seem to work well unless your leader node shuts down...).

eyalzek avatar Apr 07 '17 10:04 eyalzek

I agree with @nh2. I just ran into a similar issue: a new leader was only elected after I manually restarted the consul service on all three servers, which breaks our automation setup.

I used bootstrap_expect=3 and landed in the same scenario. The only difference is that I am not using retry_join but retry_join_ec2.

sbatchu0108 avatar Apr 07 '17 14:04 sbatchu0108

Seems like skipping the self-join would make this more robust and let you still keep a simple, static configuration that has a full list of servers (vs. enumerating only different servers).

slackpad avatar Apr 12 '17 07:04 slackpad

@slackpad I'm just trying to digest your sentence "skipping the self-join would make this more robust". What does skipping the self-join imply, and how does it resolve the split-cluster issue?

sbatchu0108 avatar Apr 12 '17 18:04 sbatchu0108

@sbatchu0108 I was thinking that if retry_join did not attempt to join the node itself (and therefore did not stop there), this would make things simpler to configure (you can always list all 3 server IPs) and more robust (you keep trying until you join some other node). It's still not perfect, especially with more than 3 servers, but since retry_join always tries to join the full list, the last server to attempt the join should always tie everyone together.

slackpad avatar Apr 12 '17 19:04 slackpad

Seems like skipping the self-join would make this more robust and let you still keep a simple, static configuration that has a full list of servers (vs. enumerating only different servers).

Yes, absolutely.

I proposed the documentation change instead of this because it would be asking for less, but if consul can be changed so that it doesn't consider joining itself as a successful bootstrapping operation, that would be much better (people wouldn't have to change their configs, this would be very intuitive, and easier to configure -- imagine all nodes starting from the same machine image in a cloud environment). In that case, I think we can change this issue from an enhancement to a bug.

nh2 avatar Apr 12 '17 22:04 nh2

@slackpad I was thinking that if retry_join did not attempt to join the node itself (and therefore did not stop there), this would make things simpler to configure (you can always list all 3 server IPs) and more robust (you will keep trying until you join some other node)

Ahh, got it. I agree that if consul can be changed, that would be better for everyone, since people could leave their configs as they are. This proposed change is much appreciated.

sbatchu0108 avatar Apr 14 '17 14:04 sbatchu0108

I wanted to check in in 2021 and add that I've encountered this as well. I'd also like to cast a vote for skipping the rejoin to self and instead joining a different node, if there are other hosts present in retry_join.

ADustyOldMuffin avatar Sep 13 '21 18:09 ADustyOldMuffin

Hello Consul community members,

We would welcome a PR contributed by the community for this usability enhancement!

If you're interested, please comment here so anyone interested can stay informed.

The approach should ensure the following:

  • "Self" entries in start_join and retry_join should be ignored
    • Think through how to robustly assess whether an entry is "self"
  • Emit an error if start_join or retry_join contain no entries other than "self" (e.g., empty list, list with only "self" entries)
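The requirements above can be sketched as a small filter function. This is a hypothetical illustration, not Consul's code: the function name is made up, and a real implementation would also need to resolve hostnames and compare against all local bind/advertise addresses to robustly decide what counts as "self".

```go
package main

import (
	"errors"
	"fmt"
)

// filterSelf removes any entry that refers to the local node from a
// start_join/retry_join list, and returns an error when nothing useful
// remains (an empty list, or a list containing only "self" entries).
func filterSelf(addrs []string, self string) ([]string, error) {
	out := make([]string, 0, len(addrs))
	for _, a := range addrs {
		if a == self {
			continue // ignore "self" entries
		}
		out = append(out, a)
	}
	if len(out) == 0 {
		return nil, errors.New("join list contains no addresses other than the local node")
	}
	return out, nil
}

func main() {
	// Placeholder addresses.
	peers, err := filterSelf([]string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}, "10.0.0.1")
	fmt.Println(peers, err) // [10.0.0.2 10.0.0.3] <nil>

	_, err = filterSelf([]string{"10.0.0.1"}, "10.0.0.1")
	fmt.Println(err != nil) // true: a self-only list is an error
}
```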

Note: we anticipate that this could require more effort to implement and to test/review than it appears.

jkirschner-hashicorp avatar Sep 28 '21 22:09 jkirschner-hashicorp

Hello, I'd be interested in taking this on!

ADustyOldMuffin avatar Sep 29 '21 00:09 ADustyOldMuffin

Hi @ADustyOldMuffin,

Just checking in to see how things are going and whether you have any questions!

jkirschner-hashicorp avatar Oct 21 '21 03:10 jkirschner-hashicorp

No, sorry! I started looking into this but didn't get very far while trying to get my environment set up for it. Unfortunately I won't have any time to look at this.

ADustyOldMuffin avatar Oct 24 '21 12:10 ADustyOldMuffin

Hi all, I looked into this a bit today and this is what I found. It looks like joining the cluster is handled here: https://github.com/hashicorp/consul/blob/676ea58bc4f91ca751764f1db5a53c5a9459b062/agent/consul/client.go#L195

Ultimately, Serf is used to facilitate communication between the nodes in a Consul cluster. However, Serf is using memberlist under the hood to actually manage cluster membership.

Based on the above, I think this fix could potentially be implemented in memberlist, which would effectively apply to Consul after bumping memberlist in Serf, and then Serf in Consul. Does this seem reasonable, or am I missing something here? I'd be interested in looking into this further if this seems like the right path to go down.

aaronraff avatar Mar 13 '22 22:03 aaronraff

Hey @aaronraff, thank you for looking into this.

I think a less invasive fix would be to exclude the current server's address in here. The local addr can be injected into the retryJoiner so we can do the check in retryJoinAddrs.

dhiaayachi avatar Mar 14 '22 15:03 dhiaayachi

Great idea @dhiaayachi, I'm happy to try this out. I appreciate the code pointers!

aaronraff avatar Mar 16 '22 02:03 aaronraff

Hey @aaronraff

Are you still interested in this fix?

Amier3 avatar Apr 12 '22 16:04 Amier3

Apologies for the delay here. I should have a PR up sometime later this week!

aaronraff avatar Apr 14 '22 01:04 aaronraff

@aaronraff

No problem, look forward to it! :)

Amier3 avatar Apr 14 '22 13:04 Amier3

@Amier3 @aaronraff What is the status at the moment? Love to see this implemented.

Noticed the PR got closed due to @aaronraff not having signed the CLA; it would be a shame if this change got lost because of that.

bartvollebregt avatar Jun 12 '23 10:06 bartvollebregt

Unfortunately I ran into this as well. My scenario was: 3 servers, bootstrap_expect = 3, retry_join = ["loadbalancer_ip"].

The load balancer points to all 3 servers. Unfortunately, it redirected each server back to itself, which led to Consul stopping the retry. It would be great if this could be fixed.

chickeaterbanana avatar Aug 10 '25 17:08 chickeaterbanana
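Until a self-skip lands in Consul, one way to sidestep the load-balancer variant of this trap is to point retry_join at the server addresses directly instead of at a load balancer, so no server can be handed its own address back. A minimal config sketch (all addresses are placeholders):

```hcl
server           = true
bootstrap_expect = 3

# List the servers themselves rather than a load balancer that may
# route a server back to its own address (which today counts as a
# successful self-join and stops the retry loop).
retry_join = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
```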