curator icon indicating copy to clipboard operation
curator copied to clipboard

[Fix] Curator-444

Open borjab opened this issue 8 years ago • 11 comments

This ensures that not two nodes are simultaneusly as leaders.

If setLeadership(false) is redundant no event would be reported anyway so it is quite safe.

borjab avatar Nov 28 '17 10:11 borjab

Fix now includes test to reproduce error.

borjab avatar Nov 28 '17 16:11 borjab

Thanks for the feedback. You are right that the test has a lot of room for improvement. I will work on the tests as requested.

borjab avatar Dec 04 '17 08:12 borjab

Thanks!

cammckenzie avatar Dec 04 '17 22:12 cammckenzie

We've got continuous issues with that bug (two critical nodes becoming leaders simultaneously begin to damage the work of each other), and I would like this work to be continued, though I see it lost attention some time ago. Right now I'm looking at the code trying to figure out where exactly it's broken and I found this line quite strange https://github.com/apache/curator/pull/243/files#diff-edf135a1ae8f33bd78d8371b25f2d5a6R544 IE, we got 3 children:

0 1 2

And ourIndex = 2, why would we add a watcher for the child 1?

sortedChildren.get(ourIndex - 1)

egor-ryashin avatar Nov 20 '18 21:11 egor-ryashin

This is just done for efficiency. Each client involved in the leader election only cares when the child before it disconnects, because that implies that it should now become leader. If any of the other children disconnect, the client doesn't care because it doesn't affect whether it should become leader or not.

cammckenzie avatar Nov 20 '18 22:11 cammckenzie

And ourIndex = 2, why would we add a watcher for the child 1?

This prevents thundering herds when ZNodes are deleted. A given client only needs to watch the ZNode previous to it thereby it only gets notified when it will be the next lock/leader holder.

Randgalt avatar Nov 20 '18 22:11 Randgalt

@borjab and @egor-ryashin I just ran the test in this PR multiple times with the fix in LeaderLatch commented out and the tests always pass. It's important to note that if you don't use the default ConnectionStateErrorPolicy and/or elsewhere don't exit your locks, etc. when you receive SUSPENDED you can get in this state. Even then with long GC times there is always the possibility of multiple lock/leader holders. See https://cwiki.apache.org/confluence/display/CURATOR/TN10

Randgalt avatar Dec 11 '18 01:12 Randgalt

@Randgalt I've actually seen 2 leaders working several minutes till I restart them. ~My guess, right now, that can happen when the leader has lost connection for a spell. For example, for children: A B here B watches A, while A doesn't watch anything, I guess. If A gets a connectivity issue, B becomes a leader and set no watches, then A get through and becomes the leader again, due to the sorting rule, while B doesn't receive any notification about it. Haven't had the time to prove it yet though.~ (More thourough looking through code and some tests shows it's not the root cause.)

egor-ryashin avatar Dec 11 '18 11:12 egor-ryashin

@egor-ryashin LeaderLatch calls setLeadership(false); when the connection is lost. LeaderSelector is a bit more complicated but if you use the recommended LeaderSelectorListenerAdapter the active leader will get interrupted when the connection is lost which will result in a release of the its internal lock. In both cases the lock node (A in your example) gets deleted on connection loss.

Randgalt avatar Dec 11 '18 13:12 Randgalt

Please read this tech note and see if it applies: https://cwiki.apache.org/confluence/display/CURATOR/TN14

Randgalt avatar Dec 11 '18 19:12 Randgalt

@Randgalt no, it doesn't recover, only a restart helps.

egor-ryashin avatar Dec 15 '18 18:12 egor-ryashin

Closed as stale. There're many changes since this patch was made in the first place. Feel free to resubmit it if it's still relevant.

tisonkun avatar Sep 06 '22 13:09 tisonkun