zookeeper icon indicating copy to clipboard operation
zookeeper copied to clipboard

Fix haveDelivered wrong implementation.

Open KimRasak opened this issue 5 years ago • 6 comments

Original implementation and mine

  • The original implementation of QuorumCnxManager::haveDelivered returns true once it finds a queue is empty, and returns false if all queues aren't empty.
  • My implementation returns false once it finds a queue not empty, and returns true if all queues are emtpy.

Reasons

There are 2 reasons why I think the implementation of QuorumCnxManager::haveDelivered is wrong.

1. comment mismatch

The comment of QuorumCnxManager::haveDelivered says:

/** * Check if all queues are empty, indicating that all messages have been delivered. */

As the comment says, the method should return true if all queues are empty, But the implementation returns true if one of the queues is empty.

2. The methods functionality

In FastLeaderElection::lookForLeader:

if(manager.haveDelivered()){
    sendNotifications();
} else {
    manager.connectAll();
}

I believe the code wants to do the following:

  • If The manager has sent all messages(but receives nothing), send notifications again.
  • Otherwise, if some queue of queueSendMap isn't empty, the connection to that server must have broken down or haven't been established.

Therefore, manager.haveDelivered() should check whether some queue isn't empty. (Note that manager.connectAllonly tries connecting all servers. It skips the servers that "I"(the server itself) have connected to.)

KimRasak avatar Dec 06 '19 11:12 KimRasak

Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1690/

Build result: FAILURE

[...truncated 905.57 KB...][JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/pom.xml to org.apache.zookeeper/parent/3.6.0-SNAPSHOT/parent-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-recipes/zookeeper-recipes-lock/pom.xml to org.apache.zookeeper/zookeeper-recipes-lock/3.6.0-SNAPSHOT/zookeeper-recipes-lock-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-contrib/pom.xml to org.apache.zookeeper/zookeeper-contrib/3.6.0-SNAPSHOT/zookeeper-contrib-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-client/pom.xml to org.apache.zookeeper/zookeeper-client/3.6.0-SNAPSHOT/zookeeper-client-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/pom.xml to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/target/zookeeper-jute-3.6.0-SNAPSHOT.jar to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/target/zookeeper-jute-3.6.0-SNAPSHOT-tests.jar to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT-tests.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/target/zookeeper-jute-3.6.0-SNAPSHOT-sources.jar to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT-sources.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-jute/target/zookeeper-jute-3.6.0-SNAPSHOT-javadoc.jar to org.apache.zookeeper/zookeeper-jute/3.6.0-SNAPSHOT/zookeeper-jute-3.6.0-SNAPSHOT-javadoc.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-recipes/zookeeper-recipes-queue/pom.xml to org.apache.zookeeper/zookeeper-recipes-queue/3.6.0-SNAPSHOT/zookeeper-recipes-queue-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-docs/pom.xml to org.apache.zookeeper/zookeeper-docs/3.6.0-SNAPSHOT/zookeeper-docs-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-docs/target/zookeeper-docs-3.6.0-SNAPSHOT.jar to org.apache.zookeeper/zookeeper-docs/3.6.0-SNAPSHOT/zookeeper-docs-3.6.0-SNAPSHOT.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-docs/target/zookeeper-docs-3.6.0-SNAPSHOT-tests.jar to org.apache.zookeeper/zookeeper-docs/3.6.0-SNAPSHOT/zookeeper-docs-3.6.0-SNAPSHOT-tests.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-docs/target/zookeeper-docs-3.6.0-SNAPSHOT-sources.jar to org.apache.zookeeper/zookeeper-docs/3.6.0-SNAPSHOT/zookeeper-docs-3.6.0-SNAPSHOT-sources.jar[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-recipes/pom.xml to org.apache.zookeeper/zookeeper-recipes/3.6.0-SNAPSHOT/zookeeper-recipes-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-client/zookeeper-client-c/pom.xml to org.apache.zookeeper/zookeeper-client-c/3.6.0-SNAPSHOT/zookeeper-client-c-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-contrib/zookeeper-contrib-rest/pom.xml to org.apache.zookeeper/zookeeper-contrib-rest/3.6.0-SNAPSHOT/zookeeper-contrib-rest-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-assembly/pom.xml to org.apache.zookeeper/zookeeper-assembly/3.6.0-SNAPSHOT/zookeeper-assembly-3.6.0-SNAPSHOT.pom[JENKINS] Archiving /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build-maven/zookeeper-contrib/zookeeper-contrib-zooinspector/pom.xml to org.apache.zookeeper/zookeeper-contrib-zooinspector/3.6.0-SNAPSHOT/zookeeper-contrib-zooinspector-3.6.0-SNAPSHOT.pomchannel stopped[SpotBugs] Skipping execution of recorder since overall result is 'FAILURE'Setting status of 555a292de47ac1a54f01073e9a0fe5d8f68e7eb1 to FAILURE with url https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1690/ and message: 'FAILURE 'Using context: JenkinsMaven

asf-ci avatar Dec 06 '19 11:12 asf-ci

I still think there's sth wrong with the QuorumCnxManager.haveDelivered()or

if(manager.haveDelivered()){
  sendNotifications();
} else {
  manager.connectAll();
}

Becase the old implementation won't try to "connect all" as long as the first "sending queue" is empty. But I don't know how to simulate the scene that will triger the problem.

KimRasak avatar Dec 06 '19 13:12 KimRasak

Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1691/

asf-ci avatar Dec 06 '19 13:12 asf-ci

Is is possible that there are 4 servers (s1, s2, s3, s4),

  1. In the beginning, (s2, s3, s4) can establish links with s1, while the network among (s2, s3, s4) is broken and (s2, s3, s4) can't establish links to each other(that is, everyone can only connect to s1).
  2. Because the links between (s2, s1), (s3, s1) and (s4, s1) are healthy, and the first "sending queue" of s2, s3 or s4 is s1, manager.haveDelivered() will always be true. So (s2, s3, s4) won't call manager.connectAll();.
    • The first "sending queue" of s1 is towards s2, which is also healthy.
  3. Now the network among (s2, s3, s4) is healthy again, but is it possible that they won't try to connect to each other any more? And maybe the quorum will not be established?

Established or not?

  • After many rounds of notification exchanging, everyone votes for s4, and from the perspective of s1, s4 will get 4 votes, so s1 will be a follower.
  • and...? will the quorum be established in the end?

KimRasak avatar Dec 06 '19 13:12 KimRasak

@KimRasak

  • The logic only affects the queueSendMap which some queue is empty and others is not empty. In this case, the origin logic enters into sendNotifications(), your fix enters into connectAll(). Look at the method:sendNotifications(), it puts the notification into sendqueue. , then WorkerSender.run poll the notification(QuorumCnxManager#toSend), which also connects to other peers(connectOne(long sid))

  • Now the network among (s2, s3, s4) is healthy again, but is it possible that they won't try to connect to each other any more? And maybe the quorum will not be established?

    I guess not have that possibility. You can use some tool to reproduce some corner case you image(e.g blockade)

maoling avatar Dec 09 '19 02:12 maoling

@KimRasak Is that a scenario that you mention in your latest comments is something that you've seen happened in a real cluster? I'm asking, because I'd like to get confidence in your change before accepting it. You're about to make changes in leader election, so we have to be very careful about that.

Regarding the failing scenario (I think it would be more realistic with odd number of participants, but doesn't really matter.) can we say that all participants will get the most recent notifications from s1, because they all have a healthy connection to it? In which case having live connection between all nodes might not be a requirement of a successful leader election.

anmolnar avatar Dec 28 '19 14:12 anmolnar