
Peers are pinged even after dead

webglider opened this issue 9 years ago · 4 comments

I created 3 peers on different ports (using gossipmonger-tcp-transport listeners as transport). Let's call them A, B, and C.

A was given an empty seed set, B was given a seed set of [A], and C was given a seed set of [B].

Everything runs fine when all the peers are alive. Then I close peer C by terminating its process (Ctrl-C). A and B start reporting connection errors, as expected. After some time they recognize that peer C is dead (the 'peer dead' event is fired on both). But even after they have recognized that the peer is dead, I continue to see periodic connection error reports.

Shouldn't peers stop trying to ping peers which they think are dead? (MIN_LIVE_PEERS is set to the default 1, so that should not be a problem)

If they keep pinging peers they believe are dead, what is the point of calling them dead in the first place? It means a peer can never leave the cluster.

webglider avatar Jul 17 '15 01:07 webglider

Ok, I checked the source code. There is a probability with which dead nodes are retried. Is this necessary to ensure consistency, or would it be okay to get rid of it?
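To make sure I'm reading the code right, here is a rough sketch of what the retry logic seems to do. The names and the exact probability below are my own for illustration, not gossipmonger's actual identifiers:

```javascript
// Sketch of probabilistic dead-peer retry (illustrative names only).
// Each gossip round, with a small probability, a dead peer is picked
// to ping instead of a live one, so dead entries keep getting probed.
const DEAD_PEER_RETRY_PROBABILITY = 0.1; // hypothetical tuning knob

function selectPeerToGossipWith(livePeers, deadPeers, random = Math.random) {
  // Occasionally probe a dead peer instead of a live one.
  if (deadPeers.length > 0 && random() < DEAD_PEER_RETRY_PROBABILITY) {
    return deadPeers[Math.floor(random() * deadPeers.length)];
  }
  if (livePeers.length > 0) {
    return livePeers[Math.floor(random() * livePeers.length)];
  }
  return null; // nobody to gossip with
}
```

If that is the shape of it, then a dead peer is pinged forever with that small probability, which matches the periodic connection errors I'm seeing.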

webglider avatar Jul 17 '15 03:07 webglider

Hi @webglider,

First, thank you for a great description of what you're seeing. It was easy to understand what you described.

You're right, the current code does not allow a peer to leave a cluster. I think you've found a missing feature. The retrying of dead peers was implemented because that's what was in the algorithm I read in the paper. I never got around to implementing peers cleanly leaving the cluster.

I don't think we should get rid of the dead-node retry feature. It allows the cluster to recover from network partitions, partial failures, etc.

I think it would be possible to implement leaving the cluster by adding some sort of delete method to the storage api that would get rid of the desired peer.

tristanls avatar Jul 17 '15 22:07 tristanls

@tristanls Yeah, I guess a method to allow peers to deliberately leave the network could be implemented. Right now I guess that can be done with a small hack: we could program each peer to recognize a special tombstone key-value pair, maybe ('i am', 'dead') :P , so that whenever it receives such an update from another peer, it forcefully deletes that peer from storage. So if a peer wants to leave intentionally, it will update this tombstone key-value pair locally, wait a few seconds, and then disconnect. I guess that will be sufficient to ensure that the update propagates through gossip to all other peers.
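A rough sketch of that hack, to make it concrete (handleUpdate and storage.remove are made-up names for illustration, not gossipmonger's API):

```javascript
// Sketch of the tombstone hack: a reserved key-value pair that, when
// received as an update from a peer, causes that peer to be deleted
// from storage. Names here are hypothetical.
const TOMBSTONE_KEY = 'i am';
const TOMBSTONE_VALUE = 'dead';

function handleUpdate(storage, peerId, key, value) {
  if (key === TOMBSTONE_KEY && value === TOMBSTONE_VALUE) {
    storage.remove(peerId); // forcefully forget the departing peer
    return true; // tombstone handled, peer removed
  }
  return false; // ordinary update, handle as usual
}
```

The departing peer would set ('i am', 'dead') on itself, wait for a few gossip rounds, then disconnect.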

But what if a rogue peer doesn't follow the standard protocol for leaving the cluster and simply terminates its process and shuts down its machine? In such cases the stale entry will remain. Is there a way to find such instances and delete them from the cluster? (Maybe if a peer cannot be reached by many peers, it is considered to have left.)

That leads me to another question, regarding the phi accrual failure detection. I haven't read the details of how exactly the phi value is computed, but looking at the source code I could see that it is computed using last-time and interval data. If I am not wrong, this data is updated only when a peer receives a digest or a delta from another peer. So now let's imagine an idle network (i.e. no updates are happening right now), where peers are randomly exchanging digests. This means peer A's phi value for peer B will be updated only when A receives a digest from B. If the network is large, isn't there a chance that this could take considerably long?

This means the failure detection information is not being gossiped, i.e. the peers are not working together in collaboration to decide the failure of a node; each node independently decides the failure of all the other nodes. Am I right about this, or have I missed something in the implementation? Shouldn't peers be gossiping their phi values for other peers, and then deciding failure using this collective information?

webglider avatar Jul 18 '15 01:07 webglider

Hey,

I've thought about this for a bit. I think a good place to introduce leaving the cluster (at least at first) is the storage abstraction: https://github.com/tristanls/gossipmonger#gossipmonger-storage. If storage afforded something like storage.remove(id), which would remove the peer from storage, then the next time gossipmonger calls storage.livePeers() or storage.deadPeers(), the removed peer would not be included in the results.
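A minimal in-memory sketch of what such a storage could look like (illustrative only; the `live` flag stands in for however a real storage implementation tracks liveness, and only `remove` is the proposed addition):

```javascript
// In-memory storage sketch with the proposed remove(id) addition.
// After remove(id), the peer appears in neither livePeers() nor
// deadPeers(), so gossipmonger would stop contacting it entirely.
function createMemoryStorage() {
  const peers = new Map(); // id -> peer object with a `live` flag
  return {
    put(id, peer) { peers.set(id, peer); },
    get(id) { return peers.get(id); },
    remove(id) { peers.delete(id); }, // the proposed addition
    livePeers() { return [...peers.values()].filter(p => p.live); },
    deadPeers() { return [...peers.values()].filter(p => !p.live); }
  };
}
```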

With that in place, the protocol for leaving the cluster would be:

  1. shut down the peer (no need to be clean) permanently.
  2. wait until everyone declares the peer dead (if we were to remove the peer prior to it being considered dead, we could enter a state where we advertise a dead peer in the middle of removal)
  3. communicate removal of that particular id (this can be some 'admin' gossip key)
  4. when peers receive the admin message, they remove the peer using storage.remove(id)

The above protocol is very much a type of "after-the-fact clean up" protocol. In essence, you can shut down the peers and let them go dead, and at some point you can send the admin message that cleans up the ones you want to remove.
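The guard from step 2 could look something like this (a sketch with made-up names, not actual gossipmonger code):

```javascript
// Sketch of step 2's guard: only act on an admin removal once the
// peer is already considered dead locally, so we never drop a peer
// that our failure detector still thinks is live mid-removal.
function handleRemovePeer(storage, id) {
  const stillLive = storage.livePeers().some(p => p.id === id);
  if (stillLive) {
    return false; // too early: wait until the detector declares it dead
  }
  storage.remove(id);
  return true;
}
```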

Regarding phi accrual failure detection: if you take a look at https://github.com/tristanls/gossipmonger/blob/master/index.js#L327, you'll see that we calculate phi when we locally attempt to connect to a peer (gossipmonger.gossip()), and not when the peer connects to us. Since we constantly attempt to gossipmonger.gossip() https://github.com/tristanls/gossipmonger/blob/master/index.js#L349, we continuously update phi locally, so even an idle network would not result in our local node thinking that all the peers are dead. You are correct that each node independently decides the failure of all the other nodes. If the network is large, we might not connect to everyone, and therefore declare them dead, but that's why we continue trying dead peers: to test our local assumptions about the remote state of the peer.
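For reference, phi accrual with the common exponential-interval simplification looks roughly like this (the actual computation in index.js may differ; this just follows the phi = -log10(P_later) definition from the phi accrual failure detector literature):

```javascript
// Phi accrual sketch under an exponential inter-arrival assumption:
//   P_later(t) = e^(-t / meanInterval)
//   phi(t)     = -log10(P_later(t)) = (t / meanInterval) * log10(e)
// phi grows as time since last contact exceeds the mean observed
// interval; a threshold (e.g. phi > 8) declares the peer dead.
function phi(nowMs, lastTimeMs, meanIntervalMs) {
  const elapsed = nowMs - lastTimeMs;
  return (elapsed / meanIntervalMs) * Math.LOG10E;
}
```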

As to peers gossiping their phi values to other peers, it sounds interesting, but I have not explored that path.

tristanls avatar Jul 26 '15 12:07 tristanls