lighthouse icon indicating copy to clipboard operation
lighthouse copied to clipboard

Improve Peer Manager Scoring Tests

Open AgeManning opened this issue 3 years ago • 4 comments

Description

The PeerManager (an object that manages the peers Lighthouse is connected to) has recently added some more complex logic around handling peers, specifically pruning. There are a number of edge cases in this logic that remain untested. The tests we currently have a quite rigid and could be improved with the use of quickcheck.

There is a lot of undocumented complex logic in this section of the code. There is some low-hanging fruit however.

Brief Overview of PeerManager Logic

Peers connect to lighthouse stochastically (if we have open tcp/udp ports). The PeerManager uses a preset target-peer-count. This is the desired average number of peers we want to maintain a connection to. As more and more peers connect to us, we permit 10% (or so) more peers above this target. Every 30 seconds the peer manager undergoes a heartbeat: https://github.com/sigp/lighthouse/blob/stable/beacon_node/lighthouse_network/src/peer_manager/mod.rs#L1114. In this heartbeat, we prune the excess peers, i.e those peers that are above our target-peer-count (https://github.com/sigp/lighthouse/blob/stable/beacon_node/lighthouse_network/src/peer_manager/mod.rs#L904). After the heartbeat we expect to disconnect excess peers.

We have an internal scoring mechanism that gives us an ability to rate peers. We prune peers based on their score and a number of other heuristics.

There are some tests to check if we prune these peers correctly given the complex conditions. See here: https://github.com/sigp/lighthouse/blob/stable/beacon_node/lighthouse_network/src/peer_manager/mod.rs#L1259 (and below).

It might be nicer to use quickcheck (some examples of its use are: https://github.com/sigp/discv5/blob/master/src/kbucket/bucket.rs#L1380 and https://github.com/libp2p/rust-libp2p/blob/master/protocols/gossipsub/tests/smoke.rs#L188)

An example would be to inject a random number of peers, and see if after a heartbeat the peer count is the target peer count. Another test could be to inject a random number of peers, with random scores and ensure that we prune the worst ones.

A further, more sophisticated task would be to attempt to generalize the tests that already exist using quickcheck.

AgeManning avatar Mar 16 '22 06:03 AgeManning

Hi, I'm interested in this issue. 🙂

As my first step in learning about this issue, I ran code coverage. The following is result about PeerManager::prune_excess_peers(). I believe this could be useful.

image

ackintosh avatar May 30 '22 07:05 ackintosh

Nice. Do you have specific questions about this or anything I can do to help extend the tests?

AgeManning avatar May 31 '22 01:05 AgeManning

Thank you!

I have read about the implementation of pruning, but I can not understand what the attestation subnet and sync committee subnet are. (I have googled but no luck.) 💦

https://github.com/sigp/lighthouse/blob/df40700ddd2dcc3c73859cc3f8e315eab899d87c/beacon_node/lighthouse_network/src/types/subnet.rs#L10-L15

ackintosh avatar Jun 01 '22 17:06 ackintosh

These are ethereum-2 terms.

High level overview can be seen here: https://kb.beaconcha.in/attestation

Technical specifications are:

  • https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/validator.md#attesting
  • https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md#the-gossip-domain-gossipsub

Sync committees were introduced in Altair. Can see here: https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/validator.md#sync-committee

Basically these are objects that we propagate over a gossipsub network on different topics.

AgeManning avatar Jun 03 '22 04:06 AgeManning

assuming this closed by #3248

divagant-martian avatar Oct 20 '22 19:10 divagant-martian