reth icon indicating copy to clipboard operation
reth copied to clipboard

Update peer records in the network peerset

Open mattsse opened this issue 2 years ago • 5 comments

Describe the feature

the discovery sources (dns,discv4) emit discovered records.

since the peerset stores all the records, it should tell the discv4 service to find the node to:

  • check if it still exists
  • update its address

the bond duration should be 24h, after which an update should be performed.

Additional context

ref #1414

No response

mattsse avatar Feb 16 '23 18:02 mattsse

If the discovery service is keeping track of the bond duration and performs necessary checks, perhaps it could let the network state know of a failing check through an event?

mempirate avatar Feb 16 '23 19:02 mempirate

If the discovery service is keeping track of the bond duration

it does so only for nodes in the KAD table. we have multiple discovery sources (discv4, DNS, eventually discv5) that all stream nodes to the peermanager. so the peermanager should ask the disc service to lookup a node from time to time and remove if it does not respond.

mattsse avatar Feb 16 '23 21:02 mattsse

Sorry don't think I follow: What may happen? Are we not clearing up the peerset in case a peer is no longe present or has changed their adddress (when?)

gakonst avatar Feb 22 '23 21:02 gakonst

@mattsse I can help with this. Started to look at high-level implementation strategies and I think it would be a clean SoC if we make Discovery responsible for this, because it contains all the discovery protocols (now and in the future) and keeps track of discovered_nodes.

The current behaviour is that it deletes an entry when a peer gets removed from the Kademlia table in discv4. This should be changed and it should keep track of all discovered peers, then use some kind of heartbeat to periodically check those peers.

When a peer address changed, or it is unresponsive, it can notify the PeersManager through an event, and the PeersManager can then take appropriate action. Wdyt?

Sorry don't think I follow: What may happen? Are we not clearing up the peerset in case a peer is no longe present or has changed their adddress (when?)

@gakonst: yes, the peerset is currently not being updated when any of those things happen, so it's possibly maintaining and dialling a lot of stale peers. Currently some of the reth nodes are at >70,000 peers in the set after a week or 2 of uptime (still growing)

mempirate avatar Jun 08 '23 12:06 mempirate

I think it would be a clean SoC if we make Discovery responsible for this, because it contains all the discovery protocols (now and in the future) and keeps track of discovered_nodes.

I think the flow should be PeersManager -> Discover -> discv4, however this is a bit tricky because not all nodes are discovered via discv4.

he current behaviour is that it deletes an entry when a peer gets removed from the Kademlia table in discv4. This should be changed and it should keep track of all discovered peers, then use some kind of heartbeat to periodically check those peers.

agree, that sounds right

perhaps we need to have an additional Instant in the peer type that tracks the timestamp we were able to establish a connection, and kick peers after a certain time. This is similar to the backoff counter which currently also is incremented when a peer is down (io connection error).

I think we be a bit more liberal with how these kinds of peers are evicted

mattsse avatar Jun 08 '23 14:06 mattsse

This issue is stale because it has been open for 14 days with no activity.

github-actions[bot] avatar Sep 05 '23 01:09 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Sep 13 '23 01:09 github-actions[bot]