
Gossipsub scoring parameters

Open AgeManning opened this issue 4 years ago • 9 comments

Description

This introduces a set of recommended scoring parameters for the Eth 2.0 network.

A few points to note:

  • I've moved this into its own document (along with the current gossipsub v1.0 parameters) to avoid polluting the p2p spec document.
  • The values within are old values we estimated from mainnet, and to keep the document short I've removed most explanations of their origin. I plan to do further analysis on our current testnets and mainnet to update the values that Lighthouse is using (and subsequently this document), and I encourage other client teams to modify any values they think may be off.
  • The subnet topics (and sync committee topics) do not have scoring values, only invalid-message scoring, as I've left these TBD.

This is designed to be a base we can work from, so edits, suggestions, and feedback are welcome.

AgeManning avatar Oct 13 '21 01:10 AgeManning

NOTE: this re-orgs the params and also introduces substantive changes to the values, e.g. seen_ttl from #2663

ralexstokes avatar Oct 13 '21 20:10 ralexstokes

CC @tuyennhv

dapplion avatar Oct 14 '21 12:10 dapplion

@AgeManning I'd like to get this merged in. Is the PR as it stands still ready for review?

Do other clients have any input here?

djrtwo avatar Dec 02 '21 19:12 djrtwo

I added this more as an initial template to work from. The values here are the values Lighthouse currently uses, however we are planning on re-examining these values as I think they can be improved (they were originally designed before mainnet launch). If you're happy to treat it as an initial PR and potentially have some updates not too long in the future, then sure.

The other thing (which Alex pointed out) is that I included the change from #2663 in this PR. As that's not merged yet, we might want to change this. (Lighthouse, along with a few other clients I believe, uses the change in #2663.)

AgeManning avatar Dec 02 '21 23:12 AgeManning

Do other clients have any input here?

My main concern is penalizing peers based on their mesh delivery rate. When we were experimenting with parameters to run Prysm with, peers were constantly penalised for failing to complete the required number of mesh deliveries in time. This held true across all clients, so it wasn't a client-specific issue. A 400ms delivery window is a pretty tight threshold for every peer in your mesh to deliver x messages in the allotted time period. The natural consequence is that all your mesh peers end up being those that are geographically closer to you.

Also, in the event the network has a drop in participation (e.g. 20% of the validators are offline), all peers in your aggregate/subnet mesh are more liable to be penalised because their mesh message delivery rate falls below the expected threshold. This could lead a node to mark 'good' peers as bad and eventually ban them, which could worsen network participation even further.
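
To make the concern concrete: in the gossipsub v1.1 scoring function, the mesh message delivery component (P₃) is zero while a peer's delivery counter is at or above the threshold, and otherwise equals the square of the deficit multiplied by a negative weight. Below is a minimal sketch of that rule. The numbers are hypothetical and are not taken from this PR, and the activation window and counter decay are left out for brevity.

```python
def mesh_delivery_penalty(deliveries: float, threshold: float, weight: float) -> float:
    """Score contribution of P3 for one topic.

    `weight` is expected to be negative, so the result is zero or a penalty.
    Simplification: ignores the activation window and counter decay.
    """
    if deliveries >= threshold:
        return 0.0
    deficit = threshold - deliveries
    return weight * deficit * deficit


# Hypothetical values, chosen only for illustration.
THRESHOLD = 8.0
WEIGHT = -0.5

# A peer that keeps up with a healthy network is not penalised.
print(mesh_delivery_penalty(10.0, THRESHOLD, WEIGHT))
# If fewer messages exist to deliver, the same honest peer drops below the threshold.
print(mesh_delivery_penalty(6.0, THRESHOLD, WEIGHT))
```

Because the penalty is quadratic in the deficit, a drop in message volume that affects every mesh peer at once pushes all of their scores down together, which is the failure mode described above.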

nisdas avatar Jan 10 '22 07:01 nisdas

Do other clients have any input here?

I have the same concern as Prysm. Right now in Lodestar we configure meshMessageDeliveriesWindow as 2000ms, and a lot of peers get penalized for not delivering enough messages in that time window. 400ms is just too tight for us considering the single-threaded nature of NodeJS, which suffers from I/O lag when the event loop is busy.
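
As a rough illustration of how the window size changes who gets credit, here is a small sketch that computes the fraction of deliveries arriving within a given window of the first receipt. The delay values are made up for the example and are not measurements from Lodestar or any other client.

```python
def credited_fraction(delays_ms: list[float], window_ms: float) -> float:
    """Fraction of deliveries that arrive within `window_ms` of the first receipt."""
    if not delays_ms:
        return 0.0
    credited = sum(1 for delay in delays_ms if delay <= window_ms)
    return credited / len(delays_ms)


# Hypothetical delays (ms) between first receipt of a message and
# receipt of the same message from each mesh peer.
delays = [50, 120, 380, 450, 900, 1500, 2500, 3000]

print(credited_fraction(delays, 400))   # 400ms window
print(credited_fraction(delays, 2000))  # 2000ms window
```

With these made-up delays, the 400ms window credits 3 of 8 deliveries and the 2000ms window credits 6 of 8. Any processing lag on the receiving node adds to every delay, so it has the same effect as shrinking the window.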

twoeths avatar Mar 20 '22 06:03 twoeths

Yes I agree this one is tricky. We wanted to try and modify the scoring system to not use this parameter. The point of the parameter is to prevent nodes from building up score just by replaying the messages back to us, which they can likely do in under a few 100ms. Having a very large value here I think defeats the purpose of the scoring param and we may need to look into alternative measures for handling this.

AgeManning avatar Mar 21 '22 04:03 AgeManning

Yes I agree this one is tricky. We wanted to try and modify the scoring system to not use this parameter.

šŸ‘ for this

The point of the parameter is to prevent nodes from building up score just by replaying the messages back to us, which they can likely do in under a few 100ms

I think in all implementations, once a duplicate message from a peer has been seen, the scoring system will not increase that peer's mesh_message_deliveries again, no matter how many times the peer sends us the same message.

twoeths avatar Mar 21 '22 07:03 twoeths

From the v1.1 Specs:

In order to compute P₃, the router maintains a counter that increments whenever a first or near-first message delivery occurs in the topic by a peer in the mesh. A near-first message delivery is a message delivery that occurs while a message has been first received and is being validated or it has been received within a configurable window of validation of first message delivery. The window is configurable but should be small (in the order of milliseconds) to avoid allowing a mesh peer to build score by simply replaying back the messages received by the current router. The parameter has a cap that applies at the time of increment.
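
The counter described in this passage could be sketched roughly as follows. This is a simplified illustration, not code from any client: the class and method names are hypothetical, the window is measured from first receipt rather than from the end of validation, and crediting each peer at most once per message is an assumption based on the comment earlier in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class MeshDeliveryCounter:
    window: float  # near-first delivery window, in seconds
    cap: float     # maximum counter value, applied at increment time
    first_seen: dict = field(default_factory=dict)  # msg_id -> time of first delivery
    credited: set = field(default_factory=set)      # (msg_id, peer) pairs already counted
    counters: dict = field(default_factory=dict)    # peer -> mesh_message_deliveries

    def on_delivery(self, msg_id: str, peer: str, now: float) -> bool:
        """Record a delivery; return True if the peer's counter was credited."""
        first = self.first_seen.setdefault(msg_id, now)
        if (msg_id, peer) in self.credited:
            return False  # assumption: repeats from the same peer earn nothing
        if now - first > self.window:
            return False  # too late to count as a near-first delivery
        self.credited.add((msg_id, peer))
        self.counters[peer] = min(self.cap, self.counters.get(peer, 0.0) + 1.0)
        return True


counter = MeshDeliveryCounter(window=0.4, cap=2.0)
counter.on_delivery("m1", "A", now=0.0)   # first delivery: credited
counter.on_delivery("m1", "B", now=0.3)   # inside the window: credited
counter.on_delivery("m1", "C", now=0.5)   # outside the window: not credited
counter.on_delivery("m1", "B", now=0.35)  # repeat from B: not credited
```

In this sketch a peer that echoes a message back within the window is still credited once, which is the replay behaviour the small window is meant to limit. Widening the window makes that easier, while narrowing it penalises slower honest peers.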

AgeManning avatar Mar 22 '22 21:03 AgeManning