consensus-specs
Gossipsub scoring parameters
Description
This introduces a set of recommended scoring parameters for the Eth 2.0 network.
A few points to note:
- I've moved this into its own document (along with the current gossipsub v1.0 parameters) to avoid polluting the p2p spec document.
- The values within are old values we estimated from mainnet, and to keep the document short I've removed most explanations of their origin. I plan to do further analysis on our current testnets and mainnet to update the values that Lighthouse is using (and subsequently this document), and I encourage other client teams to modify any values they think may be off.
- The subnet topics (and sync committee topics) do not have scoring values; they simply penalize invalid messages, as I've left these TBD.
This is designed to be a base we can work off, so edits/suggestions/feedback are welcome.
NOTE: this re-orgs the params and also introduces substantive changes to the values, e.g. seen_ttl from #2663
CC @tuyennhv
@AgeManning I'd like to get this merged. Is the PR as it stands still ready for review?
Do other clients have any input here?
I added this more as an initial template to work from. The values here are the values Lighthouse currently uses, however we are planning on re-examining these values as I think they can be improved (they were originally designed before mainnet launch). If you're happy to treat it as an initial PR and potentially have some updates not too long in the future, then sure.
The other thing (which Alex pointed out) is that I included the change from #2663 in this PR. As that's not merged yet, we might want to change this. (Lighthouse uses the change in #2663, along with a few other clients, I believe.)
> Do other clients have any input here?
My main concern would be with penalizing peers with respect to their mesh delivery rate. When we were experimenting with these parameters in Prysm, we constantly had peers being penalised for not completing the required number of mesh deliveries in time. This held true across all clients, so it wasn't a client-specific issue. With a delivery window of 400ms, this is a pretty tight threshold for all peers in your mesh to deliver x messages in the allotted time period. The natural consequence would be that all your mesh peers end up being those that are geographically closer to you.
Also, in the event the network has a drop in participation (e.g. 20% of the validators are offline), all peers in your aggregate/subnet meshes are more liable to be penalised because their mesh message delivery rate falls below the expected threshold. This could lead to a node marking 'good' peers as bad and eventually banning them, which could worsen network participation even further.
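To make the concern above concrete, here is a minimal sketch of how the gossipsub v1.1 mesh message delivery penalty reacts when traffic drops. In the v1.1 spec, a peer below the threshold is penalised by the square of its deficit, multiplied by a negative topic weight. The function name, threshold, weight, and delivery counts are made-up illustrative assumptions, not values from this PR, and the sketch ignores counter decay, the cap, and the activation delay.

```python
def mesh_delivery_penalty(deliveries: float, threshold: float, weight: float) -> float:
    """Score contribution of the mesh delivery rate for one peer on one topic.

    Returns 0.0 when the peer meets the threshold, otherwise
    weight * deficit^2, where `weight` is expected to be negative.
    """
    if deliveries >= threshold:
        return 0.0
    deficit = threshold - deliveries
    return weight * deficit * deficit


# Illustrative numbers only (assumptions, not taken from the PR).
THRESHOLD = 10.0
WEIGHT = -0.5

# A peer that comfortably meets the threshold under normal traffic...
healthy = mesh_delivery_penalty(12.0, THRESHOLD, WEIGHT)
# ...falls below it when 20% of the expected messages never exist.
degraded = mesh_delivery_penalty(12.0 * 0.8, THRESHOLD, WEIGHT)

print(healthy, degraded)
```

The peer behaves identically in both cases; only the amount of traffic on the topic changed, which is why a fixed threshold can end up penalising honest mesh peers.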
> Do other clients have any input here?
I have the same concern as Prysm. Right now in Lodestar we configure meshMessageDeliveriesWindow as 2000ms, and a lot of peers get penalized for not being able to deliver enough messages in that time window. 400ms is just too tight for us considering the single-threaded nature of Node.js, which suffers from I/O lag when the event loop is busy.
Yes I agree this one is tricky. We wanted to try and modify the scoring system to not use this parameter. The point of the parameter is to prevent nodes from building up score just by replaying the messages back to us, which they can likely do in under a few 100ms. Having a very large value here I think defeats the purpose of the scoring param and we may need to look into alternative measures for handling this.
> Yes I agree this one is tricky. We wanted to try and modify the scoring system to not use this parameter.
👍 for this
> The point of the parameter is to prevent nodes from building up score just by replaying the messages back to us, which they can likely do in under a few 100ms
I think that in all implementations, once a message from a peer has been seen, the scoring system will not increase that peer's mesh_message_deliveries again, no matter how many times the peer sends us the same message.
From the v1.1 Specs:
> In order to compute P₃, the router maintains a counter that increments whenever a first or near-first message delivery occurs in the topic by a peer in the mesh. A near-first message delivery is a message delivery that occurs while a message has been first received and is being validated or it has been received within a configurable window of validation of first message delivery. The window is configurable but should be small (in the order of milliseconds) to avoid allowing a mesh peer to build score by simply replaying back the messages received by the current router. The parameter has a cap that applies at the time of increment.
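The counter described in the quoted spec text can be sketched roughly as follows. This is only an illustration of the near-first delivery window and the per-peer deduplication discussed above, not any client's implementation: the class and field names are hypothetical, validation is assumed to complete instantly at first receipt, and decay is left out.

```python
from dataclasses import dataclass, field


@dataclass
class MeshDeliveryCounter:
    window_ms: float  # near-first delivery window
    cap: float        # cap applied at the time of increment
    first_seen_ms: dict = field(default_factory=dict)  # msg_id -> time of first delivery
    credited: set = field(default_factory=set)         # (msg_id, peer) pairs already counted
    counts: dict = field(default_factory=dict)         # peer -> delivery counter

    def on_delivery(self, msg_id: str, peer: str, now_ms: float) -> None:
        if (msg_id, peer) in self.credited:
            return  # the same peer resending a message earns nothing
        first = self.first_seen_ms.get(msg_id)
        if first is None:
            # First delivery; assumed validated immediately (simplification).
            self.first_seen_ms[msg_id] = now_ms
        elif now_ms - first > self.window_ms:
            return  # too late to count as a near-first delivery
        self.credited.add((msg_id, peer))
        self.counts[peer] = min(self.cap, self.counts.get(peer, 0.0) + 1.0)


def replay(window_ms: float) -> dict:
    """Feed a fixed, made-up delivery trace through a counter."""
    counter = MeshDeliveryCounter(window_ms=window_ms, cap=100.0)
    counter.on_delivery("m1", "A", 0.0)    # first delivery
    counter.on_delivery("m1", "B", 300.0)  # near-first within 400ms
    counter.on_delivery("m1", "C", 900.0)  # slow peer
    counter.on_delivery("m1", "A", 950.0)  # duplicate from A
    return counter.counts


tight = replay(400.0)
loose = replay(2000.0)
print(tight, loose)
```

With the 400ms window, the slow peer C gets no credit for a message it did deliver; with the 2000ms window it does, at the cost of giving a replaying peer more time to earn credit. In neither case does A's duplicate count twice.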