aptos-core
[Network] Topology awareness
Differentiate upstream, downstream, and peer.
Each node should keep track of how far it is from the validator core and relay that downstream. To do that:
- Create a new service that tracks how far upstream peers are from the validator core
- Upstream peers announce their distance upon new connections with the peer as well as when their distance changes
- Ideally, this data would be signed with a series of signatures from the validator down to the leaf nodes. This would allow fullnodes to prove their distance, or at least make it very hard to fake. With this in place, we should expect that distances might be updated based upon a global timeout for those upstream signatures.
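As a rough illustration of the bullets above, here is a minimal sketch of a tracker that records the distances announced by upstream peers, expires stale entries after a timeout, and reports when the node's own distance changes so it can be relayed downstream. Everything in it is an assumption for discussion purposes: `DistanceTracker`, `PeerId`, `on_announcement`, and the `ttl` field are hypothetical names, not existing aptos-core types, and signatures are left out entirely.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical peer identifier; a real node would use its own peer ID type.
type PeerId = u32;

/// Illustrative tracker of distances announced by upstream peers.
struct DistanceTracker {
    /// Validators sit in the core and are at distance 0 by definition.
    is_validator: bool,
    /// Assumed global timeout after which an announcement is ignored.
    ttl: Duration,
    /// Latest announced distance per upstream peer, with its arrival time.
    upstream: HashMap<PeerId, (u64, Instant)>,
    /// Last distance we relayed downstream, if any.
    last_announced: Option<u64>,
}

impl DistanceTracker {
    fn new(is_validator: bool, ttl: Duration) -> Self {
        Self { is_validator, ttl, upstream: HashMap::new(), last_announced: None }
    }

    /// Own distance: 0 for validators, otherwise one more than the closest
    /// upstream peer whose announcement has not expired. `None` if unknown.
    fn distance(&self, now: Instant) -> Option<u64> {
        if self.is_validator {
            return Some(0);
        }
        self.upstream
            .values()
            .filter(|(_, seen)| now.duration_since(*seen) <= self.ttl)
            .map(|(d, _)| d.saturating_add(1))
            .min()
    }

    /// Records an upstream announcement. Returns `Some(distance)` only when
    /// our own distance changed and should be announced downstream.
    fn on_announcement(&mut self, peer: PeerId, distance: u64, now: Instant) -> Option<u64> {
        self.upstream.insert(peer, (distance, now));
        let current = self.distance(now);
        if current != self.last_announced {
            self.last_announced = current;
            current
        } else {
            None
        }
    }
}
```

In this sketch a caller would invoke `on_announcement` on every new upstream connection or distance update, and forward the returned value to downstream peers only when it is `Some`. A signed variant would need to carry and verify the signature chain alongside each distance, which is not shown here.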
@davidiw: Instead of deploying a new service I think this is something that could be rolled into the health checker pretty easily. For example:
- Instead of sending empty messages, nodes could send their distance whenever pinged.
- If a node is pinged and asked for its distance, it could return the minimum of all peer distances +1.
- Once we've done this, we could expose this information to other applications, e.g., mempool and state sync (perhaps via the `PeerMetadataStorage` struct or something similar). These applications could then use this information (e.g., mempool could forward transactions via the peer with the lowest distance).
- At first, we could do this without signatures (e.g., just trust the peers for now), and then make it more complicated with signatures and verification, etc.
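To make the health-checker variant concrete, here is a small sketch of a ping reply that carries a distance, plus the kind of lookup mempool might use to pick a forwarding peer. The names `HealthCheckMsg`, `handle_ping`, and `closest_peer` are hypothetical and do not reflect the actual health checker or `PeerMetadataStorage` API. The sketch also assumes the unsigned "just trust the peers" first step, and that a validator would answer with distance 0 rather than going through `handle_ping`.

```rust
use std::collections::HashMap;

/// Hypothetical peer identifier; a real node would use its own peer ID type.
type PeerId = u32;

/// Illustrative health-check messages: the pong carries a distance
/// instead of being empty.
#[derive(Debug, PartialEq)]
enum HealthCheckMsg {
    Ping,
    /// `None` means the node does not know its distance yet.
    Pong { distance: Option<u64> },
}

/// Answers a ping with the minimum of all known peer distances plus one.
fn handle_ping(peer_distances: &HashMap<PeerId, u64>) -> HealthCheckMsg {
    let distance = peer_distances
        .values()
        .min()
        .map(|d| d.saturating_add(1));
    HealthCheckMsg::Pong { distance }
}

/// Picks the peer with the lowest reported distance, e.g. for mempool to
/// forward transactions through. Ties are broken by the smaller peer ID so
/// the choice is deterministic.
fn closest_peer(peer_distances: &HashMap<PeerId, u64>) -> Option<PeerId> {
    peer_distances
        .iter()
        .min_by_key(|(peer, distance)| (**distance, **peer))
        .map(|(peer, _)| *peer)
}
```

Here `peer_distances` stands in for whatever shared peer metadata the applications would read. Because nothing is verified, a misbehaving peer could report an artificially low distance and attract traffic, which is the gap the later signature step is meant to close.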
For what it's worth, I think this is an excellent onboarding task for a new member. Happy to help if it makes sense.
The challenge is that we're planning on making the health checker only do something if there is no activity.
I think the rest is aligned. I think we may need to move faster here, because I worry it makes any sort of real deployment questionable.
This issue is stale because it has been open 45 days with no activity. Remove the stale label or comment - otherwise this will be closed in 15 days.