
sync: get block validity from peers

countvonzero opened this pull request 3 years ago • 9 comments

Motivation

Closes #3003

Changes

  • syncer: move state sync code to a separate file
  • move opinion processing from the fetcher to the syncer pkg
  • add the following data to the opinions response
    • epoch weight
    • latest verified layer
    • aggregated hash
    • block validity

opinions amongst peers are sorted in the following order

  1. highest epoch weight
  2. highest verified layer
  3. existence of certificate

block validity is saved directly to the database for the tortoise to consume.
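
as a rough sketch of the ordering above (the `peerOpinion` type and its fields are illustrative placeholders, not the structs used in this PR):

```go
package syncer

import "sort"

// peerOpinion is an illustrative container for the data in an opinions
// response: epoch weight, latest verified layer, and certificate presence.
type peerOpinion struct {
	epochWeight   uint64
	verifiedLayer uint32
	hasCert       bool
}

// sortOpinions orders opinions from most to least preferred:
// highest epoch weight first, then highest verified layer,
// then opinions that carry a certificate.
func sortOpinions(opinions []peerOpinion) {
	sort.Slice(opinions, func(i, j int) bool {
		a, b := opinions[i], opinions[j]
		if a.epochWeight != b.epochWeight {
			return a.epochWeight > b.epochWeight
		}
		if a.verifiedLayer != b.verifiedLayer {
			return a.verifiedLayer > b.verifiedLayer
		}
		return a.hasCert && !b.hasCert
	})
}
```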

countvonzero avatar Sep 20 '22 04:09 countvonzero

bors try

countvonzero avatar Sep 20 '22 05:09 countvonzero

try

Build succeeded:

bors[bot] avatar Sep 20 '22 06:09 bors[bot]

is validity imported from the same peer, or is each layer imported from a different peer? it won't work in the second case

dshulyak avatar Sep 20 '22 14:09 dshulyak

are certificates downloaded for all layers, not only within hdist?

dshulyak avatar Sep 20 '22 15:09 dshulyak

is validity imported from the same peer, or is each layer imported from a different peer? it won't work in the second case

as implemented, it can be a different peer for each layer. it would be ideal if we could import it from the same peer, but A. we don't have a way to retain a peer now (https://github.com/spacemeshos/go-spacemesh/issues/3551), and B. even if we could retain a peer, there are still cases where the peer can drop from the network at any time. for B, my plan is to switch to a different peer if the mesh hash is consistent.

in general, if the node is syncing from peers X and Y, and both have the same accumulated hash in the mesh, what is the issue you see with importing validity from X for layer N and from Y for layer N+1?

are certificates downloaded for all layers, not only within hdist?

yes, for all layers for now. there is an attempt to do this here: https://github.com/spacemeshos/go-spacemesh/blob/e55b88add994aea08ecdc9271212f9e8b7fa35ba/syncer/state_syncer.go#L223, but mesh.ProcessLayer()/the tortoise's last layer is usually the last synced layer. this is the reason why we cannot prune certificates in the network yet, as discussed in slack: https://spacemesh.slack.com/archives/C89QJJY3G/p1661405377244839?thread_ts=1661281388.797769&cid=C89QJJY3G

we need to fix https://github.com/spacemeshos/go-spacemesh/issues/2921 before we can start pruning certificates, so that a new node only needs certificates for the last hdist layers.

countvonzero avatar Sep 21 '22 02:09 countvonzero

we don't have a way to retain a peer now. https://github.com/spacemeshos/go-spacemesh/issues/3551

i don't quite understand why there is a dependency on that. it is completely up to sync which peers to use. of course they may go away unexpectedly, as you noted

in general, if the node is syncing from peer X and Y, both have the same accumulated hash in the mesh.

if it downloads the opinion from a peer that at least communicates the same hash, it should work. otherwise, if even one opinion somewhere in the middle is inconsistent, the verifying tortoise will be stuck

dshulyak avatar Sep 21 '22 04:09 dshulyak

we don't have a way to retain a peer now. #3551

i don't quite understand why there is a dependency on that. it is completely up to sync which peers to use.

so i don't fully understand how the p2p module manages peers and am probably assuming incorrectly.

let's say the node finds that peer X has a different hash at layer N and wants to find at which layer the differences start. i assume at this point we want to make a best effort to retain peer X until the node gets all the relevant data from X.

last time we discussed this in the dev sync, your suggestion was to use the gossipsub scoring system to keep this peer around. but since the fetcher gets peers from p2p/bootstrap, which uses EventSpacemeshPeer to add/remove peers from the peer list, i am not sure the gossipsub scoring system can affect that.

in general, if the node is syncing from peers X and Y, and both have the same accumulated hash in the mesh.

if it downloads the opinion from a peer that at least communicates the same hash, it should work. otherwise, if even one opinion somewhere in the middle is inconsistent, the verifying tortoise will be stuck

thanks. i'll make that change in this PR and notify you when it's ready

countvonzero avatar Sep 21 '22 04:09 countvonzero

i don't quite understand why there is a dependency on that. it is completely up to sync which peers to use.

@dshulyak i didn't explain this well. yes, the syncer can pick any peer to use. but if we don't have an active connection with the peer, it'll fail here: https://github.com/spacemeshos/go-spacemesh/blob/09225a3ada8939ccbe9c3c6cf9f18af91d9f2367/p2p/server/server.go#L141

and i don't want to just dial a peer randomly in this case. i'd rather have an API from the p2p package that does this consistently with its peer management scheme.

countvonzero avatar Sep 21 '22 09:09 countvonzero

and i don't want to just dial a peer randomly in this case. i'd rather have an API from the p2p package that does this consistently with its peer management scheme.

the rpc server won't dial a peer if there is no connection; it has this check: https://github.com/spacemeshos/go-spacemesh/blob/09225a3ada8939ccbe9c3c6cf9f18af91d9f2367/p2p/server/server.go#L154. yes, it will fail if the selected peer fails to respond, but it should be expected that a peer can go away. the node has no control over that.

the only thing that can be provided by the p2p module is prioritizing connections with "better" peers. but that is only relevant if the node goes over the high watermark for the number of peers (currently 100), because at that point it prunes the total number of peers down to the low watermark (which is set to 40).
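
for illustration, the sync side can restrict itself to peers the host already has a live connection to before sending a request. a minimal libp2p-based sketch (a hypothetical helper; import paths depend on the go-libp2p version in use):

```go
package syncer

import (
	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/peer"
)

// connectedPeers filters sync candidates down to peers the host already
// has a live connection to, mirroring the server-side refusal to dial.
func connectedPeers(h host.Host, candidates []peer.ID) []peer.ID {
	var out []peer.ID
	for _, p := range candidates {
		if h.Network().Connectedness(p) == network.Connected {
			out = append(out, p)
		}
	}
	return out
}
```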

dshulyak avatar Sep 21 '22 09:09 dshulyak

if it downloads the opinion from a peer that at least communicates the same hash, it should work. otherwise, if even one opinion somewhere in the middle is inconsistent, the verifying tortoise will be stuck

@dshulyak added a commit that only adopts block validity from a peer when the peer reports the same aggregated hash for the previous layer.
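
roughly, the adoption rule amounts to the following sketch (placeholder types and a hypothetical `adoptValidity` helper, not the actual implementation in the commit):

```go
package syncer

// opinion is an illustrative view of one peer's response for a layer:
// the aggregated hash the peer reports for the previous layer and the
// set of blocks it considers valid in this layer.
type opinion struct {
	prevAggregatedHash [32]byte
	validBlocks        [][]byte
}

// adoptValidity saves a peer's block validity only when that peer's
// aggregated hash for the previous layer matches the locally computed
// one, so the verifying tortoise never consumes validity built on a
// divergent mesh.
func adoptValidity(localPrevHash [32]byte, op opinion, save func(blockID []byte) error) error {
	if op.prevAggregatedHash != localPrevHash {
		// mesh diverges before this layer; skip this peer's opinion
		return nil
	}
	for _, id := range op.validBlocks {
		if err := save(id); err != nil {
			return err
		}
	}
	return nil
}
```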

countvonzero avatar Sep 23 '22 13:09 countvonzero

bors try

countvonzero avatar Sep 23 '22 14:09 countvonzero

try

Build failed:

bors[bot] avatar Sep 23 '22 14:09 bors[bot]

bors try

countvonzero avatar Sep 24 '22 00:09 countvonzero

try

Build succeeded:

bors[bot] avatar Sep 24 '22 01:09 bors[bot]

bors try

countvonzero avatar Sep 26 '22 09:09 countvonzero

try

Build succeeded:

bors[bot] avatar Sep 26 '22 10:09 bors[bot]

bors merge

countvonzero avatar Sep 26 '22 10:09 countvonzero

Pull request successfully merged into develop.

Build succeeded:

bors[bot] avatar Sep 26 '22 11:09 bors[bot]