go-spacemesh
sync: get block validity from peers
Motivation
Closes #3003
Changes
- syncer: move state sync code to a separate file
- moves opinion processing from fetcher to syncer pkg
- add the following data to opinions response
  - epoch weight
  - latest verified layer
  - aggregated hash
  - block validity
opinions amongst peers are sorted in the following order
- highest epoch weight
- highest verified layer
- existence of certificate
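
A minimal sketch of the ordering above, for illustration only. The `peerOpinion` type, its field names, and `sortOpinions` are hypothetical simplifications and are not taken from the go-spacemesh code; the sketch only shows how the three criteria can be applied as successive tie-breakers.

```go
package main

import (
	"fmt"
	"sort"
)

// peerOpinion is a hypothetical, simplified view of one peer's opinion
// about a layer. The real types in go-spacemesh differ.
type peerOpinion struct {
	Peer          string
	EpochWeight   uint64
	VerifiedLayer uint32
	HasCert       bool
}

// sortOpinions orders opinions most-preferred first:
// highest epoch weight, then highest verified layer,
// then opinions that carry a certificate.
func sortOpinions(ops []peerOpinion) {
	sort.SliceStable(ops, func(i, j int) bool {
		a, b := ops[i], ops[j]
		if a.EpochWeight != b.EpochWeight {
			return a.EpochWeight > b.EpochWeight
		}
		if a.VerifiedLayer != b.VerifiedLayer {
			return a.VerifiedLayer > b.VerifiedLayer
		}
		return a.HasCert && !b.HasCert
	})
}

func main() {
	ops := []peerOpinion{
		{Peer: "a", EpochWeight: 10, VerifiedLayer: 5, HasCert: false},
		{Peer: "b", EpochWeight: 12, VerifiedLayer: 3, HasCert: false},
		{Peer: "c", EpochWeight: 10, VerifiedLayer: 5, HasCert: true},
	}
	sortOpinions(ops)
	for _, op := range ops {
		fmt.Println(op.Peer) // prints b, c, a (one per line)
	}
}
```

Peer `b` wins on epoch weight alone; `a` and `c` tie on weight and verified layer, so the certificate decides between them.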
block validity is saved directly to database for tortoise to consume.
bors try
is validity imported from the same peer, or from a different peer for each layer? it won't work in the second case
are certificates downloaded for all layers? not only hdist?
> is validity imported from the same peer, or from a different peer for each layer? it won't work in the second case
as implemented it can be from a different peer for each layer. it would be ideal if we could import it from the same peer, but A. we don't have a way to retain peer now. https://github.com/spacemeshos/go-spacemesh/issues/3551 B. even if we can retain a peer, there are still cases where the peer can drop from the network at any time. for B, my plan is to switch to a different peer if the mesh hash is consistent.
in general, if the node is syncing from peer X and Y, both have the same accumulated hash in the mesh. what issue do you see with importing validity from X for layer N and from Y for layer N+1?
> are certificates downloaded for all layers? not only hdist?
yes. for all layers for now. there is an attempt to limit this at https://github.com/spacemeshos/go-spacemesh/blob/e55b88add994aea08ecdc9271212f9e8b7fa35ba/syncer/state_syncer.go#L223 but the last layer of mesh.ProcessLayer()/trtl is usually the last synced layer. this is the reason why we cannot prune certificates in the network yet, as discussed in slack https://spacemesh.slack.com/archives/C89QJJY3G/p1661405377244839?thread_ts=1661281388.797769&cid=C89QJJY3G
we need to fix https://github.com/spacemeshos/go-spacemesh/issues/2921 before we can start pruning certificates and have a new node need certificates only for the last hdist layers.
> we don't have a way to retain peer now. https://github.com/spacemeshos/go-spacemesh/issues/3551
i don't quite understand why there is a dependency on that. it is completely up to sync which peers to use. of course they may go away unexpectedly, as you noted
> in general, if the node is syncing from peer X and Y, both have the same accumulated hash in the mesh.
if it downloads opinions from a peer that at least communicates the same hash, it should work. but otherwise, even if one opinion somewhere in the middle is inconsistent, verifying tortoise will be stuck
> > we don't have a way to retain peer now. #3551
>
> i don't quite understand why there is a dependency on that. it is completely up to sync which peers to use.

so i don't fully understand how the p2p module manages peers, and i'm probably assuming something incorrectly.
let's say the node finds that peer X has a different hash at layer N and wants to find the layer at which they start to differ. i assume at this point we want to make a best effort to retain peer X until the node gets all relevant data from X.
last time we discussed this in the dev sync, your suggestion was to use gossipsub's scoring system to keep this peer around. but since fetcher gets peers from p2p/bootstrap, which uses EventSpacemeshPeer to add/rm from the peer list, i am not sure gossipsub's scoring system can affect that.
> > in general, if the node is syncing from peer X and Y, both have the same accumulated hash in the mesh.
>
> if it downloads opinions from a peer that at least communicates the same hash, it should work. but otherwise, even if one opinion somewhere in the middle is inconsistent, verifying tortoise will be stuck

thanks. i'll make that change in this PR and notify you when it's ready
> i don't quite understand why there is a dependency on that. it is completely up to sync which peers to use.
@dshulyak i didn't explain this well. yes syncer can pick any peer to use. but if we don't have an active connection with the peer, it'll fail here https://github.com/spacemeshos/go-spacemesh/blob/09225a3ada8939ccbe9c3c6cf9f18af91d9f2367/p2p/server/server.go#L141
and i don't want to just dial a peer randomly in this case. i'd rather like an API from the p2p package to do that consistently with its peer management scheme.
> and i don't want to just dial a peer randomly in this case. i'd rather like an API from the p2p package to do that consistently with its peer management scheme.
rpc server won't dial a peer if there is no connection. it has this https://github.com/spacemeshos/go-spacemesh/blob/09225a3ada8939ccbe9c3c6cf9f18af91d9f2367/p2p/server/server.go#L154 . yes, it will fail if the selected peer fails to respond. but it should be expected that a peer can go away. the node has no control over it
the only thing that can be provided by the p2p module is to prioritize connections with "better" peers. but that is only relevant when the node goes over the high watermark for the number of peers (currently 100), because at that point it prunes the total number of peers down to the low watermark (which is set to 40)
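
A toy model of the watermark behaviour described above, as a sketch only. `scoredPeer`, its `Score` field, and `trim` are hypothetical names; the real connection manager in libp2p is more involved and this does not reproduce its API. The sketch only shows why prioritizing "better" peers has no effect until the peer count exceeds the high watermark.

```go
package main

import (
	"fmt"
	"sort"
)

// scoredPeer is a hypothetical peer with a priority score;
// a higher score means a "better" peer.
type scoredPeer struct {
	ID    string
	Score int
}

// trim leaves the peer set untouched while it is at or below the
// high watermark. Once above it, only the `low` best-scored peers
// are kept. It assumes low <= high.
func trim(peers []scoredPeer, low, high int) []scoredPeer {
	if len(peers) <= high {
		return peers
	}
	kept := append([]scoredPeer(nil), peers...) // copy, do not reorder the input
	sort.SliceStable(kept, func(i, j int) bool {
		return kept[i].Score > kept[j].Score
	})
	return kept[:low]
}

func main() {
	var peers []scoredPeer
	for i := 0; i < 101; i++ {
		peers = append(peers, scoredPeer{ID: fmt.Sprintf("peer-%d", i), Score: i})
	}
	fmt.Println(len(trim(peers[:100], 40, 100))) // 100: at the high watermark, nothing is pruned
	fmt.Println(len(trim(peers, 40, 100)))       // 40: over it, pruned down to the low watermark
}
```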
> if it downloads opinions from a peer that at least communicates the same hash, it should work. but otherwise, even if one opinion somewhere in the middle is inconsistent, verifying tortoise will be stuck
@dshulyak added a commit that only adopts block validity from peers with the same aggregated layer hash from the previous layer
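
For readers following the thread, the rule in that commit can be sketched roughly as below. This is an illustration under assumptions, not the code from the commit: `aggHash`, `layerOpinion`, its fields, and `adoptable` are hypothetical names, and the 4-byte hash is shortened for readability.

```go
package main

import "fmt"

// aggHash is a shortened stand-in for an aggregated layer hash.
type aggHash [4]byte

// layerOpinion is a hypothetical, simplified peer opinion for one layer.
type layerOpinion struct {
	Peer        string
	PrevAggHash aggHash         // peer's aggregated hash for the previous layer
	Validity    map[string]bool // block ID -> valid, as reported by the peer
}

// adoptable keeps only the opinions whose previous-layer aggregated hash
// matches the local one, so block validity is never adopted from a peer
// whose mesh has already diverged from ours.
func adoptable(localPrev aggHash, ops []layerOpinion) []layerOpinion {
	var out []layerOpinion
	for _, op := range ops {
		if op.PrevAggHash == localPrev {
			out = append(out, op)
		}
	}
	return out
}

func main() {
	local := aggHash{1, 2, 3, 4}
	ops := []layerOpinion{
		{Peer: "x", PrevAggHash: aggHash{1, 2, 3, 4}, Validity: map[string]bool{"blk1": true}},
		{Peer: "y", PrevAggHash: aggHash{9, 9, 9, 9}, Validity: map[string]bool{"blk1": false}},
		{Peer: "z", PrevAggHash: aggHash{1, 2, 3, 4}, Validity: map[string]bool{"blk1": true}},
	}
	for _, op := range adoptable(local, ops) {
		fmt.Println(op.Peer) // prints x, z (one per line)
	}
}
```

With this filter, validity for layer N can come from peer X and for layer N+1 from peer Y, as long as each agrees with the local aggregated hash of the preceding layer.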
bors try
bors try
bors try
bors merge