substrate
substrate copied to clipboard
Extracting the syncing protocol from `sc-network`
At the moment, sc-network contains a pretty generic low-level networking stack, on top of which the syncing and transaction protocols are implemented.
sc-network exposes both the API of the low-level networking stack, but also an API for syncing and transaction-related actions.
I think it would make sense to move the syncing and transaction out of sc-network.
I focused on the syncing in the issue title, because transactions are already more or less segregated from the rest.
Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.
Issue still relevant and important.
I just briefly discussed this with @tomaka. Sync is rather tightly integrated with the networking currently. This manifests mainly in the sync peer-set being explicitly used by grandpa to keep both in sync. For now this should stay this way, this means we keep a hard coded sync peer-set inside the networking crate. For now, we would only move out the sync related code into its own crate.
Code can mainly be found here: https://github.com/paritytech/substrate/tree/master/client/network/src/protocol/sync
Okay, I will try to extract anything related to client/network/src/schema/api.v1.proto into a new crate sc-network-sync under client/network-sync
FYI, it is mainly this https://github.com/paritytech/substrate/blob/master/client/network/src/protocol/sync.rs and this https://github.com/paritytech/substrate/tree/master/client/network/src/protocol/sync.
And while I see that network-gossip is on the client folder level, sync should go to client/network/sync.
What is complicated is that the code in protocol.rs works as follow:
- The low-level code reports that a substream has been opened.
- We synchronously check whether this peer's handshake is ok.
- If no, we ask the low-level code to close the substream and pretend that this never happened.
- If yes, we generate a
SyncConnectedevent in the public high-level API.
If you want to plug the sync code through the public high-level API, you need to refactor this, but I don't really know how.
Turns out I need this for changing sync mechanics. Can anyone share priority/ETA on this so I can manage expectations better?
I am learning a lot about the Substrate libp2p integration as I slowly crawl with this PR. I wanted to do too much refactoring at first, but got some intermediate milestone I can reach in a week next to my other ongoing tasks, I guess.
@wigy-opensource-developer can you please give details on what is planned.
Of course.
As a first step sc-network will depend on an extracted crate sc-network-sync that will contain all handling of the block and state synchronization messages defined in the api.v1 Protobuf protocol.
As a second step message handling for light.v1 will go into an sc-network-light crate, but it will be still sc-network that depends on both.
As a third step, abstractions common to sc-network-sync and sc-network-light will be introduced into sc-network and the direction of dependency will be reversed.
My estimation is that the first step will get to master next week.
Hey, any progress here?
Hey, no there is not yet any real progress. This will also still take quite some time to have something proper.
Is there something I can do to speed up the process here? I can do some refactoring if I understand what it should look like in the end.
Can you again say what you need @nazar-pc?
To replace sync mechanism with a custom one. Let's say node discovers it is significantly behind the tip of the chain and needs to reach out to archival node to get last 1000 blocks. I need that to happen over our distributed storage network instead, archival nodes in traditional sense will not be feasible on our network due to scale.
Do you have any write up of your syncing algorithm? I mean it is enough to have some rough write up on how it works.
There is a lot to it, but somewhat abstractly the algorithm would be like this:
- for older archival history:
- identify the tip of the chain and root of the archived history (used to verify individual pieces)
- start pulling segments of archived history corresponding to desired blocks in parallel from different peers (each segment is 128 pieces erasure coded into 256)
- verify witness embedded in each of those pieces against root of the history
- repair source data in case parity pieces were used
- feed source pieces into reconstructor that will spit out encoded blocks as vectors of bytes alongside with corresponding block numbers (reverse of archiving process encoded blocks are fed into archiver and pieces are returned back)
- at this point we have blocks we needed and can verify and import them
- most recent blocks that were not archived yet (archiving is done over blocks at certain confirmation depth) should be synced in normal Substrate way from other full nodes
This replaces how node requests ranges of blocks of archival history from other nodes, I expect that majority of the remaining logic will remain the same as it is in Substrate though.
Okay ty.
I need to expand that syncing Subspace from Subspace DSN using described approach is the first step. We'd like to enable other chains to be able to sync from Subspace DSN in a similar way, so ideally we'd write some kind module that others can import into their project if they so choose. We were discussing about enabling parachains on Rococo to sync this way for instance.
As a first step
sc-networkwill depend on an extracted cratesc-network-syncthat will contain all handling of the block and state synchronization messages defined in theapi.v1Protobuf protocol.
@nazar-pc if you like you could start working on this. This is just the first iteration and is rather simple. It is really just about moving the code related to syncing to a new crate. sc-network for now will depend on this crate.
Cool, I'll notify once I start actively working on it
I started working on this, will send a PR once I have something useful.
Progress:
- https://github.com/paritytech/substrate/pull/11303
- https://github.com/paritytech/substrate/pull/11322
- https://github.com/paritytech/substrate/pull/11347
- https://github.com/paritytech/substrate/pull/11412
- https://github.com/paritytech/substrate/pull/11825
- https://github.com/paritytech/substrate/pull/11940
- https://github.com/paritytech/substrate/pull/12006
I believe I have this covered (links in the comment above):
As a first step sc-network will depend on an extracted crate sc-network-sync that will contain all handling of the block and state synchronization messages defined in the api.v1 Protobuf protocol.
As a second step message handling for light.v1 will go into an sc-network-light crate, but it will be still sc-network that depends on both.
There are 24 mentions of sc_network_sync in sc_network in my branch and 4 mentions of sc_network_light.
Waiting for review and open to suggestions on next steps.
Part 4 is ready on my side, it makes sc-network, sc-network-light and sc-network-sync depend on sc-network-common, but not on each other. Everything is wired together at sc-service level and sc-network-sync/sc-network-light theoretically can be swapped/customized: https://github.com/paritytech/substrate/pull/11412
Not 100% sure yet, but it might be just enough for me to do some of the tweaks to the sync process I wanted to do for a while.
Will submit as a PR once third PR is merged.
Not 100% sure yet, but it might be just enough for me to do some of the tweaks to the sync process I wanted to do for a while.
Sounds good :)
Hi! @nazar-pc do you plan more PR after the Network sync refactoring (part 4) paritytech/substrate#11412 is merged? All four PRs are a great work and are a major step towards untangling sync from the network. During our (@timorl and myself) investigation of the network, however, we spotted that there are still some parts of sync in the sc-network. Some examples from client/network/src/protocol.rs, from where many methods are forwarded to either ChainSync or behaviour/Notifications:
new_best_block_imported,
on_sync_peer_disconnected,
on_block_response,
on_state_response,
on_warp_sync_response,
on_sync_peer_connected,
announce_block,
push_block_announce_validation.
The list isn't complete, but collects the great part of functions that can't be untangled with sync /network only by basic level refactor. The question is if those are elements you are planning to change in the nearest future? You’ve stated that part 4 probably suffices for your tweaks so the question is for @bkchr and @tomaka if you are open for more PRs that move those remaining parts out and make the network more generic in terms of compatibility with exchangeable synchronization mechanism?
Yes, I do expect more PRs after that, just need feedback and to sync with maintainers before moving forward.
I'd like to not only sync to be decoupled, but eventually to have networking interface where sc-network would be just one possible implementation with ability to wrap/swap it with a custom one. So there would be a set of traits where some of the methods you mentioned would go for instance.
Just trying to make small steps without breaking things in the process.
UPD: Part 4 covers some of the needs I had, but likely not all of them, so expect more PRs for sure.
With latest PR still not merged I already started working on more changes. First of all addressing some of review comments from that PR that'll land separately, but also on refactorings around NetworkService.
On NetworkService specifically I noticed that there were traits here and there that were proxying calls into NetworkService, so I decided to move them into sc-network-common as well as extracting more traits for different use cases so that most of Substrate would depend on generic traits from sc-network-common instead of NetworkService explicitly.
It'll allow wrapping NetworkService or even wrapping default Substrate networking layer with an alternative just like you can swap consesnsus and other parts. It'll take time to make those interfaces truly generic (making it independent from libp2p is out of scope though IMO, so things like PeerId will remain in many places).
@nazar-pc what is your future plans here? Do you still want to continue pursuing this issue or can I assign it to someone else?