substrate icon indicating copy to clipboard operation
substrate copied to clipboard

Extracting the syncing protocol from `sc-network`

Open tomaka opened this issue 4 years ago • 36 comments

At the moment, sc-network contains a pretty generic low-level networking stack, on top of which the syncing and transaction protocols are implemented. sc-network exposes both the API of the low-level networking stack, but also an API for syncing and transaction-related actions.

I think it would make sense to move the syncing and transaction out of sc-network.

I focused on the syncing in the issue title, because transactions are already more or less segregated from the rest.

tomaka avatar Apr 28 '21 11:04 tomaka

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 07 '21 18:07 stale[bot]

Issue still relevant and important.

tomaka avatar Jul 08 '21 07:07 tomaka

I just briefly discussed this with @tomaka. Sync is rather tightly integrated with the networking currently. This manifests mainly in the sync peer-set being explicitly used by grandpa to keep both in sync. For now this should stay this way, this means we keep a hard coded sync peer-set inside the networking crate. For now, we would only move out the sync related code into its own crate.

bkchr avatar Nov 15 '21 11:11 bkchr

Code can mainly be found here: https://github.com/paritytech/substrate/tree/master/client/network/src/protocol/sync

bkchr avatar Nov 15 '21 11:11 bkchr

Okay, I will try to extract anything related to client/network/src/schema/api.v1.proto into a new crate sc-network-sync under client/network-sync

FYI, it is mainly this https://github.com/paritytech/substrate/blob/master/client/network/src/protocol/sync.rs and this https://github.com/paritytech/substrate/tree/master/client/network/src/protocol/sync.

And while I see that network-gossip is on the client folder level, sync should go to client/network/sync.

bkchr avatar Dec 06 '21 11:12 bkchr

What is complicated is that the code in protocol.rs works as follow:

  • The low-level code reports that a substream has been opened.
  • We synchronously check whether this peer's handshake is ok.
  • If no, we ask the low-level code to close the substream and pretend that this never happened.
  • If yes, we generate a SyncConnected event in the public high-level API.

If you want to plug the sync code through the public high-level API, you need to refactor this, but I don't really know how.

tomaka avatar Dec 06 '21 12:12 tomaka

Turns out I need this for changing sync mechanics. Can anyone share priority/ETA on this so I can manage expectations better?

nazar-pc avatar Feb 18 '22 21:02 nazar-pc

I am learning a lot about the Substrate libp2p integration as I slowly crawl with this PR. I wanted to do too much refactoring at first, but got some intermediate milestone I can reach in a week next to my other ongoing tasks, I guess.

@wigy-opensource-developer can you please give details on what is planned.

bkchr avatar Feb 19 '22 11:02 bkchr

Of course.

As a first step sc-network will depend on an extracted crate sc-network-sync that will contain all handling of the block and state synchronization messages defined in the api.v1 Protobuf protocol.

As a second step message handling for light.v1 will go into an sc-network-light crate, but it will be still sc-network that depends on both.

As a third step, abstractions common to sc-network-sync and sc-network-light will be introduced into sc-network and the direction of dependency will be reversed.

My estimation is that the first step will get to master next week.

Hey, any progress here?

nazar-pc avatar Mar 29 '22 22:03 nazar-pc

Hey, no there is not yet any real progress. This will also still take quite some time to have something proper.

bkchr avatar Mar 31 '22 05:03 bkchr

Is there something I can do to speed up the process here? I can do some refactoring if I understand what it should look like in the end.

nazar-pc avatar Mar 31 '22 05:03 nazar-pc

Can you again say what you need @nazar-pc?

bkchr avatar Mar 31 '22 06:03 bkchr

To replace sync mechanism with a custom one. Let's say node discovers it is significantly behind the tip of the chain and needs to reach out to archival node to get last 1000 blocks. I need that to happen over our distributed storage network instead, archival nodes in traditional sense will not be feasible on our network due to scale.

nazar-pc avatar Mar 31 '22 18:03 nazar-pc

Do you have any write up of your syncing algorithm? I mean it is enough to have some rough write up on how it works.

bkchr avatar Mar 31 '22 21:03 bkchr

There is a lot to it, but somewhat abstractly the algorithm would be like this:

  • for older archival history:
    • identify the tip of the chain and root of the archived history (used to verify individual pieces)
    • start pulling segments of archived history corresponding to desired blocks in parallel from different peers (each segment is 128 pieces erasure coded into 256)
    • verify witness embedded in each of those pieces against root of the history
    • repair source data in case parity pieces were used
    • feed source pieces into reconstructor that will spit out encoded blocks as vectors of bytes alongside with corresponding block numbers (reverse of archiving process encoded blocks are fed into archiver and pieces are returned back)
    • at this point we have blocks we needed and can verify and import them
  • most recent blocks that were not archived yet (archiving is done over blocks at certain confirmation depth) should be synced in normal Substrate way from other full nodes

This replaces how node requests ranges of blocks of archival history from other nodes, I expect that majority of the remaining logic will remain the same as it is in Substrate though.

nazar-pc avatar Mar 31 '22 22:03 nazar-pc

Okay ty.

bkchr avatar Apr 03 '22 19:04 bkchr

I need to expand that syncing Subspace from Subspace DSN using described approach is the first step. We'd like to enable other chains to be able to sync from Subspace DSN in a similar way, so ideally we'd write some kind module that others can import into their project if they so choose. We were discussing about enabling parachains on Rococo to sync this way for instance.

nazar-pc avatar Apr 13 '22 16:04 nazar-pc

As a first step sc-network will depend on an extracted crate sc-network-sync that will contain all handling of the block and state synchronization messages defined in the api.v1 Protobuf protocol.

@nazar-pc if you like you could start working on this. This is just the first iteration and is rather simple. It is really just about moving the code related to syncing to a new crate. sc-network for now will depend on this crate.

bkchr avatar Apr 20 '22 16:04 bkchr

Cool, I'll notify once I start actively working on it

nazar-pc avatar Apr 20 '22 16:04 nazar-pc

I started working on this, will send a PR once I have something useful.

Progress:

  • https://github.com/paritytech/substrate/pull/11303
  • https://github.com/paritytech/substrate/pull/11322
  • https://github.com/paritytech/substrate/pull/11347
  • https://github.com/paritytech/substrate/pull/11412
  • https://github.com/paritytech/substrate/pull/11825
  • https://github.com/paritytech/substrate/pull/11940
  • https://github.com/paritytech/substrate/pull/12006

nazar-pc avatar Apr 26 '22 04:04 nazar-pc

I believe I have this covered (links in the comment above):

As a first step sc-network will depend on an extracted crate sc-network-sync that will contain all handling of the block and state synchronization messages defined in the api.v1 Protobuf protocol.

As a second step message handling for light.v1 will go into an sc-network-light crate, but it will be still sc-network that depends on both.

There are 24 mentions of sc_network_sync in sc_network in my branch and 4 mentions of sc_network_light.

Waiting for review and open to suggestions on next steps.

nazar-pc avatar Apr 29 '22 01:04 nazar-pc

Part 4 is ready on my side, it makes sc-network, sc-network-light and sc-network-sync depend on sc-network-common, but not on each other. Everything is wired together at sc-service level and sc-network-sync/sc-network-light theoretically can be swapped/customized: https://github.com/paritytech/substrate/pull/11412 Not 100% sure yet, but it might be just enough for me to do some of the tweaks to the sync process I wanted to do for a while.

Will submit as a PR once third PR is merged.

nazar-pc avatar May 12 '22 03:05 nazar-pc

Not 100% sure yet, but it might be just enough for me to do some of the tweaks to the sync process I wanted to do for a while.

Sounds good :)

bkchr avatar May 14 '22 07:05 bkchr

Hi! @nazar-pc do you plan more PR after the Network sync refactoring (part 4) paritytech/substrate#11412 is merged? All four PRs are a great work and are a major step towards untangling sync from the network. During our (@timorl and myself) investigation of the network, however, we spotted that there are still some parts of sync in the sc-network. Some examples from client/network/src/protocol.rs, from where many methods are forwarded to either ChainSync or behaviour/Notifications: new_best_block_imported, on_sync_peer_disconnected, on_block_response, on_state_response, on_warp_sync_response, on_sync_peer_connected, announce_block, push_block_announce_validation. The list isn't complete, but collects the great part of functions that can't be untangled with sync /network only by basic level refactor. The question is if those are elements you are planning to change in the nearest future? You’ve stated that part 4 probably suffices for your tweaks so the question is for @bkchr and @tomaka if you are open for more PRs that move those remaining parts out and make the network more generic in terms of compatibility with exchangeable synchronization mechanism?

Yithis avatar May 25 '22 15:05 Yithis

Yes, I do expect more PRs after that, just need feedback and to sync with maintainers before moving forward.

I'd like to not only sync to be decoupled, but eventually to have networking interface where sc-network would be just one possible implementation with ability to wrap/swap it with a custom one. So there would be a set of traits where some of the methods you mentioned would go for instance.

Just trying to make small steps without breaking things in the process.

UPD: Part 4 covers some of the needs I had, but likely not all of them, so expect more PRs for sure.

nazar-pc avatar May 25 '22 17:05 nazar-pc

With latest PR still not merged I already started working on more changes. First of all addressing some of review comments from that PR that'll land separately, but also on refactorings around NetworkService.

On NetworkService specifically I noticed that there were traits here and there that were proxying calls into NetworkService, so I decided to move them into sc-network-common as well as extracting more traits for different use cases so that most of Substrate would depend on generic traits from sc-network-common instead of NetworkService explicitly.

It'll allow wrapping NetworkService or even wrapping default Substrate networking layer with an alternative just like you can swap consesnsus and other parts. It'll take time to make those interfaces truly generic (making it independent from libp2p is out of scope though IMO, so things like PeerId will remain in many places).

nazar-pc avatar Jul 12 '22 16:07 nazar-pc

@nazar-pc what is your future plans here? Do you still want to continue pursuing this issue or can I assign it to someone else?

bkchr avatar Aug 22 '22 11:08 bkchr