
Refactor runtime storage committee worker into smaller and independent workers

Open martintomazic opened this issue 4 months ago • 1 comments

~We should also move out things that this worker should not be responsible for such as updating availability, checkpointing and pruning the state.~

Depending on the config, the storage committee worker should only orchestrate:

  1. Diff sync worker (fetches and applies storage diffs).
  2. Light sync worker (populates light history using the indexer, eventually p2p, or both).
  3. Checkpointer (responsible for creating storage checkpoints).
  4. "Availability nudger" (old name), responsible for registering availability.
  5. Checkpoint sync worker, responsible for runtime state sync:
    • Currently we are leaking resources, as the checkpoint sync p2p clients persist idly (peer tracking etc.).
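
To make the orchestration idea concrete, here is a minimal sketch, not the actual oasis-core design. The `Worker` interface, `funcWorker` adapter, `Orchestrate` and `runExample` are all hypothetical names; the only assumption taken from the list above is that the committee worker starts a set of independent workers and does nothing else.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Worker is a hypothetical interface that each sub-worker would implement.
type Worker interface {
	Name() string
	// Run blocks until the worker finishes or ctx is cancelled.
	Run(ctx context.Context) error
}

// funcWorker adapts a plain function to Worker (illustration only).
type funcWorker struct {
	name string
	run  func(ctx context.Context) error
}

func (w funcWorker) Name() string                  { return w.name }
func (w funcWorker) Run(ctx context.Context) error { return w.run(ctx) }

// Orchestrate runs all workers concurrently, cancels the remaining ones
// as soon as one fails, and returns the first error.
func Orchestrate(ctx context.Context, workers []Worker) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, w := range workers {
		wg.Add(1)
		go func(w Worker) {
			defer wg.Done()
			if err := w.Run(ctx); err != nil {
				once.Do(func() {
					firstErr = fmt.Errorf("%s: %w", w.Name(), err)
					cancel()
				})
			}
		}(w)
	}
	wg.Wait()
	return firstErr
}

// runExample wires two stand-in workers, one of which fails.
func runExample() error {
	workers := []Worker{
		funcWorker{"diff-sync", func(ctx context.Context) error {
			<-ctx.Done() // Runs until cancelled.
			return ctx.Err()
		}},
		funcWorker{"checkpoint-sync", func(ctx context.Context) error {
			return errors.New("peer unavailable")
		}},
	}
	return Orchestrate(context.Background(), workers)
}

func main() {
	fmt.Println(runExample())
}
```

In a layout like this, a worker that has nothing left to do (e.g. checkpoint sync after the initial state sync) simply returns from `Run`, which gives it a natural place to release its p2p clients instead of keeping them around idly.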

martintomazic avatar Aug 25 '25 21:08 martintomazic

I believe we should resume work on this as it would enable us to implement #6356, #6241 and finally #6400.

Motivation

The first two issues alone should produce around a 2x speed-up of the state sync. E.g. syncing consensus together with a paratime from genesis currently takes at least two weeks :/.

The last issue is about simplifying our abstractions and making future extensions easier to implement.

Additional requirement

Benchmarking of the state sync should be reproducible, i.e. it should be independent of the p2p layer. This will also enable us to mock and test some of the unhappy paths of this worker.

Proposed next step

With #6306 and #6411 merged, I suggest we start working on:

  1. Diff sync worker (fetches and applies storage diffs).

The diff sync worker should define an interface:

type Diff struct {
	round    uint64
	prevRoot storageApi.Root
	thisRoot storageApi.Root
	writeLog storageApi.WriteLog
}

type Fetcher interface {
	Next(ctx context.Context) (Diff, error)
	Accept()
	Reject()
}

and we should have various implementations:

// P2PFetcher initializes p2p clients and keeps up to a configured number of storage diffs prefetched.
type P2PFetcher struct {
	history history.History
	// other fields
}

// FileFetcher fetches storage diffs from a pre-populated binary file.
// Useful for benchmarking.
type FileFetcher struct {
	path string
	// other fields
}
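
As an illustration of why the interface helps with testing, here is a minimal sketch of a third, in-memory implementation with one-shot error injection. Everything in it is an assumption: `Root` and `WriteLog` are simplified stand-ins for the `storageApi` types, `MemFetcher`, `syncAll` and `runExample` are hypothetical, and the semantics chosen for `Accept` (advance past the last diff), `Reject` (serve the same round again) and `io.EOF` (no more diffs) are only one possible reading of the interface.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
)

// Root and WriteLog are simplified stand-ins for storageApi.Root and storageApi.WriteLog.
type Root string
type WriteLog []string

type Diff struct {
	round    uint64
	prevRoot Root
	thisRoot Root
	writeLog WriteLog
}

type Fetcher interface {
	Next(ctx context.Context) (Diff, error)
	Accept()
	Reject()
}

// MemFetcher is a hypothetical in-memory Fetcher for tests.
type MemFetcher struct {
	diffs    []Diff
	pos      int
	failOnce map[uint64]error // Error injected once for the given round.
}

var _ Fetcher = (*MemFetcher)(nil)

func (f *MemFetcher) Next(ctx context.Context) (Diff, error) {
	if err := ctx.Err(); err != nil {
		return Diff{}, err
	}
	if f.pos >= len(f.diffs) {
		return Diff{}, io.EOF // Assumed end-of-stream signal.
	}
	d := f.diffs[f.pos]
	if err, ok := f.failOnce[d.round]; ok {
		delete(f.failOnce, d.round)
		return Diff{}, err
	}
	return d, nil
}

// Accept advances past the diff last returned by Next (assumed semantics).
func (f *MemFetcher) Accept() {
	if f.pos < len(f.diffs) {
		f.pos++
	}
}

// Reject keeps the position, so Next serves the same round again (assumed semantics).
func (f *MemFetcher) Reject() {}

const maxRetries = 3

// syncAll consumes diffs until the fetcher is exhausted, retrying failed fetches.
func syncAll(ctx context.Context, f Fetcher, root Root) (Root, int, error) {
	retries := 0
	for {
		d, err := f.Next(ctx)
		if errors.Is(err, io.EOF) {
			return root, retries, nil
		}
		if err != nil {
			if ctx.Err() != nil || retries >= maxRetries {
				return root, retries, err
			}
			retries++
			continue
		}
		if d.prevRoot != root {
			f.Reject()
			return root, retries, fmt.Errorf("round %d: unexpected previous root %q", d.round, d.prevRoot)
		}
		root = d.thisRoot // Stand-in for applying d.writeLog.
		f.Accept()
	}
}

// runExample syncs two rounds, with one injected fetch failure on round 2.
func runExample() (Root, int, error) {
	f := &MemFetcher{
		diffs: []Diff{
			{round: 1, prevRoot: "r0", thisRoot: "r1"},
			{round: 2, prevRoot: "r1", thisRoot: "r2"},
		},
		failOnce: map[uint64]error{2: errors.New("injected timeout")},
	}
	return syncAll(context.Background(), f, "r0")
}

func main() {
	fmt.Println(runExample())
}
```

The consumer only ever sees the `Fetcher` interface, so the same `syncAll` loop could run against a p2p-backed, file-backed or fault-injecting fetcher without changes.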

Concretely, P2PFetcher should encapsulate all abstractions, types and logic relevant for fetching and caching storage diffs that are currently found inside the committee worker. Something similar was already proposed by @peternose, see diffFetcher. For the reason above, I would propose hiding the implementation behind the interface and namespacing it under a nested package.

The diff sync worker is then a trivial worker with apply and finalize goroutines that communicate via a buffered channel, creating back-pressure so as not to exceed dbAPI.MaxPendingVersions. The existing committee worker is then only about orchestrating workers 1-5.
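
A minimal sketch of the apply/finalize pipeline, under stated assumptions: `maxPendingVersions` is an arbitrary stand-in for `dbAPI.MaxPendingVersions`, `runPipeline` is a hypothetical name, and the actual apply and finalize calls are reduced to comments. One detail worth noting: with a single buffered channel of capacity N, up to N+2 versions can be applied but not yet finalized (N queued, one held by the finalizer, one applied and blocked on send), so this sketch reserves a slot before applying to keep the bound exact.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// maxPendingVersions is a stand-in for dbAPI.MaxPendingVersions; the value is arbitrary.
const maxPendingVersions = 4

// runPipeline applies and finalizes rounds 1..n. It returns the number of
// finalized rounds and the peak number of applied-but-unfinalized rounds.
func runPipeline(n uint64) (finalized uint64, peak int64) {
	// slots provides the back-pressure: one slot per pending version.
	slots := make(chan struct{}, maxPendingVersions)
	// applied hands rounds over from the apply to the finalize goroutine.
	applied := make(chan uint64, maxPendingVersions)

	var pending int64
	var wg sync.WaitGroup
	wg.Add(2)

	// Apply goroutine.
	go func() {
		defer wg.Done()
		defer close(applied)
		for round := uint64(1); round <= n; round++ {
			slots <- struct{}{} // Blocks while maxPendingVersions rounds are unfinalized.
			// A real worker would fetch the diff and apply its write log here.
			if p := atomic.AddInt64(&pending, 1); p > peak {
				peak = p
			}
			applied <- round
		}
	}()

	// Finalize goroutine.
	go func() {
		defer wg.Done()
		for range applied {
			// A real worker would finalize the round in the database here.
			atomic.AddInt64(&pending, -1)
			finalized++
			<-slots // Release the slot only after finalization.
		}
	}()

	wg.Wait()
	return finalized, peak
}

func main() {
	finalized, peak := runPipeline(100)
	fmt.Printf("finalized=%d peak=%d (limit %d)\n", finalized, peak, maxPendingVersions)
}
```

Rounds are finalized in the order they were applied because the hand-off channel is FIFO, and the apply goroutine stalls on its own once the finalizer falls behind.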

@kostko should I allocate 25% of my time to this? Maybe a good start would be to run one reference sync, export the metrics and report them in this issue so that we have a reference somewhere. Worst case, this could serve as a base for https://github.com/oasisprotocol/docs/issues/1506.

martintomazic avatar Nov 29 '25 17:11 martintomazic