reth
High-level spec for `Full Node` and `Snapshots`
## High Level
A draft of the high-level specs for snapshots and the full node. Joining them here since they share some important similarities, and it might help to keep that in mind: they both work on sliding-window intervals during syncing, and any table that can be pruned is a candidate for a snapshot file in the end.
## Categories (prunable or moveable to snapshots)
- Historical: `AccountChangeSet` / `StorageChangeSet` / `AccountHistory` / `StorageHistory`
- Transactions: `Transactions` / `TxSenders` / `TxLookup` / `TxHashNumber` / `TransactionBlock`
- Receipts
## Full Node
- Default pruning: keep data related to the 256 most recent blocks.
- Any table which has a `BlockNumber` or `TxNumber` as key is a candidate for pruning.
- Customization of what counts as the most recent blocks [allow keeping none of the block data, apart from state, after verifying].
- Each category can be fully pruned (e.g. no receipts) if so desired.
- Prune levels should allow skipping certain stages (e.g. `TxSenders` / `TxLookup`).
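The per-category customization above could be modeled as a prune mode per table. A minimal sketch, assuming a hypothetical `PruneMode` enum (names and variants are illustrative, not reth's actual API):

```rust
/// Hypothetical per-category prune configuration; names are illustrative.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PruneMode {
    /// Keep everything (no pruning).
    None,
    /// Keep only the last `n` blocks (e.g. 256 for the default full node).
    Distance(u64),
    /// Drop the category entirely (e.g. no receipts at all).
    Full,
}

impl PruneMode {
    /// Returns true if data for `block` should be pruned, given the current `tip`.
    fn should_prune(&self, block: u64, tip: u64) -> bool {
        match *self {
            PruneMode::None => false,
            PruneMode::Distance(n) => block < tip.saturating_sub(n),
            PruneMode::Full => true,
        }
    }
}
```

A full node would then default every prunable category to `Distance(256)`, while a user could opt into `Full` for e.g. receipts.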
## Snapshots
- Fixed interval [every N blocks, or every N transactions].
- Any table which has a `BlockNumber` or `TxNumber` as key is a candidate for moving to static files.
- Reproducible snapshots: there should be an option to sync from the beginning and let the node create all of its own snapshots, which should match existing ones.
- Shared through a centralized host and/or p2p.
- Static file per category or per table.
- Perfect hashing table for keys.
- Compressed values (overall dictionary (across blocks 0-17M) / per file / or both [double compression]).
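The fixed-interval idea can be sketched as a mapping from block number to the static-file segment that would hold it; the 500k-block interval below is an arbitrary placeholder, not a decided value:

```rust
/// Fixed-interval static files: map a block number to the inclusive
/// [start, end] range of the snapshot segment that would contain it.
/// The interval is an illustrative placeholder.
const INTERVAL: u64 = 500_000;

fn segment_range(block: u64) -> (u64, u64) {
    let start = (block / INTERVAL) * INTERVAL;
    (start, start + INTERVAL - 1)
}
```

Reproducibility then reduces to every node deterministically producing the same segment boundaries and the same contents within each segment.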
## Syncing
- Full node:
  - Requests block height from the CL.
  - Syncs with pruned options [e.g. don't store/calculate changesets, receipts, `TxSenders`, etc. until the most recent blocks].
  - When approaching a block number close to `block_height - 256`, rechecks the latest block height.
  - Once it reaches the desired height, registers a process with the wake-up register (more on it below) and sleeps.
- Snapshots:
  - ??
## Tip

### Register
- Wake-up register [condition -> wake-up registered process (channel) -> execute]
  - Loop through the registered processes.
  - If a condition is met, wake up the registered process.
  - Execute [full node: clean up tables | snapshot: move data out of tables].
  - On either success or failure, update the condition with a new one.
- Once a block has been handled, check the wake-up register conditions [e.g. a static file waits for an exact block height, while a full node might execute every N blocks].
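The register described above could be sketched roughly as follows; `Condition`, `Registered`, and the channel-based wake-up are illustrative names invented for this sketch, not an existing reth API:

```rust
use std::sync::mpsc;

/// When a registered process should wake up (illustrative).
enum Condition {
    /// Wake at an exact block height (e.g. a static-file job).
    AtBlock(u64),
    /// Wake every N blocks (e.g. full-node table clean-up).
    EveryNBlocks(u64),
}

struct Registered {
    condition: Condition,
    /// Sends the block height that triggered the wake-up.
    waker: mpsc::Sender<u64>,
}

/// Called once a block has been handled: wake every process whose condition is met.
fn check_register(register: &mut Vec<Registered>, block: u64) {
    register.retain_mut(|r| {
        let met = match &r.condition {
            Condition::AtBlock(h) => block >= *h,
            Condition::EveryNBlocks(n) => *n != 0 && block % *n == 0,
        };
        if met {
            let _ = r.waker.send(block);
            // On success or failure, the woken process would re-register
            // with a new condition; here we simply drop the old entry.
            return false;
        }
        true
    });
}
```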
## Priority
1. Full node syncing
2. Wake-up register
3. Full node @ tip wake-up process (registration & table clean-up)
4. Snapshot @ tip wake-up process (registration & moving table data)
5. Snapshot syncing
We need to specify here how this would even work given the current architecture, i.e.:
- What assumptions do we make in the stages, tree, and RPC about data availability? (e.g. unwinds assume changesets are available for exec/merkle)
- Since we only have one write transaction, how will we periodically delete data from the tables (both for pruning and snapshots) while syncing?
- How do we adjust our current database abstraction to work with static files? I assume the static files generated by "snapshots" can also be used by the node as a secondary store for historical data, to keep the MDBX database small? This was not mentioned here, but it should be accounted for.
@shekhirin will take this
## Data Availability assumptions
- In case of a reorg, we need to have account and storage changesets available up to the reorged block to unwind the execution, hashing and merkle stages. It applies to both pipeline and blockchain tree unwinds. We should be able to find an optimal and safe value for max reorg depth, e.g. 2 epochs = 64 blocks.
- The CL needs logs (and hence, receipts) available from the block where the `DepositContract` was deployed (or the first deposit was made). It's possible for the CL to operate without these logs if it syncs from a checkpoint. Alternatively, we could keep only deposit transaction receipts, which are required for the CL's `eth_getLogs` to succeed.
- Pruning headers and bodies by default is most likely a bad idea, because it affects network health by leaving the node unable to fulfil the devp2p peer requests `GetBlockHeaders` and `GetBlockBodies`. But having it as a configurable option might be a good idea.
## Pruning

### Pruning during the pipeline sync
- We want to do a pruned initial sync, not requiring the user to wait until the node fully syncs first, and only then prune.
- It will work by calculating and persisting the data only from the requested pruning height.
### Pruning during the live sync (blockchain tree)
- We want the node to continue syncing and being responsive to RPC requests.
- After every new block is processed, the background pruning task (see below) will handle pruning.
### Background pruning task
- Listens for new canonical blocks via `CanonStateNotifications`.
- Has a configurable minimal pruning interval, which determines how often we can prune the data across all stages.
- Setting the minimal pruning interval to 1 block (i.e. pruning after every new block) is not ideal because the disk will wear out faster.
- If the minimal pruning interval condition is met, checks if any stages require pruning, and prunes the data if so.
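The minimal-interval gating could look roughly like this; `Pruner` and its fields are hypothetical names for illustration, not reth's actual pruner:

```rust
/// Tracks when the background pruning task last ran. `min_interval` is the
/// configurable minimal pruning interval, in blocks (illustrative).
struct Pruner {
    min_interval: u64,
    last_pruned_block: Option<u64>,
}

impl Pruner {
    /// Called on every new-canonical-block notification; returns true if
    /// enough blocks have passed since the last prune run, in which case
    /// the caller would check which stages require pruning and prune them.
    fn is_pruning_needed(&mut self, tip: u64) -> bool {
        match self.last_pruned_block {
            Some(last) if tip < last + self.min_interval => false,
            _ => {
                self.last_pruned_block = Some(tip);
                true
            }
        }
    }
}
```

With `min_interval` above 1, the disk-wear concern from the bullet above is addressed by batching several blocks' worth of pruning into one run.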
### Special case for a full node, i.e. pruning all data
- For both the pipeline and the blockchain tree, we will need to not write the data that was requested to be fully pruned **at all**, so that we don't do double work: write, and then immediately prune on the next pruning interval.
## Interface
- We have a `DatabaseProvider` trait, which currently contains the methods for inserting/appending and unwinding the data.
- We will also add the pruning methods to it, and call these methods from the blockchain tree.
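As a rough illustration of what adding prune methods to a provider trait might look like — the trait, method names, and toy in-memory backend below are invented for this sketch and do not match reth's actual `DatabaseProvider`:

```rust
use std::collections::BTreeMap;

type ProviderResult<T> = Result<T, String>;

/// Hypothetical pruning extension alongside the existing insert/unwind methods.
trait PruneProvider {
    /// Remove changesets for blocks strictly below `block`; returns how many
    /// entries were removed.
    fn prune_changesets_below(&mut self, block: u64) -> ProviderResult<usize>;
}

/// Toy in-memory stand-in for the database, keyed by block number.
struct MemProvider {
    changesets: BTreeMap<u64, Vec<u8>>,
}

impl PruneProvider for MemProvider {
    fn prune_changesets_below(&mut self, block: u64) -> ProviderResult<usize> {
        // split_off keeps everything >= `block` in the returned map.
        let kept = self.changesets.split_off(&block);
        let removed = self.changesets.len();
        self.changesets = kept;
        Ok(removed)
    }
}
```

The blockchain tree would call such methods from the background pruning task described above.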
> Not persisting any changesets until the desired pruning height
I'd recommend avoiding this if we want to support forward syncing in the future, since we can only really do this because backwards sync gives us some sort of guarantee that all historical blocks should be executable with no issues. But if we forward sync, that might not be the case - we may encounter invalid blocks, and in that case we would need to unwind, which relies on changesets. So it would have to be a sliding window, i.e. always save e.g. the last 256 blocks of changesets in execution, and remove any older ones every time we commit.
If we don't want to think about this now, we should at least note it down in the implementation so we remember it later.
This is a good point.
> So it would have to be a sliding window, i.e. always save e.g. the last 256 blocks of changesets in execution, and remove any older ones every time we commit.
It should work, yeah, updated my comment.
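The sliding-window idea from the comment above could be sketched like this, with an in-memory map standing in for the changeset tables and 256 as the example window:

```rust
use std::collections::BTreeMap;

/// Sliding window over changesets: after committing `tip`, keep only the last
/// `window` blocks (e.g. 256) so unwinds within that depth remain possible.
fn prune_changeset_window(
    changesets: &mut BTreeMap<u64, Vec<u8>>, // block -> changeset (toy encoding)
    tip: u64,
    window: u64,
) {
    let cutoff = tip.saturating_sub(window);
    // split_off keeps blocks > cutoff, i.e. the (tip - window, tip] range.
    *changesets = changesets.split_off(&(cutoff + 1));
}
```

Run on every commit, this keeps the changesets needed for a bounded unwind while dropping everything older.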
Brain dump on what can be pruned and the side effects of it:
- Changesets and History can be pruned; it will affect `eth_getStorageAt`, `eth_getBalanceAt` and the tracing RPC methods.
- Transaction Senders can be pruned; it will affect execution performance.
- Transactions can be pruned; it will affect network health.
- Receipts can be pruned; it will affect the validator's ability to start up without a checkpoint, plus the `eth_getLogs`, `eth_getFilterLogs`, etc. RPC methods.
- Transaction Lookup Index can be pruned; it will affect the `eth_getTransactionByHash`, `eth_getTransactionReceipt` and `debug_getRawTransaction` RPC methods, as it's used to get a transaction by hash.
- Plain State can't be pruned.
- Hashed State and Tries can't be pruned, because recalculation takes a lot of time and we need them for the chain to progress.
Rough minimum size estimation:
- Transactions (435GB)
- Receipts (only deposit transaction receipts, needed for the validator to execute `eth_getLogs` to get deposits; not sure how much it is, but I'd say <5GB)
- Plain State (95GB)
- Hashed State (85GB)
- Tries (24GB)

Total is ~650GB.
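A quick sanity check on the arithmetic (figures taken from the list above; receipts use the guessed <5GB upper bound):

```rust
/// Sum the per-category minimum-size estimates from the list above, in GB.
fn minimum_size_gb() -> u64 {
    let transactions = 435;
    let receipts = 5; // deposit-transaction receipts only, guessed upper bound
    let plain_state = 95;
    let hashed_state = 85;
    let tries = 24;
    transactions + receipts + plain_state + hashed_state + tries
}
```

This comes to 644GB, consistent with the ~650GB total stated above.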
Very nice. @joshieDo @shekhirin:
- WDYT about making the transactions table even smaller? How much do we gain if we use a minimal perfect hash function to 'compress' the keys when we put the transactions on a static file?
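For a rough sense of the possible gain from the perfect-hashing question above: explicit 32-byte tx-hash keys versus a minimal perfect hash function at a few bits per key. All numbers here (transaction count, bits per key) are illustrative assumptions, not measured reth values:

```rust
/// Bytes spent storing explicit 32-byte transaction-hash keys.
fn key_bytes_explicit(num_txs: u64) -> u64 {
    num_txs * 32
}

/// Bytes spent by a minimal perfect hash function at `bits_per_key` bits
/// per key (typical MPHF constructions use roughly 2-4 bits per key).
fn key_bytes_mphf(num_txs: u64, bits_per_key: u64) -> u64 {
    (num_txs * bits_per_key) / 8
}
```

Under these assumptions, e.g. 2 billion transactions would need ~64GB of explicit keys versus under 1GB for the MPHF, at the cost of the MPHF not detecting lookups for keys that were never inserted.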
Avalanche also implemented a bespoke Snap sync mechanism which gossips state diffs (instead of raw state) which avoids the "healing" process of Geth's Snap sync, maybe something to learn here: https://github.com/ava-labs/avalanchego/tree/master/x/sync.
Please see https://github.com/paradigmxyz/reth/issues/2753 as well for additional ideas.