nimbus-eth2
Hot/cold block storage
We're currently using a key-value store for storing states and blocks. Due to the nature of eth2, when finalization happens, a single history of blocks is chosen as canonical, so it would be efficient to move the block database into cold storage in the form of a flat, append-only file.
There are a few ways to design this - an example is keeping two files: one for blocks (which are variable-size) and an index file which contains fixed-size offsets - this would allow random-access to blocks by their block number.
It also probably makes sense to store block hashes - these can either go in the block file or a third file containing only hashes.
Another design is to keep offsets in the ordinary key-value store (for example with slot number as key, offset and hash as value) so that in total we have the kvstore and one cold-store file to deal with.
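To make the flat-file layout concrete, here's a minimal sketch in Python of the two-file design (a variable-size block file plus a fixed-size index); the class name, the entry layout (8-byte offset + 32-byte root), and the in-memory byte buffers standing in for the two files are all illustrative, not an actual Nimbus format:

```python
import struct

class ColdBlockStore:
    """Append-only cold storage sketch: a data buffer of SSZ-encoded
    blocks plus a fixed-size index of (offset, root) entries, addressed
    by sequence number."""

    # one fixed-size index entry per block: 8-byte offset + 32-byte root
    ENTRY = struct.Struct("<Q32s")

    def __init__(self):
        self.data = bytearray()   # stands in for the block file
        self.index = bytearray()  # stands in for the index file

    def append(self, ssz_block: bytes, root: bytes) -> int:
        """Append a block; returns its sequence number."""
        offset = len(self.data)
        self.data += ssz_block
        self.index += self.ENTRY.pack(offset, root)
        return len(self.index) // self.ENTRY.size - 1

    def get(self, seq: int) -> tuple:
        """Random access via the fixed-size index: the entry gives the
        start offset, the next entry (or end of file) gives the end."""
        base = seq * self.ENTRY.size
        offset, root = self.ENTRY.unpack_from(self.index, base)
        next_base = base + self.ENTRY.size
        if next_base < len(self.index):
            end, _ = self.ENTRY.unpack_from(self.index, next_base)
        else:
            end = len(self.data)
        return bytes(self.data[offset:end]), root

store = ColdBlockStore()
store.append(b"block-0-ssz", b"\x00" * 32)
store.append(b"block-1-ssz!!", b"\x01" * 32)
blk, root = store.get(1)
```

Because every index entry has the same size, the index doubles as a slot-number lookup table, and both files only ever grow at the end.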
Finally, the block graph is currently stored in-memory - possibly, this could sit in the database as well, saving memory but increasing database traffic - the tradeoff is not clear here as the block graph is "fairly" light-weight.
The API of the cold database should allow us to efficiently memory-map the SSZ representation of a particular block without loading it in memory. We can use SszNavigator objects to extract any data of interest:
https://github.com/status-im/nim-beacon-chain/blob/8ab0248209aba82cfdf6e64dacf2e21753a5a55a/tests/test_ssz.nim#L85-L89
Since Nim cannot safely return openArrays yet, the best way to design the API is to rely on callback closures that will receive the memory-mapped data as an argument:
https://github.com/status-im/nim-beacon-chain/blob/2a67ac3c05859af682994facc36e646a3febc24a/tests/test_kvstore.nim#L32-L34
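For intuition, here's a rough Python analogue of that callback pattern, built on `mmap`; the function name and file layout are hypothetical, the point being that the mapped view is only valid inside the callback, mirroring the Nim closure API where an openArray must not escape:

```python
import mmap
import os
import tempfile
from typing import Callable

def with_mapped_block(path: str, offset: int, length: int,
                      cb: Callable[[memoryview], None]) -> None:
    # The view handed to the callback is only valid during the call;
    # it is released before the mapping is closed.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            view = memoryview(m)[offset:offset + length]
            try:
                cb(view)
            finally:
                view.release()  # must release before the mmap closes

# usage: read a 9-byte slice of a small file without copying the file
fd, path = tempfile.mkstemp()
os.write(fd, b"hdrblockdata")
os.close(fd)
result = []
with_mapped_block(path, 3, 9, lambda v: result.append(bytes(v)))
os.unlink(path)
```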
The description above by @arnetheduck focuses on our needs for storing the history of BeaconBlocks. Please note that we'll also need to store the latest finalized state and potentially periodic snapshots of earlier states. It may be premature to propose designs for this, as we're planning to introduce some level of data sharing between different beacon states that may also be used in the on-disk representation.
Here's a simple nimterop wrapper for lmdb. It's pretty great. Golden isn't a super example of its use, honestly. Maybe I'll finish it someday.
- http://www.lmdb.tech/doc/index.html
- https://github.com/disruptek/golden
Anyway, if you like this API, you can use it to close this issue.
```nim
import os
import nimterop/[build, cimport]

const
  baseDir = getProjectCacheDir("nimlmdb")

static:
  #cDebug()
  gitPull(
    "https://github.com/LMDB/lmdb",
    outdir = baseDir,
    checkout = "mdb.master"
  )
  getHeader(
    "lmdb.h",
    outdir = baseDir / "libraries" / "liblmdb"
  )

type
  mode_t = uint32

when defined(lmdbStatic):
  cImport(lmdbPath)
else:
  cImport(lmdbPath, dynlib = "lmdbLPath")
```
see also https://github.com/status-im/nim-beacon-chain/blob/devel/beacon_chain/kvstore_lmdb.nim - we've tried lmdb but it has issues on 32-bit platforms and needs local patching on windows - it's not great for our use case.
we use sqlite for now which also uses mmap if available but something else otherwise.
the point here, though, is that we don't want a database at all - the nature of the data is append-only, which allows for a very robust and trivially simple implementation with a flat file and an accompanying flat index - the lmdb btree would be overkill.
re nimterop, we have a preference not to have it as a dependency for whoever is building the code - see https://github.com/arnetheduck/nim-sqlite3-abi (we've produced wrappers manually as well as with c2nim, for this reason)
It sounds like the best course of action is to let @protolambda tell us when the design is fairly stable and then use it to inform the hot/cold storage approach. It sounds like there may be two layers required; one which is append-only and never requires compaction, and another that is append-only and rarely requires compaction.
But I'm really trying to read between the lines here on something I know nothing at all about. :wink:
This part of the design is stable: the way ethereum 2 works is that once finalization happens, there is not ever any rollback - the blocks that are older than the finalization point form a simple linear history, thus are append-only.
The blocks that are newer than finalization will be accessed randomly by hash - this is why they should be stored in an "ordinary" key/value store to begin with - even if it's likely that they are "almost-linear", we shouldn't make that assumption right now, as it may open up the potential for DoS attacks if accessing random non-finalized blocks is not constant time.
for some intuition as to what kind of requests will be made from the database, the networking spec is a good source: https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#beaconblocksbyrange https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#beaconblocksbyroot
the finalized blocks are accessed pretty much by their slot number while non-finalized blocks are accessed randomly - the databases in use should reflect this, storing the former in an append-only file and the latter... well, they can stay in the KV store for now - there's an upper bound of about two weeks' worth of blocks that can be non-finalized at any time.
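The access pattern above can be sketched as a simple dispatch: by-slot requests hit the append-only cold store, by-root requests hit the hot KV store first. This is an illustrative Python sketch, with in-memory stand-ins for both stores and the simplifying assumption that finalized slots are contiguous (real eth2 has empty slots):

```python
class BlockStorage:
    """Hypothetical hot/cold split: finalized blocks in an append-only
    list addressed by slot, non-finalized blocks in a map keyed by root."""

    def __init__(self):
        self.cold = []  # slot -> (root, ssz), append-only
        self.hot = {}   # root -> ssz, constant-time random access

    def put_hot(self, root: bytes, ssz: bytes) -> None:
        self.hot[root] = ssz

    def finalize(self, canonical_roots: list) -> None:
        # On finalization the canonical chain moves to cold storage;
        # blocks on abandoned forks could then be pruned from self.hot.
        for root in canonical_roots:
            self.cold.append((root, self.hot.pop(root)))

    def by_slot(self, slot: int) -> bytes:
        # BeaconBlocksByRange-style access: direct index into cold storage
        return self.cold[slot][1]

    def by_root(self, root: bytes):
        # BeaconBlocksByRoot-style access: hot store first
        if root in self.hot:
            return self.hot[root]
        for r, ssz in self.cold:  # cold lookups by root are rare
            if r == root:
                return ssz
        return None

s = BlockStorage()
s.put_hot(b"r1", b"ssz1")
s.put_hot(b"r2", b"ssz2")
s.finalize([b"r1"])
```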
Hello! I'm interested in this one! Is there still time for taking it? Thanks!
It's all yours, @JGcarv. We'll be happy to fund 2 days of work for creating a very basic initial implementation with an accompanying test suite. After reviewing the initial results, we will reassess the goals and suggest further directions.
Awesome. Thank you!
This topic has evolved a little since we last looked at it:
- https://github.com/status-im/nimbus-eth2/pull/2382 provides a flat storage format that combines a state with the blocks that lead up to it - the interesting part here is that the file is self-contained, trivially verifiable and has all the roots and keys needed to fully validate the data - starting with an era file for the genesis state, we can produce a new era file every 8192 blocks (once per day more or less)
- Because the flat file format is verifiable, it's also suitable for wider distribution, such as when dealing with weak subjectivity sync
- Between head and the latest era, we can use https://github.com/status-im/nimbus-eth2/blob/stable/beacon_chain/statediff.nim and https://github.com/status-im/nimbus-eth2/pull/2297 to efficiently store states and diffs - these two features taken together mean we'll have a good balance between small footprint and simplicity of use, especially if the era files are indexed.
A downside of this approach is that we lose the "here's an sqlite database with everything" property - but that's already the case somewhat, with the slashing protection, validator keys and secrets being separate.
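As a back-of-the-envelope sketch of the era arithmetic mentioned above (the constant matches the 8192-block period from the source; the function names are illustrative, not the actual era file spec):

```python
SLOTS_PER_ERA = 8192  # ~27 hours at 12-second slots, i.e. roughly daily

def era_of(slot: int) -> int:
    """Which era file contains the block at a given slot."""
    return slot // SLOTS_PER_ERA

def era_slot_range(era: int) -> range:
    """The slots covered by a single era file."""
    return range(era * SLOTS_PER_ERA, (era + 1) * SLOTS_PER_ERA)
```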
this should be closed
Why? Nimbus still doesn't really have the hot/cold storage distinction this issue proposes. Era files have been gradually developing (https://github.com/status-im/nimbus-eth2/pull/3394 develops them a bit further, for example), but they're not yet functionally exposed to end-users except via ncli_db.
It seems that hot/cold storage is no longer being pursued
since this PR was created, we've pivoted towards using era files and state diffs as a future direction for hot/cold - closing as obsolete
#835
Yes, "using era files and state diffs as a future direction for hot/cold". That particular PR was closed as obsolete, but hot/cold block storage remains a goal, and as the sentence you quote suggests, era files and state diffs, neither of which is really end-user-visible yet modulo ncli/ncli_db, are the current approach to achieving that. This issue tracks hot/cold block storage overall, not just as a proxy for that one PR.
ah I see. thank you
The era store provides hot/cold storage functionality - further work in this area will be tracked separately: https://nimbus.guide/era-store.html