
Hot/cold block storage

Open arnetheduck opened this issue 4 years ago • 15 comments

We're currently using a key-value store for storing states and blocks. Due to the nature of eth2, when finalization happens a single history of blocks is chosen to be canonical, so it would be efficient to keep the finalized blocks in cold storage, in the form of a flat append-only file.

There are a few ways to design this - an example is keeping two files: one for blocks (which are variable-size) and an index file which contains fixed-size offsets - this would allow random-access to blocks by their block number.
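The two-file layout described above can be sketched roughly as follows. This is an illustrative Python sketch, not code from nimbus-eth2: the file names, the `ColdStore` class and the choice of storing each block's end offset as a little-endian 64-bit integer are all assumptions made for the example.

```python
import os
import struct
import tempfile

# Assumed layout: the index holds one little-endian u64 per block, the END
# offset of that block in the block file. Block n spans [end[n-1], end[n]).
OFFSET = struct.Struct("<Q")


class ColdStore:
    """Append-only block file plus a flat index of fixed-size offsets."""

    def __init__(self, directory):
        # Hypothetical file names, chosen for this sketch only.
        self.blocks_path = os.path.join(directory, "blocks.dat")
        self.index_path = os.path.join(directory, "blocks.idx")
        for path in (self.blocks_path, self.index_path):
            open(path, "ab").close()  # create the file if it is missing

    def __len__(self):
        # Fixed-size entries: the block count follows from the index size.
        return os.path.getsize(self.index_path) // OFFSET.size

    def append(self, block):
        """Append one variable-size block and return its block number."""
        number = len(self)
        end = os.path.getsize(self.blocks_path) + len(block)
        with open(self.blocks_path, "ab") as blocks:
            blocks.write(block)
        with open(self.index_path, "ab") as index:
            index.write(OFFSET.pack(end))
        return number

    def get(self, number):
        """Random access by block number using two index reads."""
        if not 0 <= number < len(self):
            raise IndexError(number)
        with open(self.index_path, "rb") as index:
            start = 0
            if number > 0:
                index.seek((number - 1) * OFFSET.size)
                (start,) = OFFSET.unpack(index.read(OFFSET.size))
            index.seek(number * OFFSET.size)
            (end,) = OFFSET.unpack(index.read(OFFSET.size))
        with open(self.blocks_path, "rb") as blocks:
            blocks.seek(start)
            return blocks.read(end - start)


store = ColdStore(tempfile.mkdtemp())
first = store.append(b"block-zero")
second = store.append(b"a-longer-block-one")
```

Because index entries are fixed-size, looking up block `n` is a seek to `n * 8` in the index followed by one read from the block file. The sketch does not deal with crash recovery, for example a block written without its index entry.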

It also probably makes sense to store block hashes - these can either go in the block file or a third file containing only hashes.

Another design is to keep offsets in the ordinary key-value store (for example with slot number as key, offset and hash as value) so that in total we have the kvstore and one cold-store file to deal with.
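A rough sketch of this variant, in Python for illustration only: an sqlite table stands in for the key-value store, keyed by slot. The table name, the extra `length` column and the use of SHA-256 as a stand-in for the real block root are assumptions of this example, not details from the issue.

```python
import hashlib
import os
import sqlite3
import tempfile


def open_index(db_path=":memory:"):
    """Open the key-value side: slot -> (offset, length, root)."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS cold_index ("
        "slot INTEGER PRIMARY KEY, offset INTEGER, length INTEGER, root BLOB)"
    )
    return db


def append_block(db, cold_path, slot, block):
    """Append the block to the cold file and record where it landed."""
    offset = os.path.getsize(cold_path) if os.path.exists(cold_path) else 0
    with open(cold_path, "ab") as cold:
        cold.write(block)
    # Stand-in hash: a real client would store the SSZ hash tree root instead.
    root = hashlib.sha256(block).digest()
    with db:  # commits on success
        db.execute(
            "INSERT INTO cold_index VALUES (?, ?, ?, ?)",
            (slot, offset, len(block), root),
        )
    return root


def read_block(db, cold_path, slot):
    """Return the block stored for a slot, or None if the slot is empty."""
    row = db.execute(
        "SELECT offset, length, root FROM cold_index WHERE slot = ?", (slot,)
    ).fetchone()
    if row is None:
        return None
    offset, length, root = row
    with open(cold_path, "rb") as cold:
        cold.seek(offset)
        block = cold.read(length)
    if hashlib.sha256(block).digest() != root:
        raise ValueError("cold file does not match the stored hash")
    return block


cold_path = os.path.join(tempfile.mkdtemp(), "cold.dat")
db = open_index()
append_block(db, cold_path, 5, b"block at slot 5")
append_block(db, cold_path, 9, b"block at slot 9")  # slots 6-8 were empty
```

Keying by slot rather than by a dense block number means empty slots need no placeholder entries, at the cost of going through the key-value store on every lookup.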

Finally, the block graph is currently stored in-memory - possibly, this could sit in the database as well, saving memory but increasing database traffic - the tradeoff is not clear here as the block graph is "fairly" light-weight.

arnetheduck avatar Feb 19 '20 09:02 arnetheduck

The API of the cold database should allow us to efficiently memory-map the SSZ representation of a particular block without loading it in memory. We can use SszNavigator objects to extract any data of interest:

https://github.com/status-im/nim-beacon-chain/blob/8ab0248209aba82cfdf6e64dacf2e21753a5a55a/tests/test_ssz.nim#L85-L89

Since Nim cannot safely return openArrays yet, the best way to design the API is to rely on callback closures that will receive the memory-mapped data as an argument:

https://github.com/status-im/nim-beacon-chain/blob/2a67ac3c05859af682994facc36e646a3febc24a/tests/test_kvstore.nim#L32-L34
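The shape of such a callback-based API can be illustrated with a small Python sketch. It is not the Nim API linked above: `with_block` and its parameters are hypothetical, and Python's `mmap` and `memoryview` stand in for the memory-mapped SSZ data that the callback would receive.

```python
import mmap
import os
import struct
import tempfile


def with_block(path, offset, length, callback):
    """Call callback(view) with a zero-copy, read-only view of the block.

    The view is only valid during the call, which is why the data is handed
    to a callback instead of being returned to the caller.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as whole:
                with whole[offset:offset + length] as view:
                    return callback(view)


# Hypothetical file: 4 bytes of padding, then a "block" whose first field
# is a little-endian u32.
path = os.path.join(tempfile.mkdtemp(), "cold.dat")
with open(path, "wb") as f:
    f.write(b"xxxx" + struct.pack("<I", 42) + b"rest")

# The callback extracts only the field of interest; the rest of the block
# is never copied out of the mapping.
first_field = with_block(path, 4, 8, lambda v: struct.unpack_from("<I", v, 0)[0])
block_size = with_block(path, 4, 8, len)
```

The callback must not keep a reference to the view after it returns, since the mapping is closed as soon as `with_block` exits.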

The description above by @arnetheduck focuses on our needs for storing the history of BeaconBlocks. Please note that we'll also need to store the latest finalized state and potentially periodic snapshots of earlier states. It may be premature to propose designs for this as we're planning to introduce some level of data sharing between different beacon states that may also be used in the on-disk representation.

zah avatar Feb 25 '20 12:02 zah

Here's a simple nimterop wrapper for lmdb. It's pretty great. Golden isn't a super example of its use, honestly. Maybe I'll finish it someday.

  • http://www.lmdb.tech/doc/index.html
  • https://github.com/disruptek/golden

Anyway, if you like this API, you can use it to close this issue.

import os

import nimterop/[build, cimport]

const
  baseDir = getProjectCacheDir("nimlmdb")

static:
  #cDebug()

  gitPull(
    "https://github.com/LMDB/lmdb",
    outdir = baseDir,
    checkout = "mdb.master"
  )

getHeader(
  "lmdb.h",
  outdir = baseDir / "libraries" / "liblmdb"
)

type
  mode_t = uint32

when defined(lmdbStatic):
  cImport(lmdbPath)
else:
  cImport(lmdbPath, dynlib = "lmdbLPath")

disruptek avatar Feb 29 '20 20:02 disruptek

see also https://github.com/status-im/nim-beacon-chain/blob/devel/beacon_chain/kvstore_lmdb.nim - we've tried lmdb but it has issues on 32-bit platforms and needs local patching on windows - it's not great for our use case.

we use sqlite for now which also uses mmap if available but something else otherwise.

the point here is though that we don't want a database at all - the nature of the data is such that it's append-only - it allows for a very robust and trivially simple implementation with a flat file and an accompanying flat index - the lmdb btree would be overkill.

arnetheduck avatar Mar 01 '20 09:03 arnetheduck

re nimterop, we have a preference not to have it as a dependency for whoever is building the code - see https://github.com/arnetheduck/nim-sqlite3-abi (we've produced wrappers manually as well as with c2nim, for this reason)

arnetheduck avatar Mar 01 '20 12:03 arnetheduck

It sounds like the best course of action is to let @protolambda tell us when the design is fairly stable and then use it to inform the hot/cold storage approach. It sounds like there may be two layers required; one which is append-only and never requires compaction, and another that is append-only and rarely requires compaction.

But I'm really trying to read between the lines here on something I know nothing at all about. :wink:

disruptek avatar Mar 05 '20 00:03 disruptek

This part of the design is stable: the way ethereum 2 works is that once finalization happens, there is never any rollback - the blocks that are older than the finalization point form a simple linear history, thus are append-only.

The blocks that are newer than finalization will be accessed randomly by hash - this is why they should be stored in an "ordinary" key/value store to begin with - even if it's likely that they are "almost-linear", we shouldn't make that assumption right now as it may open up the potential for DoS attacks, if accessing random non-finalized blocks is not constant time.

for some intuition as to what kind of requests will be made from the database, the networking spec is a good source: https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#beaconblocksbyrange https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#beaconblocksbyroot

the finalized blocks are accessed pretty much by their slot number while non-finalized blocks are accessed randomly - the databases in use should reflect this, storing the former in an append-only store and the latter in.. well, they can stay in the KV store for now - there's an upper bound of about two weeks' worth of blocks for how many there can be in the system.
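The split described above can be sketched in memory as follows. This is an illustrative Python sketch under simplifying assumptions: `HotColdBlocks` and its method names are hypothetical, roots and blocks are plain byte strings, the caller supplies the canonical chain on finalization, and a list stands in for the append-only file.

```python
import bisect


class HotColdBlocks:
    def __init__(self):
        self.hot = {}          # root -> (slot, block): non-finalized, by root
        self.cold = []         # append-only (slot, root, block), slot order
        self.cold_slots = []   # parallel list of slots for range lookups
        self.cold_roots = {}   # root -> position in self.cold

    def put(self, root, slot, block):
        """New blocks always start out hot, keyed by root."""
        self.hot[root] = (slot, block)

    def finalize(self, canonical_roots):
        """Move the canonical chain (oldest first) into the cold store."""
        for root in canonical_roots:
            slot, block = self.hot.pop(root)
            if self.cold_slots and slot <= self.cold_slots[-1]:
                raise ValueError("finalized blocks must have increasing slots")
            self.cold_roots[root] = len(self.cold)
            self.cold.append((slot, root, block))
            self.cold_slots.append(slot)
        if self.cold_slots:
            # Simplification: drop hot blocks at or below the finalized slot;
            # they belong to forks that can no longer become canonical.
            finalized_slot = self.cold_slots[-1]
            self.hot = {
                r: v for r, v in self.hot.items() if v[0] > finalized_slot
            }

    def by_root(self, root):
        """Lookup by root, in the style of BeaconBlocksByRoot."""
        if root in self.hot:
            return self.hot[root][1]
        position = self.cold_roots.get(root)
        return None if position is None else self.cold[position][2]

    def by_range(self, start_slot, count):
        """Finalized blocks with start_slot <= slot < start_slot + count."""
        lo = bisect.bisect_left(self.cold_slots, start_slot)
        hi = bisect.bisect_left(self.cold_slots, start_slot + count)
        return [block for _, _, block in self.cold[lo:hi]]


store = HotColdBlocks()
store.put(b"ra", 1, b"A")
store.put(b"rb", 2, b"B")
store.put(b"rx", 2, b"X")  # competing fork at slot 2
store.put(b"rc", 3, b"C")
store.finalize([b"ra", b"rb"])
```

After `finalize`, the canonical blocks at slots 1 and 2 are served from the append-only side by slot range, the losing fork block is gone, and the block at slot 3 stays in the hot store until a later finalization.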

arnetheduck avatar Mar 05 '20 09:03 arnetheduck

Hello! I'm interested in this one! Is there still time for taking it? Thanks!

JGcarv avatar Mar 13 '20 15:03 JGcarv

It's all yours, @JGcarv. We'll be happy to fund 2 days of work for creating a very basic initial implementation with an accompanying test suite. After reviewing the initial results, we will reassess the goals and suggest further directions.

zah avatar Mar 13 '20 18:03 zah

Awesome. Thank you!

JGcarv avatar Mar 13 '20 19:03 JGcarv

This topic has evolved a little since we last looked at it:

  • https://github.com/status-im/nimbus-eth2/pull/2382 provides a flat storage format that combines a state with the blocks that lead up to it - the interesting part here is that the file is self-contained, trivially verifiable and has all the roots and keys needed to fully validate the data - starting with an era file for the genesis state, we can produce a new era file every 8192 blocks (once per day more or less)
  • Because the flat file format is verifiable, it's also suitable for wider distribution, such as when dealing with weak subjectivity sync
  • Between head and the latest era, we can use https://github.com/status-im/nimbus-eth2/blob/stable/beacon_chain/statediff.nim and https://github.com/status-im/nimbus-eth2/pull/2297 to efficiently store states and diffs - these two features taken together mean we'll have a good balance between small footprint and simplicity of use, especially if the era files are indexed.
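As a quick check of the "once per day more or less" figure, the arithmetic can be written out in a few lines of Python. The 8192 comes from the comment above; the 12-second slot time is the mainnet value and is an assumption here. The `era_of` helper is a simple zero-based grouping for illustration and is not necessarily the numbering the era file format itself uses.

```python
SLOTS_PER_ERA = 8192    # from the comment above
SECONDS_PER_SLOT = 12   # assumption: mainnet slot time


def era_of(slot):
    """Zero-based group of 8192 slots that a slot falls into (illustrative)."""
    return slot // SLOTS_PER_ERA


def first_slot_of(era):
    """First slot covered by the given group."""
    return era * SLOTS_PER_ERA


# Wall-clock time covered by one era file, in hours.
era_hours = SLOTS_PER_ERA * SECONDS_PER_SLOT / 3600
```

With these values one era spans a little over 27 hours, which matches the rough "once per day" estimate.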

A downside of this approach is that we lose the "here's an sqlite database with everything" world - but that's already the case somewhat with the slashing protection, validator keys and secrets being separate.

arnetheduck avatar Mar 11 '21 08:03 arnetheduck

this should be closed

TennisBowling avatar Feb 15 '22 16:02 TennisBowling

Why? Nimbus still doesn't really have the hot/cold storage distinction this issue proposes. Era files have been gradually developing (https://github.com/status-im/nimbus-eth2/pull/3394 develops them a bit further, for example), but they're not yet functionally exposed to end-users except via ncli_db.

tersec avatar Feb 15 '22 17:02 tersec

It seems that hot/cold storage wasn't being pursued anymore:

since this PR was created, we've pivoted towards using era files and state diffs as a future direction for hot/cold - closing as obsolete

#835

TennisBowling avatar Feb 16 '22 04:02 TennisBowling

Yes, "using era files and state diffs as a future direction for hot/cold". That particular PR was closed as obsolete, but hot/cold block storage remains a goal, and as the sentence you quote suggests, era files and state diffs, neither of which is really end-user-visible yet modulo ncli/ncli_db, are the current approach to achieving that. This issue tracks hot/cold block storage overall, not just as a proxy for that one PR.

tersec avatar Feb 16 '22 06:02 tersec

ah I see. thank you

TennisBowling avatar Feb 16 '22 17:02 TennisBowling

The era store provides hot/cold storage functionality - further work in this area will be tracked separately: https://nimbus.guide/era-store.html

arnetheduck avatar Oct 27 '22 08:10 arnetheduck