nimbus-eth1
ideas/strategies to reduce space usage
Nimbus:
- runtime switch:
--prune:archive
- backend: rocksdb
- compression: unclear
- synced blocks: 1M blocks
- space usage: 28.5GB
Geth:
- runtime switch:
--syncmode full --gcmode=archive
- backend: leveldb
- compression: snappy
- synced blocks: 1.2M blocks
- space usage: 11.8GB
The statistics above show a huge gap between Nimbus and Geth space usage. When I looked into Geth's database, I found something interesting: the space savings do not come from snappy compression alone.
Here is what happens during a block sync, from the state trie's perspective:
pre state -> some intermediate states -> post state
Between the pre state and the post state, there can be zero or more intermediate states.
And here is the difference between Geth and Nimbus:
Geth: only the pre state and the post state are accessible; no intermediate states exist.
Nimbus: all states are accessible.
This difference is interesting. How does Geth do this? I still have no idea; I need to look into its source code or ask someone who knows. We can research this topic and try to integrate it into Nimbus.
Perhaps you also know other strategies to reduce space usage when Nimbus is doing a full sync with trie pruning turned off.
I think the solution is based on applying the block changes in a memory layer (e.g. created with beginTransaction) that gets pruned in a better way before writing to the database.
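A minimal sketch of that idea, assuming a hypothetical MemoryLayer standing in for a beginTransaction-style overlay; none of these names are Nimbus APIs:

```python
# Sketch: buffer all trie writes for a block in memory, then flush only the
# entries still live at the end, so intermediate trie nodes never reach disk.
class MemoryLayer:
    def __init__(self, backend: dict):
        self.backend = backend   # stands in for the on-disk key-value store
        self.pending = {}        # trie-node writes made while executing a block

    def put(self, key: bytes, value: bytes) -> None:
        self.pending[key] = value          # later writes shadow intermediate values

    def get(self, key: bytes):
        return self.pending.get(key, self.backend.get(key))

    def commit(self, live_keys: set) -> None:
        # prune before flushing: persist only nodes still reachable from the
        # post-block state root
        for key, value in self.pending.items():
            if key in live_keys:
                self.backend[key] = value
        self.pending.clear()
```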
rocksdb supports snappy compression in general, but it's not compiled in everywhere. It might also be that we disabled it in our code to avoid runtime issues when the distro ships a rocksdb built without snappy (Fedora is like this, and a few others as well).
https://github.com/facebook/rocksdb/issues/4283 is the relevant bug report
that said, given the high number of hashes and other random-looking data, returns on compression might be limited - this is something parity mentioned in their motivation to develop a custom storage backend - that, and access patterns that are unfriendly to the rocksdb cache.
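A toy comparison of that effect, using zlib from the Python standard library as a stand-in for snappy; the exact ratios will vary, but the contrast is the point:

```python
# Hash-like (random-looking) data barely compresses; repetitive data does.
import hashlib
import zlib

hash_like = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest()
                     for i in range(10_000))          # ~320 KB of 32-byte hashes
repetitive = b"\x00" * len(hash_like)

print(len(zlib.compress(hash_like)) / len(hash_like))    # close to 1.0: almost no savings
print(len(zlib.compress(repetitive)) / len(repetitive))  # tiny fraction: compresses well
```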
hmm, perhaps this is why I cannot look into parity database using standard/our rocksdb.
from what I've heard, the parity ethereum client is still on rocksdb, with paritydb being developed but not integrated into the eth client (yet?)
I don't know whether the Debian and Ubuntu packages (for universe, Ubuntu ordinarily doesn't much change Debian's packaging) support Snappy, but I'd like to be able to continue using them -- https://packages.debian.org/sid/librocksdb-dev for example. Having seen the trouble rocksdb can be to build, it would easily dominate every other part of building Nimbus or nim-beacon-chain. That one library, alone.
More broadly, this helps in onboarding others -- when I wrote/found
Compile from source or use the package manager of your OS; for example, Debian, Ubuntu, and Fedora have working RocksDB packages
The point was to help people avoid having to figure out RocksDB compilation before using Nimbus on the most popular distributions. That seems valuable to me for Nimbus as a whole, more so than saving some storage space, until/unless Nimbus itself becomes enough of a draw as an Ethereum 1 client on its own merits.
I think I figured out how geth 'removes' intermediate states from the state trie. geth has two cache layers:
- a database read/write cache to speed up database access (it can significantly counter the DDoS attacks around the DAO fork)
- an accounts cache inside the stateDB (it also counters the extCodeSize spam attack)
The accounts cache only writes final states after block validation is done; that explains why geth doesn't have intermediate states in its database.
geth also uses a second, discardable in-memory state trie to compute intermediate state roots when building transaction receipts, but those intermediate states are discarded and never written to the database.
In contrast, during transaction execution Nimbus builds the state trie with the complete history of intermediate states and writes them all to the database at the end of block validation.
This is where the huge gap between the Nimbus and Geth database sizes comes from, and it has a significant impact on performance too.
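A hedged sketch of that workflow (this is not geth source, just the behaviour described above, in Python): intermediate state roots for receipts come from an in-memory account cache, and only the post-block state ever touches the database.

```python
import hashlib
import json

def state_root(accounts: dict) -> str:
    # stand-in for the real Merkle-Patricia trie root computation
    return hashlib.sha256(json.dumps(accounts, sort_keys=True).encode()).hexdigest()

def execute_block(db: dict, pre_state: dict, txs) -> list:
    accounts = dict(pre_state)            # accounts cache: mutated in memory only
    receipt_roots = []
    for tx in txs:
        tx(accounts)                      # apply the transaction to the cache
        receipt_roots.append(state_root(accounts))   # intermediate root, never persisted
    db[state_root(accounts)] = accounts   # only the final post-block state hits the db
    return receipt_roots

# usage: two toy transactions touching balances
db = {}
execute_block(db, {"alice": 10, "bob": 0},
              [lambda a: a.update(bob=a["bob"] + 3),
               lambda a: a.update(alice=a["alice"] - 3)])
assert len(db) == 1                       # one entry: the post-block state
```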
geth 1.9.0 comes with several space optimizations, but I'd also look into lessons learned from turbogeth
Twitter announcement here: apparently they also managed to offload most of the data onto an HDD and keep only a very small part on the SSD, needed for the blockchain's latest state, further reducing requirements.
yeah, those kinds of hot/cold optimizations are possible, but at that point I wouldn't use a generic database backend like rocks or anything, but rather rely on custom binary files (like, say, git). It'll likely be easier in eth2 because of finalization and less need to support deep reorgs.
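For illustration, a minimal hot/cold split along those lines might look like the following (file format and names are made up): finalized blocks go into an append-only flat file that can live on an HDD, while the key-value store on the SSD keeps only what the latest state needs.

```python
import os
import struct

def append_cold(path: str, block_bytes: bytes) -> int:
    """Append one finalized block to the cold store; return its offset for an index."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    with open(path, "ab") as f:
        f.write(struct.pack(">I", len(block_bytes)))   # 4-byte length prefix
        f.write(block_bytes)
    return offset

def read_cold(path: str, offset: int) -> bytes:
    """Random read of one block; sequential writes and rare reads suit an HDD."""
    with open(path, "rb") as f:
        f.seek(offset)
        (length,) = struct.unpack(">I", f.read(4))
        return f.read(length)
```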
Added the sync label just as a reminder that this is one of the blockers to syncing on mainnet. The storage size is simply too large at the moment, so much so that nobody completed the sync process on mainnet even two years ago, and the Ethereum state has grown a lot since then.
(Probably the sheer amount of I/O in writing so much in a database-unfriendly access pattern also contributes, and anything which reduces the space usage will reduce I/O as well, but I/O access pattern optimisation is its own special subject.)