nimbus-eth1

ideas/strategies to reduce space usage

Open jangko opened this issue 5 years ago • 11 comments

Nimbus:

  • runtime switch: --prune:archive
  • backend: rocksdb
  • compression: unclear
  • synced blocks: 1M blocks
  • space usage: 28.5GB

Geth:

  • runtime switch: --syncmode full --gcmode=archive
  • backend: leveldb
  • compression: snappy
  • synced blocks: 1.2M blocks
  • space usage: 11.8GB

The statistics above show a huge gap between Nimbus and Geth space usage. When I dug into the Geth database, I found something interesting: the space savings do not come only from snappy compression.

Here is what happens during a block sync, from the state trie's perspective:

pre state  -> some intermediate states -> post state

Between pre state and post state, there can be zero or more intermediate states. And here is the difference between Geth and Nimbus:

Geth: only the pre state and post state are accessible; no intermediate states exist.

Nimbus: all states are accessible.
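To make the difference concrete, here is a minimal sketch, not Geth or Nimbus code. It assumes a toy two-level "trie" (one leaf per account plus one root node) in a content-addressed store, with SHA-256 standing in for Keccak-256. `NodeStore`, `write_state`, and `sync_block` are hypothetical names invented for this illustration.

```python
import hashlib


class NodeStore:
    """Content-addressed key-value store standing in for the database."""

    def __init__(self):
        self.nodes = {}

    def put(self, payload: bytes) -> bytes:
        key = hashlib.sha256(payload).digest()
        self.nodes[key] = payload
        return key


def write_state(store, balances):
    """Persist a toy two-level trie: one leaf per account, then one root node."""
    leaf_keys = [
        store.put(f"{account}:{balance}".encode())
        for account, balance in sorted(balances.items())
    ]
    return store.put(b"".join(leaf_keys))


def sync_block(store, pre_state, txs, keep_intermediate):
    """Apply txs to a copy of pre_state and return the post-state root."""
    state = dict(pre_state)
    for account, delta in txs:
        state[account] = state.get(account, 0) + delta
        if keep_intermediate:
            write_state(store, state)  # persist the state after every transaction
    return write_state(store, state)   # the post state is always persisted


pre = {"a": 100, "b": 50}
txs = [("a", -10), ("b", 10), ("a", -5), ("b", 5)]

all_states = NodeStore()
write_state(all_states, pre)
root_all = sync_block(all_states, pre, txs, keep_intermediate=True)

post_only = NodeStore()
write_state(post_only, pre)
root_post = sync_block(post_only, pre, txs, keep_intermediate=False)

print(len(all_states.nodes), len(post_only.nodes))
```

Both runs end at the same post-state root, but the run that keeps intermediate states stores extra leaf and root nodes for every transaction. In a real Merkle Patricia trie each update rewrites a whole path of nodes, so the overhead per transaction would be larger than in this toy.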

This difference is interesting. How does Geth do this? I still have no idea; I need to look into its source code or ask someone who knows. We can research this topic and try to integrate it into Nimbus.

Perhaps you also know other strategies to reduce space usage when Nimbus is doing full sync with trie pruning turned off.

jangko avatar Mar 01 '19 10:03 jangko

I think the solution is based on applying the block changes in a memory layer (e.g. created with beginTransaction) that gets pruned in a better way before writing to the database.
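A rough sketch of that idea, under stated assumptions: `MemoryLayer` is a hypothetical class (it is not the Nimbus `beginTransaction` API), nodes are plain dicts with an optional `children` list of keys, and the backing database is an ordinary dict. Writes are buffered in memory, and on commit only the nodes reachable from the final root are flushed.

```python
import hashlib
import json


class MemoryLayer:
    """In-memory write buffer over a dict-like backing store (hypothetical)."""

    def __init__(self, backing):
        self.backing = backing
        self.pending = {}

    def put(self, node: dict) -> str:
        payload = json.dumps(node, sort_keys=True).encode()
        key = hashlib.sha256(payload).hexdigest()
        self.pending[key] = node
        return key

    def get(self, key: str) -> dict:
        if key in self.pending:
            return self.pending[key]
        return self.backing[key]

    def commit(self, root_key: str):
        """Flush pending nodes reachable from root_key; drop all other pending nodes."""
        stack = [root_key]
        written = 0
        while stack:
            key = stack.pop()
            node = self.pending.pop(key, None)
            if node is None:
                continue  # persisted by an earlier commit, or already flushed
            self.backing[key] = node
            written += 1
            stack.extend(node.get("children", []))
        dropped = len(self.pending)
        self.pending.clear()
        return written, dropped


db = {}
layer = MemoryLayer(db)

leaf_v1 = layer.put({"value": "balance=90"})
root_v1 = layer.put({"children": [leaf_v1]})  # intermediate root, after tx 1
leaf_v2 = layer.put({"value": "balance=85"})
root_v2 = layer.put({"children": [leaf_v2]})  # post-state root, after tx 2

written, dropped = layer.commit(root_v2)
```

Here the two nodes belonging to the intermediate state never reach the database. The walk stops at nodes that are no longer pending, so subtrees that were already persisted are not rewritten.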

zah avatar Mar 01 '19 10:03 zah

rocksdb supports snappy compression in general, but it's not compiled in everywhere. it might also be that we disabled it in our code to avoid runtime issues when the distro comes with a rocksdb without snappy (fedora is like this, a few others as well)

https://github.com/facebook/rocksdb/issues/4283 is the relevant bug report

that said, given the high number of hashes and other random-looking data, returns on compression might be limited - this is something parity mentioned in their motivation to develop a custom storage backend - that, and access patterns that are unfriendly to the rocksdb cache.
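A quick way to see the effect, as a sketch only: this uses `zlib` from the Python standard library as a stand-in for snappy (an assumption, since snappy bindings are not in the standard library), random bytes as a stand-in for hashes, and a made-up repetitive record as the compressible baseline.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; values near 1.0 mean no gain."""
    return len(zlib.compress(data)) / len(data)


# 1024 random 32-byte values standing in for trie node hashes.
hash_like = os.urandom(32 * 1024)

# A repetitive record (hypothetical layout) as a highly compressible baseline.
structured = b"balance=0000000000000100;nonce=0000000000000001;" * 683

print(f"hash-like:  {compression_ratio(hash_like):.3f}")
print(f"structured: {compression_ratio(structured):.3f}")
```

The hash-like input stays at roughly its original size, while the repetitive input shrinks to a small fraction. Real trie nodes mix hashes with some structured encoding, so actual results would fall somewhere in between.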

arnetheduck avatar Mar 06 '19 14:03 arnetheduck

this is something parity mentioned in their motivation to develop a custom storage backend - that, and access patterns that are unfriendly to the rocksdb cache.

hmm, perhaps this is why I cannot look into parity database using standard/our rocksdb.

jangko avatar Mar 06 '19 15:03 jangko

hmm, perhaps this is why I cannot look into parity database using standard/our rocksdb.

from what I've heard, the parity ethereum client is still on rocksdb, with paritydb being developed but not integrated into the eth client (yet?)

arnetheduck avatar Mar 06 '19 16:03 arnetheduck

rocksdb supports snappy compression in general, but it's not compiled in everywhere. it might also be that we disabled it in our code to avoid runtime issues when the distro comes with a rocksdb without snappy (fedora is like this, a few others as well)

I don't know whether the Debian and Ubuntu packages (for universe, Ubuntu ordinarily doesn't much change Debian's packaging) support Snappy, but I'd like to be able to continue using them -- https://packages.debian.org/sid/librocksdb-dev for example. Having seen the trouble rocksdb can be to build, it would easily dominate every other part of building Nimbus or nim-beacon-chain. That one library, alone.

tersec avatar Mar 06 '19 16:03 tersec

rocksdb supports snappy compression in general, but it's not compiled in everywhere. it might also be that we disabled it in our code to avoid runtime issues when the distro comes with a rocksdb without snappy (fedora is like this, a few others as well)

I don't know whether the Debian and Ubuntu packages (for universe, Ubuntu ordinarily doesn't much change Debian's packaging) support Snappy, but I'd like to be able to continue using them -- https://packages.debian.org/sid/librocksdb-dev for example. Having seen the trouble rocksdb can be to build, it would easily dominate every other part of building Nimbus or nim-beacon-chain. That one library, alone.

More broadly, this helps in onboarding others -- when I wrote/found

Compile from source or use the package manager of your OS; for example, Debian, Ubuntu, and Fedora have working RocksDB packages

The point was to help people avoid having to figure out RocksDB compilation before using Nimbus on the most popular distributions. That seems valuable to me for Nimbus as a whole, more so than saving some storage space, until/unless Nimbus itself becomes enough of a draw as an Ethereum 1 client on its own merits.

tersec avatar Mar 06 '19 16:03 tersec

I think I figured out how geth 'removes' intermediate states from the state trie. geth has two layers of cache:

  1. a database read/write cache to speed up database access (it can significantly counter the DDoS attacks seen during the DAO fork)
  2. an accounts cache inside stateDB (it also speeds up execution under the extCodeSize spam attack)

The accounts cache only writes final states after block validation is done. That explains why geth doesn't have intermediate states in its database.

geth also uses a second, discardable in-memory state trie to compute the intermediate state root when making a transaction receipt, but the intermediate states are discarded and not written into the database.

In contrast to geth, during transaction execution nimbus builds the state trie with the complete history of intermediate states, and writes them all into the database at the end of block validation.

This is where the huge gap between nimbus and geth database size is created; it has a significant impact on performance too.
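The behaviour described above could look roughly like the following sketch. It is not geth's code: `CachedState` is a hypothetical class, accounts are reduced to integer balances, the "root" is a flat SHA-256 hash over all accounts rather than a Merkle Patricia trie root, and the database is a plain dict.

```python
import hashlib


class CachedState:
    """Per-block accounts cache over a dict-like database (hypothetical)."""

    def __init__(self, db):
        self.db = db          # stands in for the key-value database
        self.cache = {}       # accounts loaded or modified during this block
        self.dirty = set()    # accounts that must be written at commit time
        self.db_writes = 0

    def get_balance(self, account):
        if account not in self.cache:
            self.cache[account] = self.db.get(account, 0)
        return self.cache[account]

    def add_balance(self, account, delta):
        self.cache[account] = self.get_balance(account) + delta
        self.dirty.add(account)

    def intermediate_root(self):
        """Hash the current view in memory; nothing is written to the database."""
        view = {**self.db, **self.cache}
        data = ",".join(f"{a}={b}" for a, b in sorted(view.items()))
        return hashlib.sha256(data.encode()).hexdigest()

    def commit(self):
        """Write each dirty account once, after the whole block has been executed."""
        for account in sorted(self.dirty):
            self.db[account] = self.cache[account]
            self.db_writes += 1
        self.dirty.clear()
        return self.intermediate_root()


db = {"alice": 100, "bob": 50}
state = CachedState(db)
receipts = []

for sender, recipient, amount in [("alice", "bob", 10), ("alice", "bob", 5), ("bob", "alice", 1)]:
    state.add_balance(sender, -amount)
    state.add_balance(recipient, amount)
    receipts.append(state.intermediate_root())  # root for the receipt, memory only

writes_before_commit = state.db_writes
post_root = state.commit()
```

Three transactions produce three receipt roots, yet the database sees only one write per touched account, and only at commit. The last receipt root equals the committed post-state root because both hash the same final view.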

jangko avatar Apr 15 '19 15:04 jangko

geth 1.9.0 comes with several space optimizations, but I'd also look into lessons learned from turbogeth

arnetheduck avatar Apr 23 '19 02:04 arnetheduck

Twitter announcement here. Apparently they also managed to offload most of the data onto an HDD and keep only the small part needed for the blockchain's latest state on the SSD, further reducing requirements.

Swader avatar Apr 24 '19 12:04 Swader

yeah, those kinds of hot/cold optimizations are possible, but at that point I wouldn't use a generic database backend like rocks or anything, but rather rely on custom binary files (like, say, git). it'll likely be easier in eth2 because of finalization and less need to support deep reorgs.

arnetheduck avatar Apr 24 '19 20:04 arnetheduck

Added the sync label just as a reminder that this is one of the blockers to syncing on mainnet. The storage size is simply too large at the moment, so much so that nobody completed the sync process on mainnet even 2 years ago, and the Ethereum state has grown a lot since then.

(Probably the sheer amount of I/O in writing so much in a database-unfriendly access pattern also contributes, and anything which reduces the space usage will reduce I/O as well, but I/O access pattern optimisation is its own special subject.)

jlokier avatar May 11 '21 17:05 jlokier