neo-go Compare BoltDB and LevelDB performance and choose the default

Now that we have BoltDB support (#335), let's compare it with LevelDB for our usage scenarios. We have two basic DB use modes: synchronization and reads.

For synchronization it shouldn't be hard to get real numbers because Persist() already prints them. It's just a matter of running our node against privnet/testnet/mainnet, collecting the log, then parsing and graphing it. We're interested in the time to persist a block and its dependency on chain height, and maybe also on the number of blocks stored in one batch. Both the full synchronization and the subsequent regular updates every 15 seconds matter (the latter are probably easier to see on privnet).
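The log-parsing step could be sketched roughly like this. The log line format here is an assumption made for illustration (a line ending in a JSON object with hypothetical `blockHeight`, `blocks` and `took` fields), so the field names need to be adjusted to whatever Persist() actually prints. The output is CSV that any plotting tool can graph.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
	"time"
)

// persistSample is one data point: how long a persist batch took at a given height.
type persistSample struct {
	Height uint32
	Blocks int
	Took   time.Duration
}

// parsePersistLine extracts a sample from a log line whose tail is a JSON object.
// ASSUMED (hypothetical) format: ... {"blockHeight": 123, "blocks": 4, "took": "1.5ms"}
// Lines that do not match are reported with ok == false and can be skipped.
func parsePersistLine(line string) (persistSample, bool) {
	i := strings.IndexByte(line, '{')
	if i < 0 {
		return persistSample{}, false
	}
	var raw struct {
		Height *uint32 `json:"blockHeight"`
		Blocks int     `json:"blocks"`
		Took   string  `json:"took"`
	}
	if err := json.Unmarshal([]byte(line[i:]), &raw); err != nil || raw.Height == nil {
		return persistSample{}, false
	}
	d, err := time.ParseDuration(raw.Took)
	if err != nil {
		return persistSample{}, false
	}
	return persistSample{Height: *raw.Height, Blocks: raw.Blocks, Took: d}, true
}

// writeCSV reads log lines from r and writes "height,blocks,took_ms" rows to w.
func writeCSV(r io.Reader, w io.Writer) error {
	fmt.Fprintln(w, "height,blocks,took_ms")
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if s, ok := parsePersistLine(sc.Text()); ok {
			ms := float64(s.Took) / float64(time.Millisecond)
			fmt.Fprintf(w, "%d,%d,%.3f\n", s.Height, s.Blocks, ms)
		}
	}
	return sc.Err()
}

func main() {
	// Reads the node log from stdin and prints CSV to stdout.
	if err := writeCSV(os.Stdin, os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Running the same tool over one log per backend would give directly comparable height/time series.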

Measuring read performance is a bit trickier because we don't have an exact usage model for it, but most reads will probably serve other nodes' requests. At the moment our node doesn't answer getheaders or getblocks requests, so we need to write some synthetic tests here.

So the problem here is to define a reasonable test program and compare our DB backends with it. Then, based on those results, just choose the better one as the default.
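One possible shape for such a synthetic read test is sketched below. Everything in it is an assumption for illustration: `kvStore` is a hypothetical minimal interface (not the node's actual storage API), `memStore` is an in-memory stand-in, and real LevelDB/BoltDB backends would need thin adapters implementing the same two methods. The workload (random lookups of fixed-size values keyed by height) is only a rough imitation of serving block requests.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// kvStore is a hypothetical minimal interface a backend adapter would implement.
type kvStore interface {
	Put(key, value []byte) error
	Get(key []byte) ([]byte, error)
}

var errNotFound = errors.New("key not found")

// memStore is an in-memory stand-in used only to make the sketch runnable.
type memStore map[string][]byte

func (m memStore) Put(k, v []byte) error { m[string(k)] = v; return nil }

func (m memStore) Get(k []byte) ([]byte, error) {
	v, ok := m[string(k)]
	if !ok {
		return nil, errNotFound
	}
	return v, nil
}

// heightKey encodes a block height as a 4-byte big-endian key.
func heightKey(h uint32) []byte {
	k := make([]byte, 4)
	binary.BigEndian.PutUint32(k, h)
	return k
}

// fill stores n fake "blocks" of the given size, keyed by height.
func fill(s kvStore, n uint32, size int) error {
	for h := uint32(0); h < n; h++ {
		if err := s.Put(heightKey(h), make([]byte, size)); err != nil {
			return err
		}
	}
	return nil
}

// randomReads performs the given number of lookups at pseudo-random heights
// below n (n must be > 0) and returns the number of successful reads and the
// elapsed wall-clock time. A fixed seed keeps the access pattern identical
// across backends.
func randomReads(s kvStore, n uint32, reads int, seed int64) (int, time.Duration, error) {
	rng := rand.New(rand.NewSource(seed))
	start := time.Now()
	hits := 0
	for i := 0; i < reads; i++ {
		if _, err := s.Get(heightKey(uint32(rng.Intn(int(n))))); err != nil {
			return hits, time.Since(start), err
		}
		hits++
	}
	return hits, time.Since(start), nil
}

func main() {
	const blocks, reads = 10000, 100000
	s := memStore{}
	if err := fill(s, blocks, 1024); err != nil {
		panic(err)
	}
	hits, took, err := randomReads(s, blocks, reads, 1)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d reads in %v\n", hits, took)
}
```

Keeping the workload behind an interface means the same `fill` and `randomReads` calls can be pointed at each backend in turn, so only the storage engine differs between runs.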

Sep 20 '19 10:09 roman-khimov

Random profiler pictures from 2.5M+ import.

LevelDB: pprof-leveldb

BoltDB: pprof-boltdb

We're clearly DB-bound in the LevelDB case, but not in the BoltDB case. Just looking at the chain import, BoltDB seems to import things faster (and thus should also benefit more from #807), though we still need to measure this difference more accurately.

Mar 28 '20 20:03 roman-khimov

As part of #839 I compared restore times for LevelDB, BoltDB and BadgerDB on testnet (blockHeight: 4099092). The results are:

Time	BadgerDB	LevelDB	BoltDB
real	58m47,802s	62m20,840s	72m14,454s
user	70m36,261s	98m17,229s	70m13,239s
sys	2m0,528s	11m27,052s	5m11,094s

(measured with $ time ./bin/neo-go db restore -t -i ../chain.acc with VerifyBlocks: false).
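To put the `real` row in relative terms, a small helper like the following can convert the `time` output (which uses a comma as the decimal separator here) into durations and express each backend against LevelDB. The function names are made up for this sketch; the input values are the ones from the table above.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseTime converts `time` output such as "58m47,802s" into a Duration.
// The comma decimal separator is replaced with a dot before parsing.
func parseTime(s string) (time.Duration, error) {
	return time.ParseDuration(strings.Replace(s, ",", ".", 1))
}

// relative returns d as a fraction of base (1.0 means equal, < 1.0 means faster).
func relative(d, base time.Duration) float64 {
	return d.Seconds() / base.Seconds()
}

func main() {
	// Wall-clock ("real") restore times from the table above.
	results := []struct{ name, real string }{
		{"BadgerDB", "58m47,802s"},
		{"LevelDB", "62m20,840s"},
		{"BoltDB", "72m14,454s"},
	}
	base, err := parseTime("62m20,840s") // LevelDB is the current default
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		d, err := parseTime(r.real)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-9s %v  (%.1f%% of LevelDB)\n", r.name, d, 100*relative(d, base))
	}
}
```

By this measure BadgerDB finishes the restore roughly 6% faster than LevelDB, and BoltDB roughly 16% slower.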

Apr 09 '20 15:04 AnnaShaleva

What about BoltDB?

Apr 09 '20 15:04 roman-khimov

Updated

Apr 09 '20 17:04 AnnaShaleva

It'd also be nice to run these different setups under neo-bench to see if there is any measurable difference.

Aug 18 '20 15:08 roman-khimov

@AnnaShaleva, what are your relative numbers for Badger vs Level vs Bolt under neo-bench on a single node (30 workers for example, the best mode for now)? My VM shows a ~10% drop in performance for Bolt and ~60% better results for Badger relative to the default Level.

Aug 20 '20 19:08 roman-khimov

That's a very special non-standard test with 5M (!) transactions and a 120K memory pool: blb

The difference is quite obvious. See #2137 also.

Sep 23 '21 15:09 roman-khimov

I wonder whether LevelDB TPS stopped falling at this point or could continue to do so.

Sep 24 '21 06:09 fyrchik

I'd expect it to deteriorate further while the node is still being stressed. Then compaction (which is asynchronous) can make things better, but I guess the DB can't do its best at compacting while we keep pushing data to it. See neo-project/neo#2606 also.

Sep 24 '21 07:09 roman-khimov

How about leaving this as is? We now have a lot more data, including https://neospcc.medium.com/up-in-the-mountains-reaching-50k-tps-with-neo-2f864b30abfd and https://neospcc.medium.com/cutting-blockchain-tail-with-neogo-5256a120f6bb

While BoltDB is clearly better in many scenarios, it has problems with others (like b5d39a3ffdb34b862db2c9eeb990f2cf9c1a9ba9), and for our default configuration (keep everything) on an ordinary node that processes blocks LevelDB is still faster. Some additional read load (RPC queries) could probably change the game, but there is nothing inherently bad about LevelDB being the default, so we can just keep it as is.

Sep 03 '22 09:09 roman-khimov