
leveldb filesystem structure with no subdirectories will become a problem over time

Open c0deright opened this issue 3 years ago • 3 comments

geth uses leveldb to store the blockchain in $datadir/geth/chaindata.

Right now I'm running geth with --syncmode full and geth is at block 4,834,322 (latest block right now: 14,979,825). With approx. 32% synced so far, the directory $datadir/geth/chaindata already contains ~76,000 files:

find /data/geth/chaindata/ -mindepth 1 -maxdepth 1 -type f | wc -l
76470

Most of these leveldb files are only 2.1MB in size.

With an ever-increasing number of files in chaindata, even listing the contents of that directory will become a problem for some filesystems. It would be much better to use a directory structure that places files in subdirectories one or two levels deep, so that there is never a single directory holding a million small files (a sketch of such a layout is shown below).
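
For illustration only, here is roughly what such a layout could look like. shardedPath is a hypothetical helper, and leveldb would have to be taught to use it; nothing like this exists in geth today.

// Hypothetical sketch of the proposed layout: derive a two-level
// subdirectory from an SST file's name so that no single directory
// ever has to hold more than a few thousand files.
package main

import (
	"fmt"
	"path/filepath"
)

// shardedPath is a made-up helper: "001234567.ldb" -> ".../00/12/001234567.ldb".
func shardedPath(root, name string) string {
	// Pad very short names so the slicing below is always in range.
	for len(name) < 4 {
		name = "0" + name
	}
	return filepath.Join(root, name[0:2], name[2:4], name)
}

func main() {
	fmt.Println(shardedPath("/data/geth/geth/chaindata", "001234567.ldb"))
	// Prints: /data/geth/geth/chaindata/00/12/001234567.ldb
}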

Bitcoin Core, for example, stores raw block data in files of ~128MB each and uses leveldb only for an index (see the sketch of that pattern below).
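
A minimal sketch of that pattern in Go, assuming goleveldb for the index; the file names, the 16-byte index record and the 128MB cutoff are illustrative only, not how geth or Bitcoin Core actually lay out their data.

// Sketch only: bulk data goes into large append-only flat files, while
// leveldb stores just a small (file, offset, length) record per item.
package main

import (
	"encoding/binary"
	"io"
	"log"
	"os"
	"path/filepath"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	dir := "/tmp/flatstore-demo" // hypothetical location
	if err := os.MkdirAll(dir, 0o755); err != nil {
		log.Fatal(err)
	}

	// Small index database: key -> (file number, offset, length).
	idx, err := leveldb.OpenFile(filepath.Join(dir, "index"), nil)
	if err != nil {
		log.Fatal(err)
	}
	defer idx.Close()

	// One large append-only data file; a real store would roll over to
	// blk00001.dat, blk00002.dat, ... once a file reaches ~128MB.
	blk, err := os.OpenFile(filepath.Join(dir, "blk00000.dat"),
		os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer blk.Close()

	payload := []byte("raw block bytes would go here")
	off, err := blk.Seek(0, io.SeekEnd)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := blk.Write(payload); err != nil {
		log.Fatal(err)
	}

	// Index entry: fileNum(4) | offset(8) | length(4), big-endian.
	entry := make([]byte, 16)
	binary.BigEndian.PutUint32(entry[0:4], 0)
	binary.BigEndian.PutUint64(entry[4:12], uint64(off))
	binary.BigEndian.PutUint32(entry[12:16], uint32(len(payload)))
	if err := idx.Put([]byte("block:0x1234"), entry, nil); err != nil {
		log.Fatal(err)
	}
}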

Running geth with --gcmode archive will most definitely render the chaindata directory unmanageable (think of backups, rsync, ...). Every process that opens the directory just to read its contents will take ages.

c0deright commented Jun 17 '22 15:06

echo 3 > /proc/sys/vm/drop_caches
time ls -l /data/geth/geth/chaindata >/dev/null

real    0m2.248s
user    0m0.229s
sys     0m0.622s

Over 2.2 seconds to generate the directory listing for a little over 70,000 files on an AWS EBS volume (SSD-backed).
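
For what it's worth, the same measurement can be reproduced from Go; this is just a sketch using os.ReadDir (note that ls -l additionally stats every file, so the numbers won't match exactly).

// Time a single directory listing of chaindata on a cold cache.
package main

import (
	"fmt"
	"log"
	"os"
	"time"
)

func main() {
	start := time.Now()
	entries, err := os.ReadDir("/data/geth/geth/chaindata")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("listed %d entries in %s\n", len(entries), time.Since(start))
}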

c0deright commented Jun 17 '22 15:06

We're aware of this issue.

Using files larger than 2MB blows up disk I/O, as compaction becomes exponentially heavier. It would have been nice to split the files across multiple folders in leveldb, but it doesn't support that, and I'm not confident enough to start implementing a new storage engine, especially as the upstream project doesn't really accept contributions any more.

We're currently experimenting with Pebble, aiming to eventually switch over to it fully. I'm unsure whether it supports nested DBs, but it might make more sense to try to get that feature in there. Raising the level sizes still causes insane writes in Pebble too.
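
For context, the ~2MB tables come from goleveldb's default CompactionTableSize (geth uses goleveldb under the hood). Below is a minimal sketch of raising it when opening a database; the 16MB figure is purely illustrative and, as noted above, larger tables make compaction much heavier.

package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// CompactionTableSize controls the target size of the .ldb files;
	// the goleveldb default is 2MiB, which matches the files seen above.
	db, err := leveldb.OpenFile("/tmp/bigtables-demo", &opt.Options{
		CompactionTableSize: 16 * 1024 * 1024, // illustrative value only
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Put([]byte("key"), []byte("value"), nil); err != nil {
		log.Fatal(err)
	}
}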

karalabe commented Jun 20 '22 07:06

And the real updates among leveldb-based DBs are happening in RocksDB only. At least they've added column families...

g2px1 commented Nov 19 '22 18:11