go-ethereum
ethdb/pebble: add database backend using pebble
Uses Pebble patched to expose an API for getting the amount of heap memory allocated through CGo: https://github.com/jwasinger/pebble/tree/mem-stats.
Modifies the system/memory/used and system/memory/held gauges to include Pebble's allocation through CGo.
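As a rough illustration of what that gauge change means (the `manualAllocatedBytes` accessor below is a stand-in for whatever the patched Pebble branch exposes, not its real API):

```go
// Sketch: fold Pebble's CGo-side allocations into a "memory used" figure,
// since the Go runtime cannot see memory allocated through CGo.
package main

import (
	"fmt"
	"runtime"
)

// manualAllocatedBytes stands in for the accessor added in the patched Pebble
// branch; the real name and mechanism may differ.
func manualAllocatedBytes() uint64 { return 0 }

func usedMemoryBytes() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// Go heap in use plus Pebble's CGo allocations.
	return m.HeapAlloc + manualAllocatedBytes()
}

func main() {
	fmt.Println(usedMemoryBytes())
}
```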
Charts from a snap sync (LevelDB is orange, Pebble is green):




Pebble disk usage:
```
311.7G  /datadir/geth/geth/chaindata/ancient
503.6G  /datadir/geth/geth/chaindata
230.3M  /datadir/geth/geth/ethash
670.1M  /datadir/geth/geth/triecache
4.4M    /datadir/geth/geth/nodes
504.5G  /datadir/geth/geth
4.0K    /datadir/geth/keystore
504.5G  /datadir/geth
```
LevelDB disk usage:
```
693.2M  /datadir/geth/geth/triecache
230.3M  /datadir/geth/geth/ethash
311.7G  /datadir/geth/geth/chaindata/ancient
559.7G  /datadir/geth/geth/chaindata
3.6M    /datadir/geth/geth/nodes
560.6G  /datadir/geth/geth
4.0K    /datadir/geth/keystore
560.6G  /datadir/geth
```
Update: still waiting for upstream on https://github.com/cockroachdb/pebble/pull/1628.
Actually, they have responded: https://github.com/cockroachdb/pebble/pull/1628#pullrequestreview-1026664054.
@jwasinger would you mind rebasing this?
@holiman done.
Testing this on two bootnodes which have a very hard time syncing:
```
ansible-playbook playbook.yaml -t geth -l bootnode-azure-westus-001,bootnode-azure-koreasouth-001 -e "geth_image=holiman/geth-experimental:latest" -e "geth_datadir_wipe=partial" -e '{"geth_args_custom":["--backingdb=pebble"]}'
```
and for comparison:
```
ansible-playbook playbook.yaml -t geth -l bootnode-azure-brazilsouth-001,bootnode-azure-australiaeast-001 -e "geth_image=holiman/geth-experimental:latest" -e "geth_datadir_wipe=partial"
```
Metrics that do not appear to work:
geth.eth/db/chaindata/disk/read.meter
geth.eth/db/chaindata/compact/writedelay/counter.meter
Edit: here's why the disk read meter is always zero:
```go
if db.diskReadMeter != nil {
	db.diskReadMeter.Mark(0) // pebble doesn't track non-compaction reads
}
```
Is there any point in even having the metric around? Though I suppose we can keep it for a while, to stay on par with leveldb.
It's still very early, but it looks like on our 'weak' azure nodes, pebble is performing a lot better, since it's not being killed-by-compaction:

The Pebble Azure nodes are at ~26%:
bootnode-azure-westus-001 geth INFO [08-29|16:20:37.679] State sync in progress synced=26.53% state=51.08GiB accounts=49,463,[email protected] slots=193,320,[email protected] codes=206,[email protected] eta=8h46m27.524s
bootnode-azure-koreasouth-001 geth INFO [08-29|16:20:38.621] State sync in progress synced=26.15% state=50.79GiB accounts=48,789,[email protected] slots=192,700,[email protected] codes=203,[email protected] eta=8h56m8.768s
The non-Pebble nodes are at ~15%:
bootnode-azure-brazilsouth-001 geth INFO [08-29|16:20:37.959] State sync in progress synced=13.69% state=27.49GiB accounts=26,174,[email protected] slots=104,749,[email protected] codes=117,[email protected] eta=19h31m38.490s
bootnode-azure-australiaeast-001 geth INFO [08-29|16:20:46.154] State sync in progress synced=16.99% state=32.69GiB accounts=32,593,[email protected] slots=123,280,[email protected] codes=141,[email protected] eta=15h11m3.934s
The Pebble nodes finished the first phase a couple of hours earlier:
Aug 30 03:30:56 bootnode-azure-koreasouth-001 geth INFO [08-30|01:30:55.936] State sync in progress synced=100.00% state=197.38GiB accounts=186,306,[email protected] slots=760,255,[email protected] codes=651,[email protected] eta=-2m14.173s
Aug 30 04:48:43 bootnode-azure-westus-001 geth INFO [08-30|02:48:43.208] State sync in progress synced=100.00% state=197.33GiB accounts=186,328,[email protected] slots=760,016,[email protected] codes=651,[email protected] eta=-2m31.628s
The LevelDB nodes:
Aug 30 06:31:50 bootnode-azure-australiaeast-001 geth INFO [08-30|04:31:50.585] State sync in progress synced=100.00% state=197.47GiB accounts=186,182,[email protected] slots=760,605,[email protected] codes=652,[email protected] eta=-2m43.671s
Aug 30 08:38:23 bootnode-azure-brazilsouth-001 geth INFO [08-30|06:38:23.436] State sync in progress synced=100.00% state=197.51GiB accounts=186,552,[email protected] slots=760,745,[email protected] codes=652,[email protected] eta=-1m27.429s
Right, there's this too:
```
C:\Users\appveyor\go\pkg\mod\github.com\jwasinger\[email protected]\internal\batchskl\skl.go:310:18: maxNodesSize (untyped int constant 4294967295) overflows int
C:\Users\appveyor\go\pkg\mod\github.com\jwasinger\[email protected]\internal\batchskl\skl.go:320:16: cannot use maxNodesSize (untyped int constant 4294967295) as int value in assignment (overflows)
```
Which, afaict, would be fixed by https://github.com/cockroachdb/pebble/pull/1619. It has been open since April.
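For illustration, a minimal sketch of the failure mode (not Pebble's actual source): a constant equal to math.MaxUint32 can be assigned to an int on 64-bit targets, but overflows the 32-bit int used on 386/arm builds, which is what the appveyor errors above complain about.

```go
// Minimal sketch of the 32-bit failure mode; not Pebble's actual code.
package main

import "fmt"

// On 64-bit platforms int is 64 bits wide, so the assignment below is legal.
// On 386/arm, int is 32 bits and the compiler rejects it with
// "constant 4294967295 overflows int", matching the errors above.
const maxNodesSize = 1<<32 - 1 // 4294967295

func main() {
	var limit int = maxNodesSize
	fmt.Println(limit)
}
```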
Found a little issue: with the cache configured to 256K (--cache 262144), ./ethdb/pebble/pebble.go:161 (MemTableSize: cache * 1024 * 1024 / 4) results in:
geth[2101456]: Fatal: Failed to register the Ethereum service: MemTableSize (21 G) must be < 4.0 G
So MemTableSize should be capped to 4 GB max.
--cache 65536 => Sep 16 13:51:30 geth01-ethereum-mainnet-eu geth[2105780]: Fatal: Failed to register the Ethereum service: MemTableSize (8.0 G) must be < 4.0 G
--cache 32768 => Sep 16 13:52:20 geth01-ethereum-mainnet-eu geth[2106436]: Fatal: Failed to register the Ethereum service: MemTableSize (4.0 G) must be < 4.0 G
From https://github.com/cockroachdb/pebble/blob/master/options.go, MemTableSize is an int.
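A minimal sketch of the cap suggested above (the derivation from the cache budget and the constant are assumptions for illustration, not necessarily what the PR ends up doing):

```go
// Sketch: derive the memtable size from the cache budget (in MiB) but keep it
// strictly below Pebble's 4 GiB hard limit. Illustrative only.
package main

import "fmt"

const maxMemTableSize = 4 << 30 // Pebble's limit, imposed by uint32 arena offsets

func memTableSize(cacheMiB int64) int64 {
	size := cacheMiB * 1024 * 1024 / 4 // a quarter of the cache budget, in bytes
	if size >= maxMemTableSize {
		size = maxMemTableSize - 1 // must stay strictly below 4 GiB
	}
	return size
}

func main() {
	// A large --cache no longer produces "MemTableSize ... must be < 4.0 G".
	fmt.Println(memTableSize(262144))
}
```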
On an archive node, it doesn't seem to make sync faster, but at least the ugly long compaction times (some stretches going up to 11 days of continuous compaction) are gone.
2 archive nodes with standard geth, showing the stair-case effect:

One of those 2 nodes using the pebble branch:

A mix of those 2 with both axes visible:

On 2022-06-29, started 2 archive nodes with geth 1.10.1x, then 1.10.2x, then 1.11.0. On 2022-07-13, stopped one of those 2 and replaced it with the 1.11.0 ex_pebble branch, after wiping its storage.
@SLoeuillet thanks for the feedback and charts! ~~Unfortunately, the Y-axis got a bit cropped out, so I couldn't really figure out how the two charts compared.~~ Would love to see some more charts after a few more days of progress!
Ah, the max memtable size is not so much because the field is an int (int64), but rather because of
https://github.com/cockroachdb/pebble/blob/master/open.go#L38:
```go
// The max memtable size is limited by the uint32 offsets stored in
// internal/arenaskl.node, DeferredBatchOp, and flushableBatchEntry.
maxMemTableSize = 4 << 30 // 4 GB
```
With great pleasure, I can announce that my archive node running the standard LevelDB storage just finished syncing to HEAD: 2022-06-09 => 2022-09-21.
The Pebble-based 1.11.0 node started syncing on 2022-09-16 and is currently at 7,925,855 blocks.
I ran a successful snap sync with it again. It took a pretty long time on a very underprovisioned node (5.8GB usable RAM), but it finished after ~70 hours.
Triage discussion: I'll take this PR and try to separate the 64-bit and 32-bit builds, so that we avoid pebble when building for 32-bit.
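One way that split could look (the file layout and constraint set here are assumptions for illustration, not necessarily what the follow-up PR does) is a build-constrained file, so the Pebble backend is only compiled on 64-bit platforms:

```go
//go:build amd64 || arm64

// Sketch of a 64-bit-only gate for the Pebble backend; the real PR may use a
// different tag set or package layout.
package pebble

// Available reports whether this build includes the Pebble backend; a
// companion file with the inverse build constraint would set it to false.
const Available = true
```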
Closing in favour of https://github.com/ethereum/go-ethereum/pull/26517