[WIP] E23 commitment
- Added commitment to the aggregation run in erigon22
- Fixed allSnapshot database reading in erigon2
- Fixed genesis initialization
Currently commitment doesn't provide correct hashes after 3 blocks, due to reading from state or history.
Got through a bunch of bugs with EF merging and updating the `ReaderWrapper23` aggregator context. Currently stumbling on an issue with commitment evaluation after merge - probably some problem with merging the commitment domain (the root mismatch happens only after the aggregator merges, and if I increase the number of transactions before the merge, the mismatch happens later). Investigating it.
For EF merging I added a min-heap which ensures that the merged EF will contain the unique elements from both EFs.
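A minimal sketch of the idea, assuming each EF decodes to an ascending stream of uint64 offsets; this is illustrative only, not the erigon-lib code:

```go
package main

import (
	"container/heap"
	"fmt"
)

// mergeItem tracks the head of one sorted source stream.
type mergeItem struct {
	value  uint64
	source int // index of the stream this value came from
}

type minHeap []mergeItem

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].value < h[j].value }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(mergeItem)) }
func (h *minHeap) Pop() any {
	old := *h
	it := old[len(old)-1]
	*h = old[:len(old)-1]
	return it
}

// mergeUnique merges several ascending streams into one ascending stream
// with duplicates dropped - the property the merged EF needs.
func mergeUnique(streams [][]uint64) []uint64 {
	h := &minHeap{}
	pos := make([]int, len(streams)) // next unread index per stream
	for i, s := range streams {
		if len(s) > 0 {
			heap.Push(h, mergeItem{s[0], i})
			pos[i] = 1
		}
	}
	var out []uint64
	for h.Len() > 0 {
		it := heap.Pop(h).(mergeItem)
		// the heap yields values in ascending order, so comparing with the
		// last emitted value is enough to drop duplicates
		if len(out) == 0 || out[len(out)-1] != it.value {
			out = append(out, it.value)
		}
		if pos[it.source] < len(streams[it.source]) {
			heap.Push(h, mergeItem{streams[it.source][pos[it.source]], it.source})
			pos[it.source]++
		}
	}
	return out
}

func main() {
	fmt.Println(mergeUnique([][]uint64{{1, 3, 5}, {2, 3, 6}})) // [1 2 3 5 6]
}
```

The heap keeps the merge at O(n log k) for k source files, and the dedup check works precisely because the heap always yields the next smallest value.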
what is EF?
@AskAlexSharov Elias-Fano encoded offsets
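For context, a toy sketch of the Elias-Fano idea (the real implementation in erigon-lib's recsplit/eliasfano16 is more involved): each value of a monotone sequence is split into explicit low bits, stored verbatim, and high bits, stored as a unary-coded bit vector. This gets close to the information-theoretic minimum space while still allowing random access.

```go
package main

import (
	"fmt"
	"math/bits"
)

type toyEF struct {
	l    uint     // width of the explicit low-bits part
	lows []uint64 // low l bits of each value
	high []bool   // unary-coded high parts: bit (v>>l)+i is set for value i
}

// encode expects a non-empty, non-decreasing sequence.
func encode(values []uint64) *toyEF {
	n := uint64(len(values))
	u := values[n-1] + 1
	var l uint
	if u/n > 1 {
		l = uint(bits.Len64(u/n) - 1)
	}
	ef := &toyEF{l: l, high: make([]bool, (u>>l)+n+1)}
	for i, v := range values {
		ef.lows = append(ef.lows, v&((1<<l)-1))
		ef.high[(v>>l)+uint64(i)] = true
	}
	return ef
}

// get decodes the i-th value: find the (i+1)-th set bit in the high vector
// (a real implementation uses a select index instead of a linear scan).
func (ef *toyEF) get(i int) uint64 {
	seen := -1
	for pos, b := range ef.high {
		if b {
			seen++
			if seen == i {
				return (uint64(pos-i) << ef.l) | ef.lows[i]
			}
		}
	}
	panic("index out of range")
}

func main() {
	vals := []uint64{3, 7, 15, 24, 30}
	ef := encode(vals)
	for i := range vals {
		fmt.Println(ef.get(i)) // prints 3 7 15 24 30
	}
}
```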
Today I finally localized the issue. The commitment root hash mismatch occurred after domain merges due to reading obsolete data from the Domain. I traced all writes to state and found that for a specific address (which had been touched before the merge) `Domain.Get()` returns an account with nonce=1 while the actual nonce is 5. Spent some time going over the merge code - no issues there. Finally got to the code of `Domain.prune`.

If I disable pruning, the issue is gone: the actual value with nonce=5 was deleted during prune, while the nonce=1 value was not removed. `prune` takes as arguments the current aggregation step, txFrom and txTo. In my case step=25; the account value with nonce=5 belongs to step 25 and the value with nonce=1 to step 22. I'm not sure how pruning was designed to work, but probably, if there are several values for a key with different `invertedStep` values, we should keep the value with the largest `invertedStep`. Is that correct? It slightly worsens the complexity of the prune operation, since we can no longer aggregate and decide on deletion within one full iteration.
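To make the proposal concrete, a hypothetical sketch of the "keep the newest value per key" rule; the entry/step names are illustrative, not the Domain API. Note the extra full pass needed before any deletion can be decided:

```go
package main

import "fmt"

type entry struct {
	key   string
	step  uint64 // aggregation step the value was written at
	value string
}

// pruneKeepLatest deletes every entry that is not the latest step for its
// key. It needs one full pass to learn the max step per key before any
// deletion can be decided - the complexity cost mentioned above.
func pruneKeepLatest(entries []entry) []entry {
	latest := map[string]uint64{}
	for _, e := range entries {
		if s, ok := latest[e.key]; !ok || e.step > s {
			latest[e.key] = e.step
		}
	}
	kept := entries[:0]
	for _, e := range entries {
		if e.step == latest[e.key] {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	es := []entry{
		{"acct", 22, "nonce=1"},
		{"acct", 25, "nonce=5"},
	}
	fmt.Println(pruneKeepLatest(es)) // keeps only step 25 (nonce=5)
}
```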
Branch cleaned up and rebased onto the current devel branch.
Current state:
- Goerli: commitment and merges are processed correctly, but after several merges there is a panic when the Domain index (elias-fano) is accessed; there might be some issue during merge.
- Mainnet: genesis block rootHash mismatch.
Example of crash log:
INFO[08-20|10:24:28.643] Progress block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=3.6GB sys=22.9GB
INFO[08-20|10:24:58.644] Progress block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=3.6GB sys=22.9GB
INFO[08-20|10:25:28.644] Progress block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=3.6GB sys=22.9GB
CRIT[08-20|10:25:36.626] [index] calculating file=accounts.4-6.efi
CRIT[08-20|10:25:39.531] [index] write file=accounts.4-6.efi
INFO[08-20|10:25:56.458] [merge] Compressed millions=10
INFO[08-20|10:25:58.643] Progress block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=5.2GB sys=22.9GB
CRIT[08-20|10:26:17.458] [index] calculating file=accounts.4-6.vi
INFO[08-20|10:26:28.644] Progress block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=5.3GB sys=22.9GB
INFO[08-20|10:26:58.644] Progress block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=6.1GB sys=22.9GB
CRIT[08-20|10:27:25.186] [index] write file=accounts.4-6.vi
CRIT[08-20|10:27:27.119] [index] calculating file=accounts.4-6.kvi
INFO[08-20|10:27:28.652] Progress block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=7.0GB sys=22.9GB
CRIT[08-20|10:27:30.222] [index] write file=accounts.4-6.kvi
findMergeRange(18750000, 100000000)={accounts:{valuesStartTxNum:0 valuesEndTxNum:0 values:false historyStartTxNum:0 historyEndTxNum:0 history:false indexStartTxNum:0 indexEndTxNum:0 index:false} storage:{valuesStartTxNum:0 valuesEndTxNum:0 values:false historyStartTxNum:0 historyEndTxNum:0 history:false indexStartTxNum:0 indexEndTxNum:0 index:false} code:{valuesStartTxNum:0 valuesEndTxNum:0 values:false historyStartTxNum:0 historyEndTxNum:0 history:false indexStartTxNum:0 indexEndTxNum:0 index:false} commitment:{valuesStartTxNum:0 valuesEndTxNum:0 values:false historyStartTxNum:0 historyEndTxNum:0 history:false indexStartTxNum:0 indexEndTxNum:0 index:false} logAddrsStartTxNum:0 logAddrsEndTxNum:0 logAddrs:false logTopicsStartTxNum:0 logTopicsEndTxNum:0 logTopics:false tracesFromStartTxNum:0 tracesFromEndTxNum:0 tracesFrom:false tracesToStartTxNum:0 tracesToEndTxNum:0 tracesTo:false}
unexpected fault address 0x79635d657e0b
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x79635d657e0b pc=0xa5e436]
goroutine 1 [running, locked to thread]:
runtime.throw({0x15729a5?, 0xa25228568b845ee7?})
runtime/panic.go:992 +0x71 fp=0xc06a1a56d8 sp=0xc06a1a56a8 pc=0x45b911
runtime.sigpanic()
runtime/signal_unix.go:825 +0x305 fp=0xc06a1a5728 sp=0xc06a1a56d8 pc=0x471cc5
github.com/ledgerwatch/erigon-lib/recsplit/eliasfano16.(*DoubleEliasFano).get2(0x0?, 0x4?)
github.com/ledgerwatch/[email protected]/recsplit/eliasfano16/elias_fano.go:447 +0x56 fp=0xc06a1a57b8 sp=0xc06a1a5728 pc=0xa5e436
github.com/ledgerwatch/erigon-lib/recsplit/eliasfano16.(*DoubleEliasFano).Get3(0xc023b46240, 0x5f7)
github.com/ledgerwatch/[email protected]/recsplit/eliasfano16/elias_fano.go:512 +0x27 fp=0xc06a1a57d8 sp=0xc06a1a57b8 pc=0xa5e9e7
github.com/ledgerwatch/erigon-lib/recsplit.(*Index).Lookup(0xc023b461c0, 0xc0e20607c0?, 0x2f3458eade8d3e0d)
github.com/ledgerwatch/[email protected]/recsplit/index.go:196 +0xa5 fp=0xc06a1a5878 sp=0xc06a1a57d8 pc=0xa628e5
github.com/ledgerwatch/erigon-lib/recsplit.(*IndexReader).Lookup(0xc066650270, {0xc0e20607c0?, 0x10?, 0x25139e0?})
github.com/ledgerwatch/[email protected]/recsplit/index_reader.go:61 +0x45 fp=0xc06a1a58a8 sp=0xc06a1a5878 pc=0xa63ae5
github.com/ledgerwatch/erigon-lib/state.(*DomainContext).readFromFiles.func1(0xc0564f6820)
github.com/ledgerwatch/[email protected]/state/domain.go:853 +0x65 fp=0xc06a1a5908 sp=0xc06a1a58a8 pc=0xa82365
github.com/google/btree.(*node[...]).iterate(0xc056fd1980, 0xffffffffffffffff, {0x0, 0xe0?}, {0x0?, 0xe0?}, 0x0?, 0x0, 0xc06a1a5a18)
github.com/google/[email protected]/btree_generic.go:555 +0x66a fp=0xc06a1a5988 sp=0xc06a1a5908 pc=0x968c6a
github.com/google/btree.(*BTreeG[...]).Descend(0x1b77de0?, 0xc062a221c0?)
github.com/google/[email protected]/btree_generic.go:815 +0x45 fp=0xc06a1a59e0 sp=0xc06a1a5988 pc=0x9698a5
github.com/ledgerwatch/erigon-lib/state.(*DomainContext).readFromFiles(0x203014?, {0xc0e20607c0?, 0xc0500e1c00?, 0xc0e2060734?})
github.com/ledgerwatch/[email protected]/state/domain.go:849 +0x8a fp=0xc06a1a5a58 sp=0xc06a1a59e0 pc=0xa822aa
github.com/ledgerwatch/erigon-lib/state.(*DomainContext).get(0xc047eaa480, {0xc0e20607c0, 0x34, 0x34}, {0x1b91d20, 0xc000472060})
github.com/ledgerwatch/[email protected]/state/domain.go:225 +0x350 fp=0xc06a1a5b18 sp=0xc06a1a5a58 pc=0xa7bd90
github.com/ledgerwatch/erigon-lib/state.(*DomainContext).Get(0x500e1c00?, {0xc06a1a5bbc?, 0x14, 0xadff9c?}, {0xc06a1a5bd0, 0x20, 0xc06a1a5be0?}, {0x1b91d20, 0xc000472060})
github.com/ledgerwatch/[email protected]/state/domain.go:242 +0xcf fp=0xc06a1a5b70 sp=0xc06a1a5b18 pc=0xa7c06f
github.com/ledgerwatch/erigon-lib/state.(*AggregatorContext).ReadAccountStorage(...)
github.com/ledgerwatch/[email protected]/state/aggregator.go:676
github.com/ledgerwatch/erigon/cmd/state/commands.(*ReaderWrapper23).ReadAccountStorage(0x449518f8f40bf996?, {0x54, 0xbf, 0x39, 0xed, 0x7d, 0xf, 0x44, 0x86, 0xf3, ...}, ...)
github.com/ledgerwatch/erigon/cmd/state/commands/erigon23.go:466 +0x85 fp=0xc06a1a5c00 sp=0xc06a1a5b70 pc=0x121a445
github.com/ledgerwatch/erigon/core/state.(*stateObject).GetCommittedState(0xc000246f20, 0xc0001d8a70, 0xc0da6d1240)
github.com/ledgerwatch/erigon/core/state/state_object.go:186 +0xf2 fp=0xc06a1a5c98 sp=0xc06a1a5c00 pc=0xaed932
github.com/ledgerwatch/erigon/core/state.(*stateObject).GetState(0xc000246f20, 0xc0001d8a70, 0xc0da6d1240)
github.com/ledgerwatch/erigon/core/state/state_object.go:163 +0xaf fp=0xc06a1a5d00 sp=0xc06a1a5c98 pc=0xaed7cf
github.com/ledgerwatch/erigon/core/state.(*IntraBlockState).GetState(0x7ec0219dfb6c68f9?, {0x54, 0xbf, 0x39, 0xed, 0x7d, 0xf, 0x44, 0x86, 0xf3, ...}, ...)
github.com/ledgerwatch/erigon/core/state/intra_block_state.go:306 +0x53 fp=0xc06a1a5d38 sp=0xc06a1a5d00 pc=0xadc073
github.com/ledgerwatch/erigon/core/vm.opSload(0xc060d5aec0?, 0xc0ae698f30, 0x20?)
github.com/ledgerwatch/erigon/core/vm/instructions.go:559 +0x187 fp=0xc06a1a5e00 sp=0xc06a1a5d38 pc=0xd03387
github.com/ledgerwatch/erigon/core/vm.(*EVMInterpreter).Run(0xc0ae698f30, 0xc05efc2820, {0xc047e8cc30, 0xe4, 0xe4}, 0x0)
Currently, commitment works on both mainnet and Goerli, but the merge issue mentioned above still happens. Depending on the aggregation step, it takes several merges before the crash: with aggstep=10k it took 17 merges on Goerli; with 100k it is at block=1857594 and still running. Haven't hit the issue on mainnet yet.
Added the ability to restart after a successful merge. I decided not to leave unmerged data in the DB - it is better to merge everything at once and be sure that the DB and history files are coherent.
Fixed the elias-fano panic (as far as I could see during testing).
Fixed DB pruning by verifying, before deletion, that the pruned step is not the latest step in the DB. This probably introduces another problem: abandoned steps will remain in the DB. With that, block processing works when the database is one step ahead of the written files.
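The fix boils down to a guard like this hypothetical sketch (names illustrative, not the actual Domain code):

```go
package main

import "fmt"

// canPrune allows deletion only when the entry's step is older than the
// latest step present in the DB for that key, so the newest value survives.
// Deleting the latest step would resurrect the stale-read bug described
// earlier (nonce=1 returned while the actual nonce is 5).
func canPrune(entryStep, latestStepInDb uint64) bool {
	return entryStep < latestStepInDb
}

func main() {
	fmt.Println(canPrune(22, 25)) // true: the step-22 value may go
	fmt.Println(canPrune(25, 25)) // false: keep the latest step
}
```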
Added two keys to commitment - `latesttx` and `roothash{txNum}`. Both are inserted when ComputeCommitment is called. `latesttx` is used to seek the latest committed tx number during aggregator restart; `roothash{txNum}` stores the encoded state of HexPatriciaHashed right after commitment evaluation.
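A hypothetical sketch of that bookkeeping, using a plain map in place of the commitment domain; the key names follow the description above:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

var store = map[string][]byte{}

// putCommitment persists the latest committed txNum and the encoded trie
// state whenever a commitment is computed, so a restart can recover both.
func putCommitment(txNum uint64, encodedTrieState []byte) {
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], txNum)
	store["latesttx"] = n[:]                                   // seek point for restart
	store[fmt.Sprintf("roothash%d", txNum)] = encodedTrieState // trie state right after evaluation
}

// seekLatestCommitted restores the last committed txNum and trie state.
func seekLatestCommitted() (uint64, []byte, bool) {
	raw, ok := store["latesttx"]
	if !ok {
		return 0, nil, false
	}
	txNum := binary.BigEndian.Uint64(raw)
	return txNum, store[fmt.Sprintf("roothash%d", txNum)], true
}

func main() {
	putCommitment(42, []byte("encoded-trie-state"))
	tx, st, _ := seekLatestCommitted()
	fmt.Println(tx, string(st))
}
```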
For commitment, both approaches are used - reading directly by keys from state, and accumulating state updates before evaluation. Both are currently enabled, with a check that the two methods produce identical hashes.
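Conceptually the cross-check looks like this hypothetical sketch, where the two functions stand in for the real read-from-state and replay-updates paths:

```go
package main

import (
	"bytes"
	"fmt"
)

// Stand-ins for the two commitment paths described above; both names are
// illustrative, not erigon-lib API. Here they return a fixed dummy root.
func rootFromStateReads() []byte  { return []byte{0xab} } // read keys directly from state
func rootFromUpdateBatch() []byte { return []byte{0xab} } // replay accumulated updates

// computeCommitment runs both methods and fails loudly on divergence.
func computeCommitment() ([]byte, error) {
	a := rootFromStateReads()
	b := rootFromUpdateBatch()
	if !bytes.Equal(a, b) {
		return nil, fmt.Errorf("commitment root mismatch: state=%x updates=%x", a, b)
	}
	return a, nil
}

func main() {
	root, err := computeCommitment()
	if err != nil {
		panic(err)
	}
	fmt.Printf("root: %x\n", root)
}
```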
Could resolve the merge conflict only after https://github.com/ledgerwatch/erigon-lib/pull/647 is merged.
It's ok to refer to a non-merged erigon-lib branch from this PR. We usually do it this way - because it allows merging erigon-lib's PR only once Erigon's CI is green.
Yes, but that doesn't work in the current situation - I'm trying to keep the erigon-lib commitment branch up to date with main, but it takes time to verify the build after rebasing, so the branch is inevitably not as fresh as trunk.