[WIP] E23 commitment

Open awskii opened this issue 2 years ago • 5 comments

  • added commitment to aggregation run in erigon22
  • fixed allSnapshot database reading in erigon2
  • fixed genesis initialization

Currently, commitment doesn't provide correct hashes after 3 blocks, due to reading from state or history.

awskii avatar Aug 05 '22 15:08 awskii

Got through a bunch of bugs with EF merging and with updating the ReaderWrapper23 aggregator context. Currently stumbling on an issue with commitment evaluation after merge - probably some issue with merging the commitment domain, since the root mismatch happens only after the aggregator merges, and if I increase the number of transactions before the merge, the root mismatch happens later. Investigating it.

For EF merging I added a min-heap which ensures that the merged EF contains the unique elements from both EFs.
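
For illustration, a minimal sketch of that merge, assuming the EF sequences have already been decoded into sorted uint64 offset slices (the real code walks the compressed form directly; all names here are mine, not erigon-lib's):

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor tracks a position inside one sorted, decoded EF sequence.
type cursor struct {
	vals []uint64 // sorted offsets decoded from one Elias-Fano sequence
	pos  int
}

// seqHeap is a min-heap of cursors ordered by their current value, so
// popping always yields the globally smallest pending offset.
type seqHeap []*cursor

func (h seqHeap) Len() int           { return len(h) }
func (h seqHeap) Less(i, j int) bool { return h[i].vals[h[i].pos] < h[j].vals[h[j].pos] }
func (h seqHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *seqHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *seqHeap) Pop() any {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// mergeUnique merges sorted sequences into one sorted sequence,
// dropping duplicates so the merged EF holds each offset exactly once.
func mergeUnique(seqs ...[]uint64) []uint64 {
	h := &seqHeap{}
	for _, s := range seqs {
		if len(s) > 0 {
			*h = append(*h, &cursor{vals: s})
		}
	}
	heap.Init(h)
	var out []uint64
	for h.Len() > 0 {
		c := (*h)[0]
		if v := c.vals[c.pos]; len(out) == 0 || out[len(out)-1] != v {
			out = append(out, v) // emit only values not seen yet
		}
		if c.pos++; c.pos < len(c.vals) {
			heap.Fix(h, 0) // cursor advanced; restore heap order
		} else {
			heap.Pop(h) // this sequence is exhausted
		}
	}
	return out
}

func main() {
	fmt.Println(mergeUnique([]uint64{1, 3, 5}, []uint64{3, 4, 5, 9})) // [1 3 4 5 9]
}
```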

awskii avatar Aug 16 '22 11:08 awskii

what is EF?

AskAlexSharov avatar Aug 16 '22 12:08 AskAlexSharov

@AskAlexSharov Elias-Fano encoded offsets.

Today I finally localized the issue. The commitment root hash mismatch occurred after domains merge due to reading obsolete data from the Domain. I traced all writes to state and found that for a specific address (one which had been touched before the merge) Domain.Get() returns an account with nonce=1 while the actual nonce is 5. Spent some time going over the merge code - no issues there. Finally got to the code of Domain.prune.

If I disable pruning, the issue is gone: the actual value with nonce=5 was deleted during prune while the nonce=1 value was not removed. prune takes as arguments the current aggregation step, txFrom, and txTo. In my case step=25; the account value with nonce=5 has step 25 and the value with nonce=1 has step 22. I'm not sure how pruning was designed to work, but probably, if there are several values for a key with different invertedStep values, we should keep the value with the largest invertedStep. Is that correct? It slightly worsens the complexity of the prune operation, since we can no longer aggregate and decide on deletions within one full iteration.
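
For concreteness, a minimal sketch of that rule, phrased in plain step numbers rather than the invertedStep key encoding; the types are hypothetical stand-ins, not erigon-lib's actual prune code:

```go
package main

import "fmt"

// versioned is a hypothetical stand-in for one stored value of a key,
// tagged with the aggregation step at which it was written.
type versioned struct {
	step  uint64
	nonce uint64
}

// pruneCandidates returns every version of one key except the newest
// one, so pruning can never delete the live value. The extra full pass
// over the versions is the complexity cost mentioned above: we must
// see all versions of the key before deciding on any deletion.
func pruneCandidates(versions []versioned) (toDelete []versioned) {
	if len(versions) < 2 {
		return nil // a lone version is the live one; keep it
	}
	newest := 0
	for i, v := range versions {
		if v.step > versions[newest].step {
			newest = i
		}
	}
	for i, v := range versions {
		if i != newest {
			toDelete = append(toDelete, v)
		}
	}
	return toDelete
}

func main() {
	// The bug scenario: nonce=5 written at step 25, nonce=1 at step 22.
	// Only the stale step-22 version may be pruned.
	fmt.Println(pruneCandidates([]versioned{{step: 22, nonce: 1}, {step: 25, nonce: 5}}))
	// Output: [{22 1}]
}
```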

awskii avatar Aug 17 '22 16:08 awskii

Branch cleaned and rebased onto the current devel branch.

awskii avatar Aug 19 '22 16:08 awskii

Current state is that:

  • Goerli commitment and merge are processed correctly, but after several merges we get a panic when the Domain index (elias-fano) is accessed; there might be some issue during merge.
  • mainnet genesis block rootHash mismatch

Example of crash log:

INFO[08-20|10:24:28.643] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=3.6GB sys=22.9GB
INFO[08-20|10:24:58.644] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=3.6GB sys=22.9GB
INFO[08-20|10:25:28.644] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=3.6GB sys=22.9GB
CRIT[08-20|10:25:36.626] [index] calculating                      file=accounts.4-6.efi
CRIT[08-20|10:25:39.531] [index] write                            file=accounts.4-6.efi
INFO[08-20|10:25:56.458] [merge] Compressed                       millions=10
INFO[08-20|10:25:58.643] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=5.2GB sys=22.9GB
CRIT[08-20|10:26:17.458] [index] calculating                      file=accounts.4-6.vi
INFO[08-20|10:26:28.644] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=5.3GB sys=22.9GB
INFO[08-20|10:26:58.644] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=6.1GB sys=22.9GB
CRIT[08-20|10:27:25.186] [index] write                            file=accounts.4-6.vi
CRIT[08-20|10:27:27.119] [index] calculating                      file=accounts.4-6.kvi
INFO[08-20|10:27:28.652] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=7.0GB sys=22.9GB
CRIT[08-20|10:27:30.222] [index] write                            file=accounts.4-6.kvi
findMergeRange(18750000, 100000000)={accounts:{valuesStartTxNum:0 valuesEndTxNum:0 values:false historyStartTxNum:0 historyEndTxNum:0 history:false indexStartTxNum:0 indexEndTxNum:0 index:false} storage:{valuesStartTxNum:0 valuesEndTxNum:0 values:false historyStartTxNum:0 historyEndTxNum:0 history:false indexStartTxNum:0 indexEndTxNum:0 index:false} code:{valuesStartTxNum:0 valuesEndTxNum:0 values:false historyStartTxNum:0 historyEndTxNum:0 history:false indexStartTxNum:0 indexEndTxNum:0 index:false} commitment:{valuesStartTxNum:0 valuesEndTxNum:0 values:false historyStartTxNum:0 historyEndTxNum:0 history:false indexStartTxNum:0 indexEndTxNum:0 index:false} logAddrsStartTxNum:0 logAddrsEndTxNum:0 logAddrs:false logTopicsStartTxNum:0 logTopicsEndTxNum:0 logTopics:false tracesFromStartTxNum:0 tracesFromEndTxNum:0 tracesFrom:false tracesToStartTxNum:0 tracesToEndTxNum:0 tracesTo:false}
unexpected fault address 0x79635d657e0b
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x79635d657e0b pc=0xa5e436]

goroutine 1 [running, locked to thread]:
runtime.throw({0x15729a5?, 0xa25228568b845ee7?})
        runtime/panic.go:992 +0x71 fp=0xc06a1a56d8 sp=0xc06a1a56a8 pc=0x45b911
runtime.sigpanic()
        runtime/signal_unix.go:825 +0x305 fp=0xc06a1a5728 sp=0xc06a1a56d8 pc=0x471cc5
github.com/ledgerwatch/erigon-lib/recsplit/eliasfano16.(*DoubleEliasFano).get2(0x0?, 0x4?)
        github.com/ledgerwatch/[email protected]/recsplit/eliasfano16/elias_fano.go:447 +0x56 fp=0xc06a1a57b8 sp=0xc06a1a5728 pc=0xa5e436
github.com/ledgerwatch/erigon-lib/recsplit/eliasfano16.(*DoubleEliasFano).Get3(0xc023b46240, 0x5f7)
        github.com/ledgerwatch/[email protected]/recsplit/eliasfano16/elias_fano.go:512 +0x27 fp=0xc06a1a57d8 sp=0xc06a1a57b8 pc=0xa5e9e7
github.com/ledgerwatch/erigon-lib/recsplit.(*Index).Lookup(0xc023b461c0, 0xc0e20607c0?, 0x2f3458eade8d3e0d)
        github.com/ledgerwatch/[email protected]/recsplit/index.go:196 +0xa5 fp=0xc06a1a5878 sp=0xc06a1a57d8 pc=0xa628e5
github.com/ledgerwatch/erigon-lib/recsplit.(*IndexReader).Lookup(0xc066650270, {0xc0e20607c0?, 0x10?, 0x25139e0?})
        github.com/ledgerwatch/[email protected]/recsplit/index_reader.go:61 +0x45 fp=0xc06a1a58a8 sp=0xc06a1a5878 pc=0xa63ae5
github.com/ledgerwatch/erigon-lib/state.(*DomainContext).readFromFiles.func1(0xc0564f6820)
        github.com/ledgerwatch/[email protected]/state/domain.go:853 +0x65 fp=0xc06a1a5908 sp=0xc06a1a58a8 pc=0xa82365
github.com/google/btree.(*node[...]).iterate(0xc056fd1980, 0xffffffffffffffff, {0x0, 0xe0?}, {0x0?, 0xe0?}, 0x0?, 0x0, 0xc06a1a5a18)
        github.com/google/[email protected]/btree_generic.go:555 +0x66a fp=0xc06a1a5988 sp=0xc06a1a5908 pc=0x968c6a
github.com/google/btree.(*BTreeG[...]).Descend(0x1b77de0?, 0xc062a221c0?)
        github.com/google/[email protected]/btree_generic.go:815 +0x45 fp=0xc06a1a59e0 sp=0xc06a1a5988 pc=0x9698a5
github.com/ledgerwatch/erigon-lib/state.(*DomainContext).readFromFiles(0x203014?, {0xc0e20607c0?, 0xc0500e1c00?, 0xc0e2060734?})
        github.com/ledgerwatch/[email protected]/state/domain.go:849 +0x8a fp=0xc06a1a5a58 sp=0xc06a1a59e0 pc=0xa822aa
github.com/ledgerwatch/erigon-lib/state.(*DomainContext).get(0xc047eaa480, {0xc0e20607c0, 0x34, 0x34}, {0x1b91d20, 0xc000472060})
        github.com/ledgerwatch/[email protected]/state/domain.go:225 +0x350 fp=0xc06a1a5b18 sp=0xc06a1a5a58 pc=0xa7bd90
github.com/ledgerwatch/erigon-lib/state.(*DomainContext).Get(0x500e1c00?, {0xc06a1a5bbc?, 0x14, 0xadff9c?}, {0xc06a1a5bd0, 0x20, 0xc06a1a5be0?}, {0x1b91d20, 0xc000472060})
        github.com/ledgerwatch/[email protected]/state/domain.go:242 +0xcf fp=0xc06a1a5b70 sp=0xc06a1a5b18 pc=0xa7c06f
github.com/ledgerwatch/erigon-lib/state.(*AggregatorContext).ReadAccountStorage(...)
        github.com/ledgerwatch/[email protected]/state/aggregator.go:676
github.com/ledgerwatch/erigon/cmd/state/commands.(*ReaderWrapper23).ReadAccountStorage(0x449518f8f40bf996?, {0x54, 0xbf, 0x39, 0xed, 0x7d, 0xf, 0x44, 0x86, 0xf3, ...}, ...)
        github.com/ledgerwatch/erigon/cmd/state/commands/erigon23.go:466 +0x85 fp=0xc06a1a5c00 sp=0xc06a1a5b70 pc=0x121a445
github.com/ledgerwatch/erigon/core/state.(*stateObject).GetCommittedState(0xc000246f20, 0xc0001d8a70, 0xc0da6d1240)
        github.com/ledgerwatch/erigon/core/state/state_object.go:186 +0xf2 fp=0xc06a1a5c98 sp=0xc06a1a5c00 pc=0xaed932
github.com/ledgerwatch/erigon/core/state.(*stateObject).GetState(0xc000246f20, 0xc0001d8a70, 0xc0da6d1240)
        github.com/ledgerwatch/erigon/core/state/state_object.go:163 +0xaf fp=0xc06a1a5d00 sp=0xc06a1a5c98 pc=0xaed7cf
github.com/ledgerwatch/erigon/core/state.(*IntraBlockState).GetState(0x7ec0219dfb6c68f9?, {0x54, 0xbf, 0x39, 0xed, 0x7d, 0xf, 0x44, 0x86, 0xf3, ...}, ...)
        github.com/ledgerwatch/erigon/core/state/intra_block_state.go:306 +0x53 fp=0xc06a1a5d38 sp=0xc06a1a5d00 pc=0xadc073
github.com/ledgerwatch/erigon/core/vm.opSload(0xc060d5aec0?, 0xc0ae698f30, 0x20?)
        github.com/ledgerwatch/erigon/core/vm/instructions.go:559 +0x187 fp=0xc06a1a5e00 sp=0xc06a1a5d38 pc=0xd03387
github.com/ledgerwatch/erigon/core/vm.(*EVMInterpreter).Run(0xc0ae698f30, 0xc05efc2820, {0xc047e8cc30, 0xe4, 0xe4}, 0x0)

awskii avatar Aug 23 '22 18:08 awskii

Currently, both mainnet and Goerli commitment work, but the merge issue mentioned above still happens. Depending on the aggregation step, it takes several merges before the crash: for aggstep=10k it took 17 merges on Goerli; for 100k it has reached block=1857594 and is still running. Haven't hit the issue on mainnet yet.

awskii avatar Aug 29 '22 08:08 awskii

Added the ability to restart after a successful merge. I decided not to leave unmerged data in the db; it's better to merge everything at once and be sure that the db and history files are coherent.

Fixed elias-fano panic (as far as I could see during testing).

awskii avatar Sep 03 '22 09:09 awskii

Fixed db pruning by verifying, before deleting, that the pruned step is not the latest step in the db. This probably introduces another problem: abandoned steps may remain in the db. With that change, block processing works when the database is one step ahead of the written files.
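
Roughly, the guard looks like this (a sketch under assumed types; in the real Domain, latest-step discovery goes through the key encoding, not a slice):

```go
package main

import "fmt"

// canPruneStep sketches the fix described above: a value written at
// `step` may be deleted only if the db holds a newer step for the same
// key, so the latest step is never pruned. The trade-off is that some
// stale steps may be left abandoned in the db, as noted.
func canPruneStep(step uint64, stepsInDb []uint64) bool {
	var latest uint64
	for _, s := range stepsInDb {
		if s > latest {
			latest = s
		}
	}
	return step < latest // never delete the latest step
}

func main() {
	fmt.Println(canPruneStep(22, []uint64{22, 25})) // true: step 25 is newer
	fmt.Println(canPruneStep(25, []uint64{22, 25})) // false: 25 is the latest
}
```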

Added two keys to commitment: latesttx and roothash{txNum}. Both are inserted when ComputeCommitment is called. latesttx is used to seek the latest committed tx number during aggregator restart; roothash stores the encoded state of HexPatriciaHash right after commitment evaluation.
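
A sketch of how those two keys can be written and read back; the storage here is a plain map and the encodings are assumptions, while the real code writes through the commitment Domain:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// kv is a stand-in for the commitment domain's key-value storage.
type kv map[string][]byte

const keyLatestTx = "latesttx" // key name from the comment above

// rootHashKey builds the roothash{txNum} key (encoding is assumed).
func rootHashKey(txNum uint64) string {
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], txNum)
	return "roothash" + string(n[:])
}

// storeCommitment runs right after commitment evaluation: it records
// the latest committed tx number and the encoded trie state for it.
func storeCommitment(db kv, txNum uint64, encodedTrieState []byte) {
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], txNum)
	db[keyLatestTx] = n[:]
	db[rootHashKey(txNum)] = encodedTrieState
}

// seekCommitment is the restart path: find the latest committed tx
// number, then load the HexPatriciaHash state saved alongside it.
func seekCommitment(db kv) (txNum uint64, state []byte, ok bool) {
	n, found := db[keyLatestTx]
	if !found {
		return 0, nil, false
	}
	txNum = binary.BigEndian.Uint64(n)
	state, ok = db[rootHashKey(txNum)]
	return txNum, state, ok
}

func main() {
	db := kv{}
	storeCommitment(db, 1_000_000, []byte("encoded trie state"))
	tx, st, ok := seekCommitment(db)
	fmt.Println(tx, string(st), ok) // 1000000 encoded trie state true
}
```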

For commitment, both approaches are used: reading directly by key from state, and accumulating state updates before evaluation. Both are currently enabled, with a check that the two methods produce the same hashes.
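
The cross-check itself can be as simple as the sketch below; both compute functions are hypothetical stand-ins for the two paths (direct state reads vs accumulated updates):

```go
package main

import (
	"bytes"
	"fmt"
)

// crossCheckRoots computes the commitment root via both paths and
// fails loudly on any divergence, returning the agreed root otherwise.
func crossCheckRoots(computeDirect, computeFromUpdates func() ([]byte, error)) ([]byte, error) {
	direct, err := computeDirect()
	if err != nil {
		return nil, err
	}
	fromUpdates, err := computeFromUpdates()
	if err != nil {
		return nil, err
	}
	if !bytes.Equal(direct, fromUpdates) {
		return nil, fmt.Errorf("commitment root mismatch: direct=%x updates=%x", direct, fromUpdates)
	}
	return direct, nil
}

func main() {
	root := func() ([]byte, error) { return []byte{0xab, 0xcd}, nil }
	r, err := crossCheckRoots(root, root)
	fmt.Printf("%x %v\n", r, err) // abcd <nil>
}
```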

awskii avatar Sep 07 '22 17:09 awskii

I can resolve the merge conflict only after https://github.com/ledgerwatch/erigon-lib/pull/647 is merged.

awskii avatar Sep 23 '22 19:09 awskii

ledgerwatch/erigon-lib#647

It's ok to refer to a non-merged erigon-lib branch from this PR. We usually do it this way because it allows merging erigon-lib's PR only once Erigon's CI is green.

AskAlexSharov avatar Sep 26 '22 02:09 AskAlexSharov

Yes, but it doesn't work in the current situation - I'm trying to keep the erigon-lib commitment branch up to date with main, but it takes time to verify the build after rebasing, so the branch is inevitably not as fresh as trunk.

awskii avatar Sep 26 '22 12:09 awskii