full archive caplin restart loses too much processed state

errge opened this issue (1 comment).

System information

Erigon version: git main

OS & Version: Linux

Commit hash: d780e319cd

Erigon Command (with flags/config): ../../bin/erigon/erigon --datadir ./data --prune.mode archive --http.api=eth,erigon,web3,net,debug,trace,txpool --caplin.archive --beacon.api beacon,builder,config,debug,events,node,validator,lighthouse --diagnostics.disabled --nat none --private.api.addr --beacon.api.port 3500 --http.port 8545 --ws --ws.port 8546 --authrpc.port 8551 --torrent.port 20202 --port 30303 --p2p.protocol 68 --p2p.allowed-ports 30303 --caplin.discovery.tcpport 40404 --caplin.discovery.port 40404 --sentinel.port 50505 --rpc.batch.limit 50000 --db.read.concurrency 8 --rpc.returndata.limit 100000000000

Chain/Network: ethereum mainnet

Expected behaviour

When erigon is restarted, it should be back in normal operation in 1-2 minutes.

Actual behaviour

When erigon runs in full sync mode (with caplin archive also enabled) and is restarted, caplin sync somehow jumps back around 20K slots. The historical download then has to be done, after which "State processing progress" has to process the downloaded slots:

[INFO] [10-14|01:55:57.856] State processing progress                slot=10164630 blk/sec=17.78

This goes on for 15-30 minutes (depending on luck).

Steps to reproduce the behaviour

Have a fully synced erigon node with caplin archival enabled, then restart it.

Discussion

If there are plans to speed up "State processing progress" drastically (at least 10x), this becomes a non-issue.

If that is not feasible, could caplin take snapshots more frequently, or save its state when it receives an exit signal, so that restarts involve less waiting?

Oct 14 '24 00:10 errge

Mmmmhhh - I cannot really help with this. I would just wait for alpha 6 here. Will keep this open.

Oct 14 '24 23:10 Giulio2002