Upgrading from erigon 2.48.1 to 2.5x (currently trying 2.54.0) causes node to stop syncing
System information
erigon version 2.54.0-aeec5221
OS & Version: Linux
Commit hash: tag - v2.54.0
Erigon Command (with flags/config): /usr/bin/erigon --config=/data/erigon/erigon.yaml
Consensus Layer: lighthouse Lighthouse v4.5.0-441fc16
Consensus Layer Command (with flags/config): /usr/bin/lighthouse bn --datadir /data/lighthouse --network mainnet --http --execution-endpoint http://127.0.0.1:8551 --execution-jwt /data/erigon/jwtsecret
Chain/Network: mainnet
Expected behaviour
Node syncs properly after the version upgrade.
Actual behaviour
Node is no longer syncing, with messages: [WARN] [11-26|17:06:15.372] Served conn=127.0.0.1:26722 method=engine_forkchoiceUpdatedV2 reqid=1 t=68.793491ms err="unexpected chain cap: 18657009"
Steps to reproduce the behaviour
Full sync on v2.48.1, then upgrade to v2.54.0
Backtrace
N/A
show "integration print_stages"
We're seeing the same issue on our nodes.
Something I noticed on startup of erigon:
[INFO] [11-28|21:56:55.656] [db] chaindata sizeLimit=3TB pageSize=4096
When running integration print_stages:
INFO[11-28|22:00:48.990] logging to file system log dir=/blockchain/logs file prefix=integration log level=info json=false
INFO[11-28|22:00:48.992] [db] chaindata sizeLimit=2TB pageSize=4096
Note: prune_at doesn't mean 'all data before were deleted' - it just mean stage.Prune function were run to this block. Because 1 stage may prune multiple data types to different prune distance.
stage_at prune_at
Snapshots 18846895 0
Headers 18846895 0
BorHeimdall 0 0
BlockHashes 18846895 0
Bodies 18846895 0
Senders 18846895 0
Execution 18846895 18846895
Translation 0 0
HashState 18846895 0
IntermediateHashes 18846895 18846895
AccountHistoryIndex 18846895 0
StorageHistoryIndex 18846895 0
LogIndex 18846895 0
CallTraces 18846895 18846895
TxLookup 18846895 0
Finish 18846895 0
--
prune distance:
blocks.v2: blocks=0, segments=0, indices=0
history.v3: false, idx steps: 0.00, lastMaxTxNum=0->1, lastBlockInSnap=0
sequence: EthTx=2063258825, NonCanonicalTx=63761
in db: first header 1, last header 18952054, first body 1, last body 18952054
--
I'm observing the same on v2.55.1. As for @bretep, the sizeLimit in the container logs does not match the one reported by integration print_stages:
docker logs 2>&1 eth_erigon | grep chaindata
[INFO] [12-15|23:43:46.812] Opening Database label=chaindata path=/mnt/data/eth_erigon/core/chaindata
[INFO] [12-15|23:43:46.820] [db] chaindata sizeLimit=8TB pageSize=4096
/ $ integration print_stages --datadir /mnt/data/eth_erigon/core --chain mainnet
INFO[12-15|23:53:32.297] logging to file system log dir=/mnt/data/eth_erigon/core/logs file prefix=integration log level=info json=false
INFO[12-15|23:53:32.298] [db] chaindata sizeLimit=2TB pageSize=4096
INFO[12-15|23:53:35.581] [snapshots] Blocks Stat blocks=18518k indices=18518k alloc=2.2GB sys=2.3GB
INFO[12-15|23:53:35.584] [bor snapshots] Blocks Stat blocks=0k indices=0k alloc=2.2GB sys=2.3GB
Note: prune_at doesn't mean 'all data before were deleted' - it just mean stage.Prune function were run to this block. Because 1 stage may prune multiple data types to different prune distance.
stage_at prune_at
Snapshots 18608589 0
Headers 18794467 0
BorHeimdall 0 0
BlockHashes 18794467 0
Bodies 18608589 0
Senders 18608589 0
Execution 18608589 18608589
Translation 0 0
HashState 18608589 0
IntermediateHashes 18608589 18608589
AccountHistoryIndex 18608589 0
StorageHistoryIndex 18608589 0
LogIndex 18608589 0
CallTraces 18608589 18608589
TxLookup 18608589 18518000
Finish 18608589 0
--
prune distance:
blocks.v2: true, blocks=18517999, segments=18517999, indices=18517999
history.v3: false, idx steps: 0.00, lastMaxTxNum=0->0, lastBlockInSnap=0
sequence: EthTx=2199760036, NonCanonicalTx=1870330
in db: first header 18518000, last header 18794904, first body 18518000, last body 18794904
What's the solution to this?
Also seeing this recently. Tried updating, but no success. Currently on the latest v2.56.2, using an external Lighthouse CL.
[INFO] [01-15|08:56:50.254] [NewPayload] Handling new payload height=19011236 hash=0xd7ff5431c668a10b655270ccec26a6b1dd0a1bb6731033fdbc1a706447f18150
[WARN] [01-15|08:56:50.752] [rpc] served conn=127.0.0.1:35290 method=engine_forkchoiceUpdatedV2 reqid=1 t=156.596579ms err="unexpected chain cap: 19010820"
I just upgraded one of my nodes from v2.46.1 to v2.56.2 and got the same "unexpected chain cap" error as above. I believe the cause was that Erigon was upgraded while the CL was still running. I had to restore Erigon's data from another node and follow this upgrade procedure instead:
- stop the CL
- upgrade Erigon (check logs and wait until everything has settled down)
- start CL
This worked for me, and now Erigon's synced up to head again. Hope this helps someone else.
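For reference, a minimal sketch of that sequence, assuming systemd-managed services (the unit names lighthouse-bn and erigon and the binary path are my assumptions, not from the report):

```sh
# Hypothetical unit names and paths; adjust to your own setup.
sudo systemctl stop lighthouse-bn                # 1. stop the CL first

sudo systemctl stop erigon                       # 2. upgrade Erigon
sudo cp erigon-v2.56.2/erigon /usr/bin/erigon    #    install the new binary
sudo systemctl start erigon
journalctl -fu erigon                            #    watch the logs until all stages have settled down

sudo systemctl start lighthouse-bn               # 3. start the CL again
```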
I had tried that (CL and Erigon stopped) and still encountered this error.
Did you wait for Erigon to fully settle down? I waited for it to complete the snapshot phase, which took over a day.
From integration print_stages, I can see the following. Previously, with the error, all stages were stuck at 19036913 without progress. With the procedure above, the Headers and Execution stages have moved forward (although the Snapshots stage is still at its previous value):
stage_at prune_at
Snapshots 19036913 0
Headers 19036947 0
BorHeimdall 0 0
BlockHashes 19036947 0
Bodies 19036947 0
Senders 19036947 0
Execution 19036947 19036947
Translation 0 0
HashState 19036947 0
IntermediateHashes 19036947 19036947
AccountHistoryIndex 19036947 0
StorageHistoryIndex 19036947 0
LogIndex 19036947 0
CallTraces 19036947 19036947
TxLookup 19036947 18946000
Finish 19036947 0
Just an update on the above, the snapshot stage eventually kicked in and caught up:
stage_at prune_at
Snapshots 19058024 0
Headers 19058025 0
BorHeimdall 0 0
BlockHashes 19058025 0
Bodies 19058025 0
Senders 19058025 0
Execution 19058025 19058025
Translation 0 0
HashState 19058025 0
IntermediateHashes 19058025 19058025
AccountHistoryIndex 19058025 0
StorageHistoryIndex 19058025 0
LogIndex 19058025 0
CallTraces 19058025 19058025
TxLookup 19058025 18967160
Finish 19058025 0
--
The resolution for me was to start Erigon with --db.size.limit=6000000000000
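For reference, combined with the command line from the original report that would presumably look like the following (whether the flag can be mixed with an existing --config file is my assumption):

```sh
# Sketch only: --db.size.limit added to the original invocation.
/usr/bin/erigon --config=/data/erigon/erigon.yaml --db.size.limit=6000000000000
```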
How do you get this output info?
Just an update on the above, the snapshot stage eventually kicked in and caught up:
@bretep I believe --db.size.limit is another issue, i.e. the breaking change introduced in v2.55.0:
[v2.55.0](https://github.com/ledgerwatch/erigon/releases/tag/v2.55.0)
Breaking Change
Existing nodes whose MDBX page size equals 4kb must add --db.size.limit=8TB flag. Otherwise you will get MDBX_TOO_LARGE error. To check the current page size you can use make db-tools && ./build/bin/mdbx_stat datadir/chaindata
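For completeness, here is the page-size check from those release notes as a runnable sketch, run from the Erigon source tree (the datadir path is a placeholder):

```sh
# Build the bundled MDBX tools, then inspect the chaindata database;
# per the release notes, the output includes the current page size (4096 = 4kb).
make db-tools
./build/bin/mdbx_stat <your-datadir>/chaindata
```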
@wwhchung as stated above, you can use the integration print_stages command to get the output, e.g.:
./build/bin/integration print_stages --datadir /root/.ethereum --chain mainnet
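If the integration binary isn't already built, it can be compiled from the Erigon source tree first; a sketch, assuming the Makefile exposes an integration target (the tag is just an example):

```sh
# Build the integration tool from source; the binary lands in ./build/bin/.
git clone https://github.com/ledgerwatch/erigon.git
cd erigon
git checkout v2.56.2      # match the tag your node is running
make integration
```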
+1
+1
Same issue. I tried state_stages unwind=100; the log shows:
INFO[03-12|01:51:07.379] [snapshots:download] Blocks Stat blocks=19305k indices=19305k alloc=3.3GB sys=4.4GB
DBUG[03-12|01:51:07.379] [1/12 Snapshots] DONE in=2.138136723s
DBUG[03-12|01:51:07.379] [2/12 BlockHashes] DONE in=18.278µs
DBUG[03-12|01:51:07.379] [3/12 Senders] DONE in=100.382µs
INFO[03-12|01:51:07.379] [4/12 Execution] Blocks execution from=19395808 to=19395908
DBUG[03-12|01:51:14.673] [snapshots.webseed] get from HTTP provider err="webseed.http: Get "https://erigon3-v1-snapshots-mainnet.erigon.network/manifest.txt": tls: failed to verify certificate: x509: certificate signed by unknown authority, host=erigon3-v1-snapshots-mainnet.erigon.network, url=" url=
DBUG[03-12|01:51:15.230] [snapshots.webseed] get from HTTP provider err="webseed.http: Get "https://erigon3-v3-snapshots-mainnet.erigon.network/manifest.txt": tls: failed to verify certificate: x509: certificate signed by unknown authority, host=erigon3-v3-snapshots-mainnet.erigon.network, url=" url=
INFO[03-12|01:51:19.427] [4/12 Execution] Completed on block=19395908
DBUG[03-12|01:51:19.427] [4/12 Execution] DONE in=12.048029851s
INFO[03-12|01:51:19.427] [5/12 HashState] Promoting plain state from=19395808 to=19395908
INFO[03-12|01:51:19.427] [5/12 HashState] Incremental promotion from=19395808 to=19395908 codes=true csbucket=AccountChangeSet
INFO[03-12|01:51:19.542] [5/12 HashState] Incremental promotion from=19395808 to=19395908 codes=false csbucket=AccountChangeSet
INFO[03-12|01:51:19.709] [5/12 HashState] Incremental promotion from=19395808 to=19395908 codes=false csbucket=StorageChangeSet
DBUG[03-12|01:51:20.413] [5/12 HashState] DONE in=985.822563ms
INFO[03-12|01:51:20.413] [6/12 IntermediateHashes] Generating intermediate hashes from=19395808 to=19395908
DBUG[03-12|01:51:20.660] [txpool] Commit written_kb=0 in=3.927648ms
DBUG[03-12|01:51:22.959] [6/12 IntermediateHashes] DONE in=2.546475506s
DBUG[03-12|01:51:23.545] [7/12 CallTraces] DONE in=585.147029ms
DBUG[03-12|01:51:23.916] [8/12 AccountHistoryIndex] DONE in=371.681863ms
DBUG[03-12|01:51:24.583] [9/12 StorageHistoryIndex] DONE in=667.066738ms
DBUG[03-12|01:51:24.902] [snapshots] no metadata yet files=9 list=v1-018400-018500-bodies.seg,v1-018400-018500-headers.seg,v1-018300-018400-bodies.seg,v1-018300-018400-transactions.seg,v1-018300-018400-headers.seg,...
DBUG[03-12|01:51:25.183] [10/12 LogIndex] DONE in=599.86347ms
DBUG[03-12|01:51:25.569] [11/12 TxLookup] DONE in=385.539293ms
DBUG[03-12|01:51:25.572] [12/12 Finish] DONE in=3.554721ms
INFO[03-12|01:51:26.001] [snapshots] Prune Blocks to=19305000 limit=100
DBUG[03-12|01:51:36.091] [txpool] Commit written_kb=0 in=10.929402ms
DBUG[03-12|01:51:38.859] [NewPayload] sending block height=19415836 hash=0x374a0620eef1627420b8a5e2c43e6fd41f02a4e281546c4ad09987f354fd3729
INFO[03-12|01:51:38.859] [NewPayload] Handling new payload height=19415836 hash=0x374a0620eef1627420b8a5e2c43e6fd41f02a4e281546c4ad09987f354fd3729
DBUG[03-12|01:51:38.859] [NewPayload] New payload: need to download parent height=19415836 hash=0x374a0620eef1627420b8a5e2c43e6fd41f02a4e281546c4ad09987f354fd3729 parentHash=0xec6a48eaa55491ab9d87749ae9dccf22f1a6c90f9c763fb685b99ffd6ae4e29c
INFO[03-12|01:51:38.859] [EngineBlockDownloader] Downloading PoS headers... hash=0xec6a48eaa55491ab9d87749ae9dccf22f1a6c90f9c763fb685b99ffd6ae4e29c requestId=0
DBUG[03-12|01:51:38.859] [downloader] Set posAnchor blockHeight=1
DBUG[03-12|01:51:39.918] [NewPayload] got reply payloadStatus="&{Status:SYNCING ValidationError:
Did you wait for Erigon to fully settle down? I waited for it to complete the snapshot phase, which took over a day.
That did it. Thanks. Upgrade successful.
Please re-open this issue if it still happens with v2.59.2 or later.