Losing Sync Since March 13
My node on ETH Mainnet has been getting out of sync repeatedly since March 13. The main unique thing about my setup is that snapshots are on an external SSD (symlinked) to save space on the NVMe.
System information
Erigon version: ./erigon --version
erigon version 2.58.2-125509e4
Updated to 2.59.0 and still losing sync
OS & Version: Windows/Linux/OSX
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
Commit hash:
Erigon Command (with flags/config):
[Unit]
Description=Erigon Execution Layer Client service
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=dylan
Restart=on-failure
RestartSec=3
KillSignal=SIGINT
TimeoutStopSec=300
ExecStart=/home/dylan/erigon/build/bin/erigon \
--datadir /var/lib/erigon \
--chain mainnet \
--db.size.limit=8TB \
--metrics \
--pprof \
--prune htc \
--authrpc.jwtsecret=/secrets/jwtsecret \
--http \
--http.addr 192.168.0.21 \
--http.api eth,net,web3
[Install]
WantedBy=multi-user.target
Consensus Layer:
Lighthouse v5.1.1-2a3c709
BLS library: blst-modern
SHA256 hardware acceleration: false
Allocator: jemalloc
Profile: maxperf
Specs: mainnet (true), minimal (false), gnosis (true)
Consensus Layer Command (with flags/config):
[Unit]
Description=eth beacon chain service
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=dylan
Restart=on-failure
ExecStart=/usr/bin/lighthouse bn \
--network mainnet \
--staking \
--validator-monitor-auto \
--metrics \
--suggested-fee-recipient <redacted> \
--checkpoint-sync-url=https://beaconstate.info \
--execution-endpoint http://127.0.0.1:8551 \
--execution-jwt /secrets/jwtsecret \
--monitoring-endpoint https://beaconcha.in/api/v1/client/metrics?apikey=<redacted>&machine=Nuc
[Install]
WantedBy=multi-user.target
Chain/Network: ETH Mainnet
Expected behaviour
Stay synced.
Actual behaviour
Losing sync often
Steps to reproduce the behaviour
Backtrace
[backtrace]
Let me know what else I could grep the logs for to help diagnose this issue. I didn't see warning or error.
One thing I've noticed is that the execution phase seems very slow:
Mar 19 21:39:02 nuc erigon[986]: [INFO] [03-19|21:39:02.819] [4/12 Execution] Executed blocks number=19470961 blk/s=0.1 tx/s=27.7 Mgas/s=2.0 gasState=0.00 batch=861.6KB alloc=3.6GB sys=8.0GB
Mar 19 21:39:41 nuc erigon[986]: [INFO] [03-19|21:39:41.942] [4/12 Execution] Executed blocks number=19470965 blk/s=0.1 tx/s=19.4 Mgas/s=2.1 gasState=0.00 batch=1012.5KB alloc=2.9GB sys=8.0GB
Mar 19 21:40:05 nuc erigon[986]: [INFO] [03-19|21:40:05.355] [4/12 Execution] Executed blocks number=19470969 blk/s=0.2 tx/s=28.5 Mgas/s=2.2 gasState=0.00 batch=1.1MB alloc=3.1GB sys=8.0GB
Ah, here are some errors:
Mar 19 15:19:28 nuc erigon[986]: [WARN] [03-19|15:19:28.865] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 19 15:25:57 nuc erigon[986]: [WARN] [03-19|15:25:57.855] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 19 15:41:04 nuc erigon[986]: [WARN] [03-19|15:41:04.040] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 19 16:00:14 nuc erigon[986]: [WARN] [03-19|16:00:11.640] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 19 16:14:39 nuc erigon[986]: [WARN] [03-19|16:14:39.917] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 19 16:30:22 nuc erigon[986]: [WARN] [03-19|16:30:22.702] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 19 16:39:22 nuc erigon[986]: [WARN] [03-19|16:39:22.187] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 19 17:03:45 nuc erigon[986]: [WARN] [03-19|17:03:45.462] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 19 17:57:16 nuc erigon[986]: [WARN] [03-19|17:57:16.289] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 19 18:45:31 nuc erigon[986]: [WARN] [03-19|18:45:31.304] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 19 18:56:29 nuc erigon[986]: [WARN] [03-19|18:56:29.358] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 19 19:17:38 nuc erigon[986]: [WARN] [03-19|19:17:38.316] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 19 20:50:50 nuc erigon[986]: [WARN] [03-19|20:50:50.687] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 20 01:06:51 nuc erigon[986]: [WARN] [03-20|01:06:51.827] [fetch] onNewBlock err="failed MdbxKV cursor.Seek(): mdbx_cursor_get: MDBX_BAD_VALSIZE: Invalid size or alignment of key or data for target database, either invalid subDB name, bucket: PoolTransaction, key: ed80609b55bbb9a63618c9c6da1ed8d31d0a49962a98f27d69226dc9a528bcfa"
Mar 20 01:09:54 nuc erigon[986]: [WARN] [03-20|01:09:54.820] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
- Remove datadir/txpool
- Add —snap.stop until next erigon’s upgrade
- Remove datadir/txpool
- Add —snap.stop until next erigon’s upgrade
Thanks ended up upgrading to 2.59 and all good so far. Will reopen if not.
Reopening because I’m still losing sync on 2.59.0. Will post logs tonight.
Mar 21 19:01:53 nuc erigon[441457]: [WARN] [03-21|19:01:23.623] [engine] GetHeaderByHash err="ethereumExecutionModule.GetHeader: could not open database: context canceled"
Mar 21 19:01:53 nuc erigon[441457]: [WARN] [03-21|19:01:23.623] [engine] GetHeaderByHash err="ethereumExecutionModule.GetHeader: could not open database: context canceled"
Mar 21 19:01:53 nuc erigon[441457]: [WARN] [03-21|19:01:23.623] [engine] GetTd err="ethereumExecutionModule.GetHeader: could not open database: context canceled"
Mar 21 19:01:53 nuc erigon[441457]: [WARN] [03-21|19:01:23.623] [engine] GetTd err="ethereumExecutionModule.GetHeader: could not open database: context canceled"
Mar 21 19:01:53 nuc erigon[441457]: [WARN] [03-21|19:01:23.623] [engine] CurrentHeader err="ethereumExecutionModule.CurrentHeader: could not open database: context canceled"
Mar 21 19:01:53 nuc erigon[441457]: [WARN] [03-21|19:01:23.623] [engine] CurrentHeader err="ethereumExecutionModule.CurrentHeader: could not open database: context canceled"
Mar 21 19:32:56 nuc erigon[441457]: [WARN] [03-21|19:32:37.412] [engine] GetHeaderByHash err="ethereumExecutionModule.GetHeader: could not open database: context canceled"
Mar 21 19:32:56 nuc erigon[441457]: [WARN] [03-21|19:32:37.412] [engine] GetTd err="ethereumExecutionModule.GetHeader: could not open database: context canceled"
Mar 21 19:32:56 nuc erigon[441457]: [WARN] [03-21|19:32:37.412] [engine] CurrentHeader err="ethereumExecutionModule.CurrentHeader: could not open database: context canceled"
Mar 21 19:35:21 nuc erigon[441457]: [WARN] [03-21|19:34:58.287] [engine] GetHeaderByHash err="ethereumExecutionModule.GetHeader: could not open database: context canceled"
Mar 21 19:35:21 nuc erigon[441457]: [WARN] [03-21|19:35:21.201] [engine] GetTd err="ethereumExecutionModule.GetHeader: could not open database: context canceled"
Mar 21 19:35:21 nuc erigon[441457]: [WARN] [03-21|19:35:21.201] [engine] CurrentHeader err="ethereumExecutionModule.CurrentHeader: could not open database: context canceled"
hey could try to use --internalcl, just to see if it is an engine API specific issue? CC @somnathb1
@somnathb1 Suggested on discord:
Could you also (in this order) check the disk speed/performance in terms of access times - for all symlinks and datadir check the above again after running erigon try removing any competing process accessing the datadir (such as another erigon process, or an integration/mdbx command line) try removing $datadir/chaindata/mdbx.lck and restart the erigon process/container
Here are fio results testing snapshots (symlinked) and datadir while Erigon is stopped and a few minutes after starting it.
Snapshots before starting Erigon:
fio --name=Test --rw=read --size=500M --directory=/var/lib/erigon/snapshots
Test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.16
Starting 1 process
Test: Laying out IO file (1 file / 500MiB)
Test: (groupid=0, jobs=1): err= 0: pid=584985: Fri Mar 22 17:38:03 2024
read: IOPS=151k, BW=590MiB/s (618MB/s)(500MiB/848msec)
clat (nsec): min=570, max=838099, avg=6410.46, stdev=45685.29
lat (nsec): min=595, max=838122, avg=6437.37, stdev=45685.45
clat percentiles (nsec):
| 1.00th=[ 612], 5.00th=[ 636], 10.00th=[ 644], 20.00th=[ 660],
| 30.00th=[ 676], 40.00th=[ 684], 50.00th=[ 700], 60.00th=[ 716],
| 70.00th=[ 740], 80.00th=[ 788], 90.00th=[ 892], 95.00th=[ 1020],
| 99.00th=[329728], 99.50th=[391168], 99.90th=[473088], 99.95th=[497664],
| 99.99th=[602112]
bw ( KiB/s): min=601568, max=601568, per=99.63%, avg=601568.00, stdev= 0.00, samples=1
iops : min=150392, max=150392, avg=150392.00, stdev= 0.00, samples=1
lat (nsec) : 750=71.79%, 1000=22.81%
lat (usec) : 2=2.29%, 4=1.52%, 10=0.03%, 20=0.01%, 50=0.01%
lat (usec) : 100=0.01%, 250=0.04%, 500=1.48%, 750=0.05%, 1000=0.01%
cpu : usr=5.08%, sys=23.61%, ctx=1999, majf=0, minf=12
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=128000,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=590MiB/s (618MB/s), 590MiB/s-590MiB/s (618MB/s-618MB/s), io=500MiB (524MB), run=848-848msec
Disk stats (read/write):
sda: ios=1707/0, merge=0/0, ticks=1260/0, in_queue=0, util=88.46%
Datadir before starting Erigon:
fio --name=Test --rw=read --size=500M --directory=/var/lib/erigon
Test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.16
Starting 1 process
Test: Laying out IO file (1 file / 500MiB)
Test: (groupid=0, jobs=1): err= 0: pid=585044: Fri Mar 22 17:41:01 2024
read: IOPS=204k, BW=796MiB/s (835MB/s)(500MiB/628msec)
clat (nsec): min=563, max=951461, avg=4532.92, stdev=31085.75
lat (nsec): min=587, max=951488, avg=4580.54, stdev=31086.00
clat percentiles (nsec):
| 1.00th=[ 604], 5.00th=[ 636], 10.00th=[ 668], 20.00th=[ 724],
| 30.00th=[ 764], 40.00th=[ 804], 50.00th=[ 868], 60.00th=[ 1020],
| 70.00th=[ 1160], 80.00th=[ 1448], 90.00th=[ 2008], 95.00th=[ 2192],
| 99.00th=[171008], 99.50th=[261120], 99.90th=[444416], 99.95th=[473088],
| 99.99th=[514048]
bw ( KiB/s): min=808432, max=808432, per=99.16%, avg=808432.00, stdev= 0.00, samples=1
iops : min=202108, max=202108, avg=202108.00, stdev= 0.00, samples=1
lat (nsec) : 750=25.47%, 1000=33.82%
lat (usec) : 2=30.21%, 4=8.71%, 10=0.16%, 20=0.04%, 50=0.08%
lat (usec) : 100=0.22%, 250=0.73%, 500=0.55%, 750=0.02%, 1000=0.01%
cpu : usr=11.00%, sys=47.53%, ctx=1531, majf=0, minf=13
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=128000,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=796MiB/s (835MB/s), 796MiB/s-796MiB/s (835MB/s-835MB/s), io=500MiB (524MB), run=628-628msec
Disk stats (read/write):
dm-0: ios=1982/72, merge=0/0, ticks=852/0, in_queue=852, util=86.59%, aggrios=2009/68, aggrmerge=0/4, aggrticks=871/35, aggrin_queue=0, aggrutil=84.61%
nvme0n1: ios=2009/68, merge=0/4, ticks=871/35, in_queue=0, util=84.61%
Snapshots after starting Erigon:
fio --name=Test --rw=read --size=500M --directory=/var/lib/erigon/snapshots
Test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=69.0MiB/s][r=17.7k IOPS][eta 00m:00s]
Test: (groupid=0, jobs=1): err= 0: pid=585143: Fri Mar 22 17:44:06 2024
read: IOPS=16.9k, BW=66.1MiB/s (69.3MB/s)(500MiB/7564msec)
clat (nsec): min=462, max=191793k, avg=58581.45, stdev=773479.15
lat (nsec): min=482, max=191793k, avg=58607.67, stdev=773479.12
clat percentiles (nsec):
| 1.00th=[ 548], 5.00th=[ 588], 10.00th=[ 604],
| 20.00th=[ 628], 30.00th=[ 644], 40.00th=[ 668],
| 50.00th=[ 692], 60.00th=[ 724], 70.00th=[ 756],
| 80.00th=[ 804], 90.00th=[ 900], 95.00th=[ 1080],
| 99.00th=[ 1810432], 99.50th=[ 5406720], 99.90th=[ 7438336],
| 99.95th=[ 8159232], 99.99th=[11075584]
bw ( KiB/s): min=42464, max=73216, per=98.63%, avg=66762.67, stdev=10164.32, samples=15
iops : min=10616, max=18304, avg=16690.67, stdev=2541.08, samples=15
lat (nsec) : 500=0.09%, 750=67.39%, 1000=26.13%
lat (usec) : 2=4.14%, 4=0.58%, 10=0.09%, 20=0.01%, 50=0.02%
lat (usec) : 100=0.05%, 250=0.06%, 500=0.10%, 750=0.06%, 1000=0.08%
lat (msec) : 2=0.23%, 4=0.23%, 10=0.72%, 20=0.01%, 250=0.01%
cpu : usr=0.70%, sys=5.66%, ctx=1918, majf=0, minf=12
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=128000,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=66.1MiB/s (69.3MB/s), 66.1MiB/s-66.1MiB/s (69.3MB/s-69.3MB/s), io=500MiB (524MB), run=7564-7564msec
Disk stats (read/write):
sda: ios=23086/8, merge=0/24, ticks=166931/35, in_queue=114592, util=97.68%
Datadir after starting Erigon:
fio --name=Test --rw=read --size=500M --directory=/var/lib/erigon
Test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1)
Test: (groupid=0, jobs=1): err= 0: pid=585208: Fri Mar 22 17:46:07 2024
read: IOPS=116k, BW=453MiB/s (475MB/s)(500MiB/1104msec)
clat (nsec): min=453, max=12097k, avg=7512.28, stdev=99586.78
lat (nsec): min=475, max=12097k, avg=7611.07, stdev=99587.34
clat percentiles (nsec):
| 1.00th=[ 532], 5.00th=[ 572], 10.00th=[ 636],
| 20.00th=[ 948], 30.00th=[ 1144], 40.00th=[ 1672],
| 50.00th=[ 1992], 60.00th=[ 2192], 70.00th=[ 2288],
| 80.00th=[ 2384], 90.00th=[ 2576], 95.00th=[ 2768],
| 99.00th=[ 214016], 99.50th=[ 257024], 99.90th=[ 888832],
| 99.95th=[1597440], 99.99th=[3751936]
bw ( KiB/s): min=406496, max=467400, per=94.22%, avg=436948.00, stdev=43065.63, samples=2
iops : min=101624, max=116850, avg=109237.00, stdev=10766.41, samples=2
lat (nsec) : 500=0.14%, 750=13.74%, 1000=7.99%
lat (usec) : 2=28.14%, 4=48.05%, 10=0.26%, 20=0.08%, 50=0.05%
lat (usec) : 100=0.11%, 250=0.88%, 500=0.41%, 750=0.03%, 1000=0.03%
lat (msec) : 2=0.05%, 4=0.03%, 10=0.01%, 20=0.01%
cpu : usr=12.06%, sys=52.49%, ctx=1192, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=128000,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=453MiB/s (475MB/s), 453MiB/s-453MiB/s (475MB/s-475MB/s), io=500MiB (524MB), run=1104-1104msec
Disk stats (read/write):
dm-0: ios=1406/1474, merge=0/0, ticks=1012/4020, in_queue=5032, util=99.29%, aggrios=2075/2099, aggrmerge=0/67, aggrticks=1310/9904, aggrin_queue=8056, aggrutil=99.59%
nvme0n1: ios=2075/2099, merge=0/67, ticks=1310/9904, in_queue=8056, util=99.59%
I went back to the time that validation actually started failing to look at logs and the errors right before failure were DumpBlocks: DumpBodies: header missed in db:
Mar 20 14:34:25 nuc erigon[986]: [INFO] [03-20|14:34:25.952] head updated hash=0xda49c626629e22ea4181b0990f0d4fc31e21f8cc574dd4d3f08467b466997d98 number=19476536
Mar 20 14:34:26 nuc erigon[986]: [INFO] [03-20|14:34:26.018] [snapshots] Prune Blocks to=18500000 limit=100
Mar 20 14:34:26 nuc erigon[986]: [WARN] [03-20|14:34:26.019] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 20 14:34:37 nuc erigon[986]: [INFO] [03-20|14:34:37.741] [NewPayload] Handling new payload height=19476537 hash=0x37a1c155eaa2d740f2103c42a2465f20fb8be10f11f831ee46885b6378f96fbb
Mar 20 14:34:38 nuc erigon[986]: [INFO] [03-20|14:34:38.363] [updateForkchoice] Fork choice update: flushing in-memory state (built by previous newPayload)
Mar 20 14:34:38 nuc erigon[986]: [INFO] [03-20|14:34:38.658] RPC Daemon notified of new headers from=19476536 to=19476537 hash=0x37a1c155eaa2d740f2103c42a2465f20fb8be10f11f831ee46885b6378f96fbb header sending=13.673µs log sending=201ns
Mar 20 14:34:38 nuc erigon[986]: [INFO] [03-20|14:34:38.658] head updated hash=0x37a1c155eaa2d740f2103c42a2465f20fb8be10f11f831ee46885b6378f96fbb number=19476537
Mar 20 14:34:38 nuc erigon[986]: [INFO] [03-20|14:34:38.748] [snapshots] Prune Blocks to=18500000 limit=100
Mar 20 14:34:38 nuc erigon[986]: [WARN] [03-20|14:34:38.748] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
Mar 20 14:34:48 nuc erigon[986]: [INFO] [03-20|14:34:48.675] [NewPayload] Handling new payload height=19476538 hash=0xb5c6395dd860ec4b8ef314dae67cee46b322bb94e50dbd77a84aad17944de777
Mar 20 14:34:49 nuc erigon[986]: [INFO] [03-20|14:34:49.355] [updateForkchoice] Fork choice update: flushing in-memory state (built by previous newPayload)
Mar 20 14:34:49 nuc erigon[986]: [INFO] [03-20|14:34:49.643] RPC Daemon notified of new headers from=19476537 to=19476538 hash=0xb5c6395dd860ec4b8ef314dae67cee46b322bb94e50dbd77a84aad17944de777 header sending=14.43µs log sending=216ns
Mar 20 14:34:49 nuc erigon[986]: [INFO] [03-20|14:34:49.643] head updated hash=0xb5c6395dd860ec4b8ef314dae67cee46b322bb94e50dbd77a84aad17944de777 number=19476538
Mar 20 14:34:49 nuc erigon[986]: [INFO] [03-20|14:34:49.727] [snapshots] Prune Blocks to=18500000 limit=100
Mar 20 14:34:49 nuc erigon[986]: [WARN] [03-20|14:34:49.728] [snapshots] retire blocks err="DumpBlocks: DumpBodies: header missed in db: block_num=18500000, hash=80ce6bb0e244fbdf66cf0a1108273fe1ca58788efb7fb8d3a0d783d2b06d433d"
So there might be multiple issues.
Getting this again several hours after restart. Never finished syncing:
Mar 23 00:43:52 nuc erigon[585070]: [INFO] [03-23|00:42:47.572] [6/12 IntermediateHashes] Calculating Merkle root current key=d65e0322
Mar 23 00:43:52 nuc erigon[585070]: [INFO] [03-23|00:43:17.588] [6/12 IntermediateHashes] Calculating Merkle root current key=d66d67ab
Mar 23 00:43:52 nuc erigon[585070]: [INFO] [03-23|00:43:47.604] [6/12 IntermediateHashes] Calculating Merkle root current key=d67f8020
Mar 23 00:43:54 nuc erigon[585070]: [INFO] [03-23|00:43:54.232] [p2p] GoodPeers eth68=45 eth67=1
Mar 23 00:43:56 nuc erigon[585070]: [INFO] [03-23|00:43:56.359] [mem] memory stats Rss=30.6MB Size=0B Pss=30.6MB SharedClean=14.8KB SharedDirty=0B PrivateClean=25.4MB PrivateDirty=5.2MB Refer>
Mar 23 00:44:02 nuc erigon[585070]: [INFO] [03-23|00:44:02.013] [txpool] stat pending=0 baseFee=0 queued=30000 alloc=3.7GB sys=8.3GB
Mar 23 00:44:29 nuc erigon[585070]: [INFO] [03-23|00:44:18.321] [6/12 IntermediateHashes] Calculating Merkle root current key=d68f3ce9
Mar 23 00:44:47 nuc erigon[585070]: [INFO] [03-23|00:44:47.536] [6/12 IntermediateHashes] Calculating Merkle root current key=d697de19
Mar 23 00:45:18 nuc erigon[585070]: [INFO] [03-23|00:45:18.052] [6/12 IntermediateHashes] Calculating Merkle root current key=d6a597f2
Mar 23 00:46:02 nuc erigon[585070]: [INFO] [03-23|00:45:47.567] [6/12 IntermediateHashes] Calculating Merkle root current key=d6b5b4dd
Mar 23 00:46:12 nuc erigon[585070]: [WARN] [03-23|00:46:11.982] [engine] GetHeaderByHash err="ethereumExecutionModule.GetHeader: could not open database: context canceled"
Mar 23 00:46:12 nuc erigon[585070]: [WARN] [03-23|00:46:12.281] [engine] GetTd err="ethereumExecutionModule.GetHeader: could not open database: context canceled"
Mar 23 00:46:12 nuc erigon[585070]: [WARN] [03-23|00:46:12.281] [engine] CurrentHeader err="ethereumExecutionModule.CurrentHeader: could not open database: context canceled"
Upgraded to:
Erigon v2.59.3Lighthouse v5.1.3-3058b96
Lighthouse hotfix had a note about forkchoice issues and I feel like I saw this in my logs so hopefully that will fix at least part of this issue.
🤞 🤞 🤞
@yperbasis happy to see the label. Let me know if I can provide more logs or do any debugging. I've had about 50% attestations vs missed in the last month 😢
I'm still unable to stay in sync. Here are stage times from my most recent attempt:
Jun 04 22:34:51 nuc erigon[87450]: [INFO] [06-04|22:34:51.048] [1/12 Snapshots] DONE in=6m10.611197956s
Jun 04 22:43:36 nuc erigon[87450]: [INFO] [06-04|22:43:36.394] [3/12 Senders] DONE in=3m32.91207169s
Jun 05 02:53:39 nuc erigon[87450]: [INFO] [06-05|02:53:39.319] [4/12 Execution] DONE in=4h10m2.924591551s
Jun 05 07:25:06 nuc erigon[87450]: [INFO] [06-05|07:25:06.130] [5/12 HashState] DONE in=4h31m26.810970751s
Jun 05 08:12:15 nuc erigon[87450]: [INFO] [06-05|08:12:15.014] [6/12 IntermediateHashes] DONE in=47m8.883974433s
Jun 05 09:28:53 nuc erigon[87450]: [INFO] [06-05|09:28:53.808] [7/12 CallTraces] DONE in=1h16m38.79209474s
Jun 05 10:23:44 nuc erigon[87450]: [INFO] [06-05|10:23:44.496] [8/12 AccountHistoryIndex] DONE in=54m50.68689461s
Jun 05 11:23:57 nuc erigon[87450]: [INFO] [06-05|11:23:57.046] [9/12 StorageHistoryIndex] DONE in=1h0m12.549762117s
Jun 05 11:59:11 nuc erigon[87450]: [INFO] [06-05|11:59:11.868] [10/12 LogIndex] DONE in=35m14.821925573s
Jun 05 12:26:11 nuc erigon[87450]: [INFO] [06-05|12:26:11.759] [11/12 TxLookup] DONE in=26m59.890847703s
Jun 05 12:28:13 nuc erigon[87450]: [INFO] [06-05|12:28:13.157] [9/12 StorageHistoryIndex] Prune done in=1m59.998599987s
Jun 05 12:50:25 nuc erigon[87450]: [INFO] [06-05|12:50:25.244] [8/12 AccountHistoryIndex] Prune done in=22m12.087567395s
Jun 05 13:17:39 nuc erigon[87450]: [INFO] [06-05|13:17:39.817] [4/12 Execution] DONE in=18m25.532472159s
Jun 05 13:46:02 nuc erigon[87450]: [INFO] [06-05|13:46:02.584] [5/12 HashState] DONE in=28m22.767034598s
Jun 05 14:04:09 nuc erigon[87450]: [INFO] [06-05|14:04:09.089] [6/12 IntermediateHashes] DONE in=18m6.50480355s
Jun 05 14:36:26 nuc erigon[87450]: [INFO] [06-05|14:36:26.094] [7/12 CallTraces] DONE in=32m17.004903009s
Jun 05 14:54:17 nuc erigon[87450]: [INFO] [06-05|14:54:17.458] [8/12 AccountHistoryIndex] DONE in=17m51.364516027s
Jun 05 15:49:33 nuc erigon[87450]: [INFO] [06-05|15:49:33.311] [9/12 StorageHistoryIndex] DONE in=55m15.853033727s
Jun 05 16:29:01 nuc erigon[87450]: [INFO] [06-05|16:29:01.357] [10/12 LogIndex] DONE in=39m28.045902965s
Jun 05 16:34:33 nuc erigon[87450]: [INFO] [06-05|16:34:33.958] [11/12 TxLookup] DONE in=5m32.600785363s
Jun 05 16:35:09 nuc erigon[87450]: [INFO] [06-05|16:35:09.270] [9/12 StorageHistoryIndex] Prune done in=34.612729467s
Jun 05 16:59:06 nuc erigon[87450]: [INFO] [06-05|16:59:06.241] [8/12 AccountHistoryIndex] Prune done in=23m56.970341548s
Jun 05 17:27:44 nuc erigon[87450]: [INFO] [06-05|17:27:44.016] [4/12 Execution] DONE in=21m49.530837755s
Jun 05 17:47:36 nuc erigon[87450]: [INFO] [06-05|17:47:36.035] [5/12 HashState] DONE in=19m52.018822307s
Jun 05 19:40:44 nuc erigon[87450]: [INFO] [06-05|19:40:44.396] [6/12 IntermediateHashes] DONE in=1h53m8.360629727s
@dylancaponi show:
curl http://127.0.0.1:6060/debug/metrics/prometheusdu -hsc datadir/*mdbx_stat -ef datadir/chaindata
Please re-open this issue if it still happens with v2.61.0 or later