erigon icon indicating copy to clipboard operation
erigon copied to clipboard

Metric `sync{stage="headers"}` doesn't move forward

Open p2p-sergey-y opened this issue 2 years ago • 8 comments

System information

erigon:~$ docker image ls
REPOSITORY                 TAG           IMAGE ID       CREATED        SIZE
thorax/erigon              v2022.07.04   70949dd09afd   5 days ago     640MB

erigon:~$ docker exec -it erigon_node erigon --version
erigon version 2022.07.4-alpha-d88fc824

OS & Version: Linux (Ubuntu)

Expected behavior

I have Erigon + Lighthouse setup in Ropsten:

  • Lighthouse: v2.5.0
  • Erigon: 2022.07.04

All run in docker.

Actual behavior

After upgrade to 2022.07.03 (version 2022.07.04 has the same issue) I found out, that sync{stage="headers"} doesn't move forward/increase, it stuck in the same value until reload. After reload, it becomes the last one and stuck again: Before erigon restart:

erigon:~$ curl -s ***:***/debug/metrics/prometheus | grep -E 'sync{'
sync{stage="execution"} 12710452
sync{stage="finish"} 12710452
sync{stage="headers"} 12703876

After erigon restart:

erigon:~$ curl -s ***:***/debug/metrics/prometheus | grep -E 'sync{'
sync{stage="execution"} 12710487
sync{stage="finish"} 12710487
sync{stage="headers"} 12710482

After ~ 5 munutes:

erigon:~$ curl -s ***:***/debug/metrics/prometheus | grep -E 'sync{'
sync{stage="execution"} 12710556
sync{stage="finish"} 12710556
sync{stage="headers"} 12710482

Debug logs can be found here.

p2p-sergey-y avatar Aug 02 '22 18:08 p2p-sergey-y

I think it just executing this batch of blocks (didn’t see logs yet - i’m on mobile)

AskAlexSharov avatar Aug 03 '22 01:08 AskAlexSharov

After 14 hours of working I got this:

erigon:~$curl -s ***:***/debug/metrics/prometheus | grep -E 'sync{'
sync{stage="execution"} 12713494
sync{stage="finish"} 12713494
sync{stage="headers"} 12710592

So, it moves forward but really slowly. Before upgrade to 2022.07.03 it was fine. Is it a problem?

ghost avatar Aug 03 '22 06:08 ghost

I don’t know. Show logs. Share your disk hardware type.

AskAlexSharov avatar Aug 03 '22 09:08 AskAlexSharov

Logs can be downloaded by link in my first message. VM is GCP VM, disk is just SSD persistent. Iostat shows that we haven't any big io pressure on disk:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.97    0.00    2.99    0.00    0.00   90.05

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop1            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop2            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda              0.00      0.00     0.00   0.00    0.00     0.00   43.00    212.00     1.00   2.27    0.88     4.93    0.00      0.00     0.00   0.00    0.00     0.00    0.04   0.80
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.50    0.00    5.25    0.50    0.00   89.75

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop1            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop2            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda              0.00      0.00     0.00   0.00    0.00     0.00    6.00   1064.00    13.00  68.42    1.50   177.33    0.00      0.00     0.00   0.00    0.00     0.00    0.01   0.80
sdb             48.00    192.00     0.00   0.00    0.52     4.00    2.00      4.00     0.00   0.00    0.50     2.00    0.00      0.00     0.00   0.00    0.00     0.00    0.03   4.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.68    0.00    4.95    1.24    0.25   86.88

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop1            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop2            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda              0.00      0.00     0.00   0.00    0.00     0.00    8.00    232.00     9.00  52.94    0.62    29.00    0.00      0.00     0.00   0.00    0.00     0.00    0.01   1.20
sdb             86.00    344.00     0.00   0.00    0.43     4.00  136.00   2824.00     2.00   1.45    2.04    20.76    0.00      0.00     0.00   0.00    0.00     0.00    0.32   7.20

ghost avatar Aug 03 '22 20:08 ghost

Hello! After merge in prater/goerli I see the same issue with current erigon nodes without any restart: before merge these metrics was the same and move forward simultaneously, but after merge sync{stage="headers"} starts to move forward really slowly.

ghost avatar Aug 11 '22 11:08 ghost

show logs

AskAlexSharov avatar Aug 11 '22 13:08 AskAlexSharov

Ok, I uploaded logs right here, but it was also available by a link in my first message too. erigon.log

ghost avatar Aug 11 '22 19:08 ghost

@p2p-sergey-y sorry, i don't see any problems in your logs. it processing blocks 1-by-1, fast enough. what means "really slowly"?

AskAlexSharov avatar Aug 12 '22 05:08 AskAlexSharov

have the same issue after merge on mainnet.

it syncs after restart but then again pauses

# curl -s localhost:6060/debug/metrics/prometheus | grep stage
sync{stage="execution"} 15540244
sync{stage="finish"} 15540244
sync{stage="headers"} 15540212

erigon-2022.09.1-alpha version 2022.09.1-alpha-4067b7c4 Lighthouse v3.1.0-aa022f4

ExecStart=/home/erigon/bin/erigon-2022.09.1-alpha \
    --datadir mainnet --private.api.addr=127.0.0.1:9090 \
    --torrent.download.rate=100mb --chain=mainnet --port=30303 \
    --http.port=8545 --torrent.port=42069 \
    --http --http.addr=127.0.0.1 --ws --http.api=eth,debug,net,trace,web3,txpool,erigon \
    --metrics --metrics.addr="0.0.0.0"  \
    --authrpc.jwtsecret="/home/erigon/mainnet/jwt.hex" 

erigon.log

AGx10k avatar Sep 15 '22 16:09 AGx10k

@p2p-sergey-y sorry, i don't see any problems in your logs. it processing blocks 1-by-1, fast enough. what means "really slowly"?

Sometimes I found that sync{stage="headers"} increases slowly, for example, I saw that in 10 minutes in could move forward in 10 slots. But in the last release (2022.09.02) i see that it doesn't move at all:

erigon:~# date
Mon Sep 19 07:09:19 UTC 2022
erigon:~# curl -s ***:***/debug/metrics/prometheus | grep -E 'sync{'
sync{stage="execution"} 13009859
sync{stage="finish"} 13009859
sync{stage="headers"} 13009803
erigon:~# date
Mon Sep 19 07:15:55 UTC 2022
erigon:~# curl -s ***:***/debug/metrics/prometheus | grep -E 'sync{'
sync{stage="execution"} 13009885
sync{stage="finish"} 13009885
sync{stage="headers"} 13009803
erigon:~# date
Mon Sep 19 07:22:35 UTC 2022
erigon:~# curl -s ***:***/debug/metrics/prometheus | grep -E 'sync{'
sync{stage="execution"} 13009912
sync{stage="finish"} 13009912
sync{stage="headers"} 13009803

ghost avatar Sep 19 '22 07:09 ghost

Oh, no, it's moving ahead by a small steps

erigon:~# curl -s ***:***/debug/metrics/prometheus | grep -E 'sync{'
sync{stage="execution"} 13010453
sync{stage="finish"} 13010453
sync{stage="headers"} 13010383

ghost avatar Sep 19 '22 09:09 ghost

This issue is stale because it has been open for 40 days with no activity. Remove stale label or comment, or this will be closed in 7 days.

github-actions[bot] avatar Dec 09 '22 02:12 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Dec 17 '22 02:12 github-actions[bot]