bsc
bsc copied to clipboard
Occasional multi-second lags
Hi. I've just synced a full node from a Feb 2 snapshot. It's been running for several hours now, mostly ok, howerver, there are occasional lags or freezes that violate the ~3 sec period between blocks. The logs look something like this:
.....................
t=2022-02-05T13:02:19+0000 lvl=info msg="Imported new chain segment" blocks=1 txs=222 mgas=32.900 elapsed=190.457ms mgasps=172.742 number=14,990,583 hash=0x2b354b6470cb72ca41aec5542b58bcfa02bc7ca24d3b37a34ca34928316c1a48 dirty="749.19 MiB"
t=2022-02-05T13:02:22+0000 lvl=info msg="Imported new chain segment" blocks=1 txs=399 mgas=47.577 elapsed=289.242ms mgasps=164.488 number=14,990,584 hash=0xf084d45206bffd1b2d7d77bebe7b46095094e3e63583a5abd40ec0e52c7f67b0 dirty="751.10 MiB"
t=2022-02-05T13:02:25+0000 lvl=info msg="Imported new chain segment" blocks=1 txs=248 mgas=37.764 elapsed=210.122ms mgasps=179.724 number=14,990,585 hash=0x35f40af3e96b2a7780b416172e46a183250dcf5e1fb32905d0d3b378be356c37 dirty="752.87 MiB"
t=2022-02-05T13:02:28+0000 lvl=info msg="Imported new chain segment" blocks=1 txs=289 mgas=40.968 elapsed=239.823ms mgasps=170.826 number=14,990,586 hash=0xd60d54c652691c785558956ca59496bc16d29ef2c72a03ec072d860b56a6749b dirty="754.69 MiB"
t=2022-02-05T13:02:32+0000 lvl=eror msg="Snapshot extension registration failed" peer=1825f0cc err="peer connected on snap without compatible eth support"
t=2022-02-05T13:02:37+0000 lvl=info msg="Deep froze chain segment" blocks=18 elapsed=8.447ms number=14,900,586 hash=0xeaace8cd9475c0fad16440e39f51fac8d3e92254383ee180d8abd3d8bdc21777
t=2022-02-05T13:02:38+0000 lvl=info msg="Downloader queue stats" receiptTasks=0 blockTasks=0 itemSize="71.10 KiB" throttle=922
t=2022-02-05T13:02:41+0000 lvl=info msg="Imported new chain segment" blocks=1 txs=165 mgas=33.456 elapsed=10.495s mgasps=3.188 number=14,990,587 hash=0x35695ac468ccb6c516d1c4500fdc90af196b020cef00737c496c6ecb1f1278d8 dirty="754.85 MiB"
t=2022-02-05T13:02:41+0000 lvl=info msg="do light process success at block" num=14,990,588
.....................
Has anyone encountered similar problems? How do I solve that? The server has plenty of horsepower, it's a 64 core AMD CPU w/ 64GB and 3.5 TB NVMe
what does this Downloader queue stats
line actually mean?
for me happens consistently in the "Downloader queue stats" log too.
I am not sure that Downloader queue stats
is the actual culprit. It seems more like the result of the problem.
when this happens, the node start lagging several blocks behind.
what could the reason be? @j75689 you've helped on several occasions, perhaps, you could share some insight now?
Can you do profiling when it lags, and attach the profile log here?
Can you do profiling when it lags, and attach the profile log here?
Do you mean running the node with --prpof
?
@KeefeL I have these detailed logs in the vicinity of the lag: here we have the last block before the lag:
...
Feb 09 08:44:15 DS7959 geth[652106]: DEBUG[02-09|08:44:15.362] Snapshot updated blockRoot=a0914c..e04fb0
Feb 09 08:44:15 DS7959 geth[652106]: DEBUG[02-09|08:44:15.362] Inserted new block number=15,100,522 hash=ab38c6..976a93 uncles=0 txs=154 gas=28,037,036 elapsed=59.387ms root=a0914c..e04fb
0
Feb 09 08:44:15 DS7959 geth[652106]: INFO [02-09|08:44:15.362] Imported new chain segment blocks=1 txs=154 mgas=28.037 elapsed=60.337ms mgasps=464.672 number=15,100,522 hash=ab38c6..976a93 dirt
y=151.91MiB
then, a 6+ second gap filled with messages like that:
...
Feb 09 08:44:16 DS7959 geth[652106]: DEBUG[02-09|08:44:16.165] Dropping transaction announcement id=fd3697620fd7be4e conn=inbound count=1
Feb 09 08:44:16 DS7959 geth[652106]: DEBUG[02-09|08:44:16.166] Transaction broadcast txs=1 announce packs=54 announced hashes=54 tx packs=7 broadcast txs=7
Feb 09 08:44:16 DS7959 geth[652106]: DEBUG[02-09|08:44:16.166] Fetching batch of transactions id=5728cd052f1995aa conn=inbound count=6
Feb 09 08:44:16 DS7959 geth[652106]: DEBUG[02-09|08:44:16.204] Dropping transaction announcement id=fd3697620fd7be4e conn=inbound count=1
Feb 09 08:44:16 DS7959 geth[652106]: DEBUG[02-09|08:44:16.204] Transaction broadcast txs=1 announce packs=55 announced hashes=55 tx packs=7 broadcast txs=7
Feb 09 08:44:16 DS7959 geth[652106]: DEBUG[02-09|08:44:16.209] Dropping transaction announcement id=fd3697620fd7be4e conn=inbound count=1
Feb 09 08:44:16 DS7959 geth[652106]: DEBUG[02-09|08:44:16.210] Transaction broadcast txs=1 announce packs=60 announced hashes=60 tx packs=8 broadcast txs=8
Feb 09 08:44:16 DS7959 geth[652106]: DEBUG[02-09|08:44:16.213] Dropping transaction announcement id=fd3697620fd7be4e conn=inbound count=1
Feb 09 08:44:16 DS7959 geth[652106]: DEBUG[02-09|08:44:16.213] Transaction broadcast txs=1 announce packs=55 announced hashes=55 tx packs=7 broadcast txs=7
Feb 09 08:44:16 DS7959 geth[652106]: DEBUG[02-09|08:44:16.242] Dropping transaction announcement id=fd3697620fd7be4e conn=inbound count=1
Feb 09 08:44:16 DS7959 geth[652106]: DEBUG[02-09|08:44:16.242] Transaction broadcast txs=1 announce packs=63 announced hashes=63 tx packs=8 broadcast txs=8
Feb 09 08:44:16 DS7959 geth[652106]: DEBUG[02-09|08:44:16.281] Dropping transaction announcement id=fd3697620fd7be4e conn=inbound count=6
...
then another block, after 6-7 sec after the previos one, accompanied by a weird piece of logs:
...
Feb 09 08:44:24 DS7959 geth[652106]: INFO [02-09|08:44:24.325] Imported new chain segment blocks=1 txs=180 mgas=35.676 elapsed=6.090s mgasps=5.858 number=15,100,523 hash=88da75..b40cda dirt
y=154.30MiB
Feb 09 08:44:24 DS7959 geth[652106]: DEBUG[02-09|08:44:24.325] Importing propagated block peer=70db7bd71bc6abada24e58d4e14f20af96156bfe6893dee00d8ee737b0ee17c4 number=15,100,524 hash=eed9e0..b27198
Feb 09 08:44:24 DS7959 geth[652106]: DEBUG[02-09|08:44:24.325] Ignoring already known block number=15,100,523 hash=88da75..b40cda
Feb 09 08:44:24 DS7959 geth[652106]: DEBUG[02-09|08:44:24.325] Synchronisation terminated elapsed=5.318s
Feb 09 08:44:24 DS7959 geth[652106]: DEBUG[02-09|08:44:24.325] Synchronising with the network peer=0b767b43fe0eb590aa5e32ba468f9025393263a3f7f48843be8e6f3a9d7fbddc eth=67 head=eed9e0..b27198 td=30,008,
708 mode=full
...
what is Synchronisation terminated elapsed=5.318s
?
happens every 10 minutes
here is another piece, where Downloader queue message
accompanies a 30sec lag:
│Feb 09 10:50:57 DS7959 geth[676510]: INFO [02-09|10:50:57.179] Imported new chain segment blocks=1 txs=149 mgas=26.322 elapsed=73.092ms mgasps=360.118 number=15,103,056 hash=f2ef13..1b58c3 dirty=591.94MiB ┤
│Feb 09 10:51:04 DS7959 geth[676510]: INFO [02-09|10:51:04.541] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=54.07KiB throttle=1213 │
│Feb 09 10:51:30 DS7959 geth[676510]: INFO [02-09|10:51:30.307] Imported new chain segment blocks=1 txs=236 mgas=38.189 elapsed=30.152s mgasps=1.267 number=15,103,057 hash=4b0a2e..64154e dirty=592.08MiB │
│Feb 09 10:51:37 DS7959 geth[676510]: INFO [02-09|10:51:37.153] Imported new chain segment blocks=1 txs=368 mgas=19.991 elapsed=6.845s mgasps=2.921 number=15,103,058 hash=652336..165ef1 dirty=594.31MiB ignored=1 │
│Feb 09 10:51:37 DS7959 geth[676510]: INFO [02-09|10:51:37.277] Imported new chain segment blocks=1 txs=180 mgas=30.009 elapsed=123.050ms mgasps=243.877 number=15,103,059 hash=2b84d3..0e9f21 dirty=383.99MiB │
│Feb 09 10:51:37 DS7959 geth[676510]: INFO [02-09|10:51:37.375] Imported new chain segment blocks=1 txs=181 mgas=26.155 elapsed=97.447ms mgasps=268.396 number=15,103,060 hash=b03b59..be5540 dirty=386.91MiB │
│Feb 09 10:51:37 DS7959 geth[676510]: INFO [02-09|10:51:37.518] Imported new chain segment blocks=1 txs=194 mgas=30.680 elapsed=143.279ms mgasps=214.129 number=15,103,061 hash=90feae..10ad12 dirty=389.52MiB │
│Feb 09 10:51:37 DS7959 geth[676510]: INFO [02-09|10:51:37.606] Imported new chain segment blocks=1 txs=238 mgas=32.137 elapsed=87.636ms mgasps=366.708 number=15,103,062 hash=916894..6c5da1 dirty=389.52MiB │
│Feb 09 10:51:37 DS7959 geth[676510]: INFO [02-09|10:51:37.680] Imported new chain segment blocks=1 txs=111 mgas=19.710 elapsed=71.140ms mgasps=277.052 number=15,103,063 hash=991ca7..9a557d dirty=392.29MiB │
│Feb 09 10:51:37 DS7959 geth[676510]: INFO [02-09|10:51:37.835] Imported new chain segment blocks=1 txs=225 mgas=42.187 elapsed=154.925ms mgasps=272.303 number=15,103,064 hash=143f1a..d7e55e dirty=395.50MiB │
│Feb 09 10:51:38 DS7959 geth[676510]: INFO [02-09|10:51:38.084] Imported new chain segment blocks=3 txs=703 mgas=74.883 elapsed=248.235ms mgasps=301.661 number=15,103,067 hash=ad32f5..a2250d dirty=399.49MiB ignored=1 ┤
│Feb 09 10:51:38 DS7959 geth[676510]: INFO [02-09|10:51:38.436] Imported new chain segment blocks=1 txs=159 mgas=28.018 elapsed=351.860ms mgasps=79.627 number=15,103,068 hash=627322..32d4ca dirty=400.88MiB ┤
│Feb 09 10:51:38 DS7959 geth[676510]: INFO [02-09|10:51:38.501] Imported new chain segment blocks=1 txs=370 mgas=23.345 elapsed=65.155ms mgasps=358.301 number=15,103,069 hash=b1292b..7f419b dirty=400.88MiB ignored=1 │
│Feb 09 10:51:39 DS7959 geth[676510]: INFO [02-09|10:51:39.257] Imported new chain segment blocks=1 txs=161 mgas=21.479 elapsed=72.903ms mgasps=294.623 number=15,103,070 hash=53b79e..134d65 dirty=402.17MiB │
│Feb 09 10:51:42 DS7959 geth[676510]: INFO [02-09|10:51:42.379] Imported new chain segment blocks=1 txs=179 mgas=28.278 elapsed=284.098ms mgasps=99.537 number=15,103,071 hash=2d0cce..73c0f9 dirty=403.82MiB
@KeefeL I have 2 identical (hardware-wise) nodes in the US (1.1.8) and in Germany (1.1.7). German node did not have any isuues up until about a week ago, but now they both experiance these lags
I have had this problem ever since first running a node back in March 2021. It seems to happen far less frequently if I run fewer nodes, but even then it can still happen. The main smoking gun here is that quite often, after a long lag period of getting no blocks, I'll get a "failed to unregister peer" error message, after which I get all the backlogged blocks at once. It almost seems that the client relies on a single peer for block updates (even if it's connected to several), and if that peer drops, then instead of just sourcing the blocks from the multitude of other peers, it just sits there in limbo for several minutes until that peer finally formally times out and it looks to others for block updates.
@KeefeL any comments on the above?
@KeefeL I am still having this issue. re-syncing from the latest snapshot had no effect
Having the same issue as @kawahwookiee with similar log output. 24 Core 256GB mem 25Gbps network NVMe
Is there any update on this @KeefeL ?