
block sync stuck at 54,827,345

Open rekyyang opened this issue 1 year ago • 5 comments

Hi team, we found that our polygon-mainnet full node got stuck at height 54,827,345 (54,827,346 is an empty block), and it seems that the chain itself was also stuck at this height for several minutes. https://polygonscan.com/block/54827346

Our full nodes run three versions: 1.2.7, 1.2.8, and 1.1.0. The 1.1.0 node recovered after 6 minutes, but our 1.2.7 and 1.2.8 nodes recovered only after 1 hour.

These are the logs from while the node was stuck:

WARN [03-19|05:14:28.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:14:40.079] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:14:40.079] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:14:52.075] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:14:52.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:15:04.075] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:15:04.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:15:06.762] Finished cleaning keys num=0 endHeight=54,740,945
INFO [03-19|05:15:16.076] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:15:16.076] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:15:28.075] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:15:28.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:15:40.075] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:15:40.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:15:52.075] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:15:52.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:16:04.075] Got new milestone from heimdall start=54,828,735 end=54,829,125 hash=0x64cd447187cdabfd688333b01d6a0bfa17840f02aea80e740773417ab5de949
WARN [03-19|05:16:04.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:16:04.076] Got new checkpoint from heimdall start=54,827,174 end=54,827,685 rootHash=0x53954494d63064bb8745aa1fb6506c069b820c9edb914c3adbe890f4328
WARN [03-19|05:16:04.076] Failed to whitelist checkpoint err="missing blocks"
WARN [03-19|05:16:04.076] unable to handle whitelist checkpoint err="missing blocks"
INFO [03-19|05:16:06.634] Finished cleaning keys num=0 endHeight=54,740,945
INFO [03-19|05:16:16.076] Got new milestone from heimdall start=54,828,735 end=54,829,125 hash=0x64cd447187cdabfd688333b01d6a0bfa17840f02aea80e740773417ab5de949
WARN [03-19|05:16:16.076] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:16:28.075] Got new milestone from heimdall start=54,828,735 end=54,829,125 hash=0x64cd447187cdabfd688333b01d6a0bfa17840f02aea80e740773417ab5de949
WARN [03-19|05:16:28.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:16:40.075] Got new milestone from heimdall start=54,828,735 end=54,829,125 hash=0x64cd447187cdabfd688333b01d6a0bfa17840f02aea80e740773417ab5de949
WARN [03-19|05:16:40.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:16:52.075] Got new milestone from heimdall start=54,828,735 end=54,829,125 hash=0x64cd447187cdabfd688333b01d6a0bfa17840f02aea80e740773417ab5de949
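
One way to read the log lines above is to compare the milestone range reported by heimdall with the height the node is stuck at. The sketch below is only an illustration: it assumes the "missing blocks" warning appears because the local head has not yet reached the milestone's end block, and the helper names `milestone_range` and `blocks_behind` are made up for this example rather than taken from bor.

```python
import re

# Matches the range in lines such as:
#   INFO [...] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x...
MILESTONE_RE = re.compile(
    r"Got new milestone from heimdall\s+start=([\d,]+)\s+end=([\d,]+)"
)


def milestone_range(line):
    """Return (start, end) as ints, or None if this is not a milestone line."""
    match = MILESTONE_RE.search(line)
    if match is None:
        return None
    start, end = (int(group.replace(",", "")) for group in match.groups())
    return start, end


def blocks_behind(local_head, line):
    """Blocks between the local head and the milestone end (0 if caught up)."""
    rng = milestone_range(line)
    if rng is None:
        return None
    return max(0, rng[1] - local_head)


# Sample line shortened from the logs above (hash omitted).
line = ("INFO [03-19|05:14:40.079] Got new milestone from heimdall "
        "start=54,828,722 end=54,828,734")
print(blocks_behind(54_827_345, line))
```

With the stuck height from this report, the first milestone in the logs ends 1,389 blocks ahead of the local head, which would be consistent with the node simply not having those blocks yet.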

rekyyang avatar Mar 19 '24 05:03 rekyyang

our nodes are stuck, but at a different height #54,827,344

exodus-justinz avatar Mar 19 '24 06:03 exodus-justinz

> our nodes are stuck, but at a different height #54,827,344

Sorry, I checked our node and it was stuck at #54,827,344 too. After a reboot it got stuck at #54,827,345, and it took a long time to recover. What are the possible reasons...

rekyyang avatar Mar 19 '24 07:03 rekyyang

Same for us: our node is stuck at exactly the same height. I migrated the bor node from v1.2.3 to v1.2.8 this morning hoping to fix the issue, but it didn't change anything. Our second node is fine, though (not yet migrated to the latest version). No errors in the logs.

WARN [03-19|14:52:18.585] unable to handle whitelist milestone     err="missing blocks"
INFO [03-19|14:52:18.736] Generating state snapshot                root=1a8dab..ed1bd0 in=074efe..684ab5 at=ecfb74..3f4283 accounts=8,762,814 slots=88,397,922 storage=7.12GiB dangling=0 elapsed=8m24.273s  eta=11h12m11.884s
INFO [03-19|14:52:26.740] Generating state snapshot                root=1a8dab..ed1bd0 in=076022..8a3c7d at=923819..f4d1d8 accounts=8,843,490 slots=88,925,105 storage=7.17GiB dangling=0 elapsed=8m32.276s  eta=11h8m17.573s
INFO [03-19|14:52:30.583] Got new milestone from heimdall          start=54,843,769 end=54,843,817 hash=0x2687a77083e5f44ac5f6367c154181b7f62f75af11771976e99cd9e19cab1ad1
WARN [03-19|14:52:30.583] unable to handle whitelist milestone     err="missing blocks"
INFO [03-19|14:52:34.740] Generating state snapshot                root=1a8dab..ed1bd0 at=076f96..cc162f accounts=8,915,678 slots=89,440,246 storage=7.21GiB dangling=0 elapsed=8m40.277s  eta=11h5m54.7s
INFO [03-19|14:52:42.583] Got new milestone from heimdall          start=54,843,769 end=54,843,817 hash=0x2687a77083e5f44ac5f6367c154181b7f62f75af11771976e99cd9e19cab1ad1
WARN [03-19|14:52:42.583] unable to handle whitelist milestone     err="missing blocks"
INFO [03-19|14:52:42.760] Generating state snapshot                root=1a8dab..ed1bd0 in=078703..7138b7 at=49fb7a..855d4b accounts=9,025,043 slots=89,855,760 storage=7.24GiB dangling=0 elapsed=8m48.296s  eta=10h57m20.586s
INFO [03-19|14:52:50.762] Generating state snapshot                root=1a8dab..ed1bd0 in=079479..e458da at=a483be..18aa94 accounts=9,087,987 slots=90,442,215 storage=7.28GiB dangling=0 elapsed=8m56.298s  eta=10h56m46.838s

bgiegel avatar Mar 19 '24 14:03 bgiegel

So our node did indeed recover after 30 minutes. It didn't at first because I had a probe that killed my container if it stayed stuck for more than 15 minutes. I had set that up because our node very often gets stuck on a random block, and restarting the process helps. But in this case we actually just needed to wait a bit.
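
For anyone running a similar probe, here is a minimal sketch of a stall check with a configurable grace period, so that a slow recovery like this one is not interrupted by a restart. The class name `StallDetector` and the 45-minute value are assumptions chosen for illustration; the height samples could come from the standard `eth_blockNumber` JSON-RPC call.

```python
import time


class StallDetector:
    """Tracks when the block height last advanced (hypothetical helper)."""

    def __init__(self, grace_seconds, now=time.monotonic):
        self.grace_seconds = grace_seconds
        self._now = now                  # injectable clock, handy for testing
        self._last_height = None
        self._last_progress = now()

    def observe(self, height):
        """Record a height sample; return True if a restart looks warranted."""
        current = self._now()
        if self._last_height is None or height > self._last_height:
            # The chain head moved forward, so reset the stall timer.
            self._last_height = height
            self._last_progress = current
            return False
        # No progress: only flag once the grace period has fully elapsed.
        return (current - self._last_progress) > self.grace_seconds


# Arbitrary example value: longer than the 30 minutes this node needed.
detector = StallDetector(grace_seconds=45 * 60)
```

The probe would call `detector.observe(height)` on each poll and restart the container only when it returns True.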

bgiegel avatar Mar 19 '24 15:03 bgiegel

Does release 1.2.9-beta fix this issue? I saw many reorgs near the stuck height.

rekyyang avatar Mar 20 '24 02:03 rekyyang

Hi, we observed a large reorg around this block. The exact issue is still being investigated, and that will take some time.

Current mitigation:

  1. Restart the node, which should ideally fix the issue
  2. If a restart doesn't help, rewind the node by a few hundred blocks and wait for it to cross the stuck block number

This should help, as it resolved the stuck-node issue for our partners and validators.
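
As a rough illustration of the rewind step, the sketch below builds a JSON-RPC request for `debug_setHead`, the go-ethereum debug method that bor inherits. It assumes the `debug` namespace is enabled on the node, and the function name `set_head_request` and the default of 300 blocks are placeholders for "a few hundred", not recommended values. Rewinding discards the blocks above the target, so treat this as a sketch rather than a procedure.

```python
import json


def set_head_request(stuck_block, rewind_by=300, request_id=1):
    """Build a debug_setHead request targeting stuck_block - rewind_by."""
    target = stuck_block - rewind_by
    if target < 0:
        raise ValueError("rewind target must be non-negative")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "debug_setHead",
        "params": [hex(target)],  # the block number is passed as a hex string
    }


# Stuck height taken from this thread; rewind_by is an assumption.
payload = set_head_request(54_827_345)
print(json.dumps(payload))
```

The printed payload can be sent to the node's RPC endpoint, or the same hex value can be passed to `debug.setHead(...)` from an attached console.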

temaniarpit27 avatar Mar 21 '24 06:03 temaniarpit27

Thanks for reporting the issue, closing it for now.

VAIBHAVJINDAL3012 avatar Apr 04 '24 14:04 VAIBHAVJINDAL3012