block sync stuck at 54,827,345
Hi team, we found that our polygon-mainnet full node got stuck at height 54,827,345 (54,827,346 is an empty block), and it seems the chain itself was also stuck at this height for several minutes. https://polygonscan.com/block/54827346
Our full nodes run three versions: 1.2.7, 1.2.8, and 1.1.0. The 1.1.0 node recovered after 6 minutes, but our 1.2.7 and 1.2.8 nodes only recovered after 1 hour.
These are the logs from while the node was stuck:
WARN [03-19|05:14:28.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:14:40.079] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:14:40.079] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:14:52.075] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:14:52.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:15:04.075] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:15:04.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:15:06.762] Finished cleaning keys num=0 endHeight=54,740,945
INFO [03-19|05:15:16.076] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:15:16.076] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:15:28.075] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:15:28.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:15:40.075] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:15:40.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:15:52.075] Got new milestone from heimdall start=54,828,722 end=54,828,734 hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7
WARN [03-19|05:15:52.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:16:04.075] Got new milestone from heimdall start=54,828,735 end=54,829,125 hash=0x64cd447187cdabfd688333b01d6a0bfa17840f02aea80e740773417ab5de949
WARN [03-19|05:16:04.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:16:04.076] Got new checkpoint from heimdall start=54,827,174 end=54,827,685 rootHash=0x53954494d63064bb8745aa1fb6506c069b820c9edb914c3adbe890f4328
WARN [03-19|05:16:04.076] Failed to whitelist checkpoint err="missing blocks"
WARN [03-19|05:16:04.076] unable to handle whitelist checkpoint err="missing blocks"
INFO [03-19|05:16:06.634] Finished cleaning keys num=0 endHeight=54,740,945
INFO [03-19|05:16:16.076] Got new milestone from heimdall start=54,828,735 end=54,829,125 hash=0x64cd447187cdabfd688333b01d6a0bfa17840f02aea80e740773417ab5de949
WARN [03-19|05:16:16.076] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:16:28.075] Got new milestone from heimdall start=54,828,735 end=54,829,125 hash=0x64cd447187cdabfd688333b01d6a0bfa17840f02aea80e740773417ab5de949
WARN [03-19|05:16:28.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:16:40.075] Got new milestone from heimdall start=54,828,735 end=54,829,125 hash=0x64cd447187cdabfd688333b01d6a0bfa17840f02aea80e740773417ab5de949
WARN [03-19|05:16:40.075] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|05:16:52.075] Got new milestone from heimdall start=54,828,735 end=54,829,125 hash=0x64cd447187cdabfd688333b01d6a0bfa17840f02aea80e740773417ab5de949
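One way to read these logs: the milestones cover blocks from 54,828,722 onward while the node's head is still at 54,827,345, so the node presumably cannot verify them because it has not imported those blocks yet. The sketch below pulls the milestone range out of a log line and computes that gap. The regex follows the log lines above; the helper names are hypothetical, and the reading of "missing blocks" is an assumption rather than something confirmed in this thread.

```python
import re

# Field layout taken from the "Got new milestone" lines pasted above.
MILESTONE_RE = re.compile(
    r"Got new milestone from heimdall\s+start=([\d,]+)\s+end=([\d,]+)"
)

def parse_milestone(line):
    """Return (start, end) as ints, or None if this is not a milestone line."""
    m = MILESTONE_RE.search(line)
    if m is None:
        return None
    start, end = (int(g.replace(",", "")) for g in m.groups())
    return start, end

def blocks_behind(local_head, milestone_end):
    """Blocks the node still has to import to reach the milestone end."""
    return max(0, milestone_end - local_head)

line = (
    "INFO [03-19|05:14:40.079] Got new milestone from heimdall "
    "start=54,828,722 end=54,828,734 "
    "hash=0x59d23748f7d1302e249bc32ca9394c98409dce2f1e1ab5247eec7c073d86be7"
)
start, end = parse_milestone(line)
gap = blocks_behind(54_827_345, end)  # head height reported in the first post
print(f"milestone {start}-{end}, node is {gap} blocks behind")
```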
Our nodes are stuck too, but at a different height: #54,827,344
Sorry, I checked our node again and it was stuck at #54,827,344 too. After a reboot it got stuck at #54,827,345, and it took a long time to recover. What are the possible reasons...
Same for us: our node is stuck at exactly the same height. I migrated the bor node from v1.2.3 to v1.2.8 this morning hoping to fix the issue, but it didn't change anything. Our second node is fine though (not yet migrated to the latest version). No errors in the logs:
WARN [03-19|14:52:18.585] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|14:52:18.736] Generating state snapshot root=1a8dab..ed1bd0 in=074efe..684ab5 at=ecfb74..3f4283 accounts=8,762,814 slots=88,397,922 storage=7.12GiB dangling=0 elapsed=8m24.273s eta=11h12m11.884s
INFO [03-19|14:52:26.740] Generating state snapshot root=1a8dab..ed1bd0 in=076022..8a3c7d at=923819..f4d1d8 accounts=8,843,490 slots=88,925,105 storage=7.17GiB dangling=0 elapsed=8m32.276s eta=11h8m17.573s
INFO [03-19|14:52:30.583] Got new milestone from heimdall start=54,843,769 end=54,843,817 hash=0x2687a77083e5f44ac5f6367c154181b7f62f75af11771976e99cd9e19cab1ad1
WARN [03-19|14:52:30.583] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|14:52:34.740] Generating state snapshot root=1a8dab..ed1bd0 at=076f96..cc162f accounts=8,915,678 slots=89,440,246 storage=7.21GiB dangling=0 elapsed=8m40.277s eta=11h5m54.7s
INFO [03-19|14:52:42.583] Got new milestone from heimdall start=54,843,769 end=54,843,817 hash=0x2687a77083e5f44ac5f6367c154181b7f62f75af11771976e99cd9e19cab1ad1
WARN [03-19|14:52:42.583] unable to handle whitelist milestone err="missing blocks"
INFO [03-19|14:52:42.760] Generating state snapshot root=1a8dab..ed1bd0 in=078703..7138b7 at=49fb7a..855d4b accounts=9,025,043 slots=89,855,760 storage=7.24GiB dangling=0 elapsed=8m48.296s eta=10h57m20.586s
INFO [03-19|14:52:50.762] Generating state snapshot root=1a8dab..ed1bd0 in=079479..e458da at=a483be..18aa94 accounts=9,087,987 slots=90,442,215 storage=7.28GiB dangling=0 elapsed=8m56.298s eta=10h56m46.838s
So our node did indeed recover after 30 minutes. It didn't at first because I had a probe that killed my container whenever it stayed stuck for more than 15 minutes. I had set that up because our node very often gets stuck on a random block, and restarting the process helps. But in this case we actually just needed to wait a bit.
Does release 1.2.9-beta fix this issue? I saw many reorgs near the stuck height.
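For anyone running a similar probe, here is a minimal sketch of a stall detector with a configurable threshold, so it can be set longer than the recovery times reported in this thread (30 minutes to 1 hour). It polls the standard `eth_blockNumber` JSON-RPC method. The class name, the `watch` helper, the RPC URL, and the 90-minute threshold are all hypothetical and not taken from anyone's actual setup.

```python
import json
import time
import urllib.request

def get_block_number(rpc_url):
    """Fetch the node's current head via the standard eth_blockNumber call."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
    ).encode()
    req = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return int(json.load(resp)["result"], 16)

class StallDetector:
    """Tracks how long the head has stayed at the same height."""

    def __init__(self, max_stall_seconds):
        self.max_stall_seconds = max_stall_seconds
        self.last_height = None
        self.last_change = None

    def observe(self, height, now):
        """Record a sample; return True once the stall exceeds the threshold."""
        if height != self.last_height:
            # Any change of head (including a reorg) counts as progress.
            self.last_height = height
            self.last_change = now
            return False
        return (now - self.last_change) > self.max_stall_seconds

def watch(rpc_url, detector, poll_seconds=30):
    """Poll until the detector reports a stall, then return the stuck height."""
    while True:
        height = get_block_number(rpc_url)
        if detector.observe(height, time.monotonic()):
            return height
        time.sleep(poll_seconds)

# Example (hypothetical URL and threshold):
# stuck_at = watch("http://localhost:8545", StallDetector(90 * 60))
```

Keeping the decision logic in `StallDetector.observe` separate from the polling loop makes the threshold easy to test without a live node.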
Hi, we observed a large reorg around this block. The exact issue is still being investigated, and that will take some time.
Current mitigation:
- Restart the node, which should ideally fix the issue
- If a restart doesn't help, rewind the node by a few hundred blocks and wait for it to cross the stuck block number
This should help, as it resolved the stuck-node issue for our partners and validators.
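As a rough illustration of the rewind step, the sketch below builds a JSON-RPC request for `debug_setHead`, the geth-style debug method that takes a hex block number. It assumes this is what "rewind" refers to and that the `debug` namespace is reachable on your node (for example over IPC); the function name and the default of 500 blocks are hypothetical.

```python
import json

def rewind_request(stuck_height, rewind_by=500):
    """Build a debug_setHead request targeting `rewind_by` blocks before the stuck height.

    Assumption: the node exposes the geth-style `debug` namespace.
    """
    target = max(0, stuck_height - rewind_by)
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "debug_setHead",
        "params": [hex(target)],  # the method expects a hex-encoded block number
    }

# Stuck height from this thread, rewound by a few hundred blocks.
req = rewind_request(54_827_345, rewind_by=500)
print(json.dumps(req))
```

The printed payload can be sent to the node's RPC endpoint with any HTTP or IPC client; after the rewind the node re-syncs forward and should cross the stuck block on its own.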
Thanks for reporting the issue, closing it for now.