snarkOS
snarkOS copied to clipboard
[Bug] snarkOS sometimes stops syncing
🐛 Bug Report
My node sometimes will stop syncing without any warning. Here's the log:
2023-12-12T21:34:51.848247Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795
2023-12-12T21:34:51.849142Z TRACE snarkos_node_sync::block_sync: Prepared 0 block requests
2023-12-12T21:34:54.850241Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795
2023-12-12T21:34:54.851316Z TRACE snarkos_node_sync::block_sync: Prepared 22 block requests
2023-12-12T21:34:58.103790Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795
2023-12-12T21:34:58.104740Z TRACE snarkos_node_sync::block_sync: Prepared 9 block requests
2023-12-12T21:35:01.211026Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795
2023-12-12T21:35:01.211727Z TRACE snarkos_node_sync::block_sync: Prepared 13 block requests
2023-12-12T21:35:04.357960Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795
2023-12-12T21:35:04.358547Z TRACE snarkos_node_sync::block_sync: Prepared 5 block requests
2023-12-12T21:35:07.416330Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795
2023-12-12T21:35:07.416927Z TRACE snarkos_node_sync::block_sync: Prepared 18 block requests
2023-12-12T21:35:10.621421Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795
2023-12-12T21:35:10.622391Z TRACE snarkos_node_sync::block_sync: Prepared 0 block requests
2023-12-12T21:35:13.128856Z DEBUG snarkos_node_router::heartbeat: Connected to 80 peers [...]
Unfortunately I've just removed some additional debug logs when investigating another potential issue so I'm not sure why the node stopped to process the block responses (assuming I do have received responses here).
Note that my node is connected to 80+ other nodes and if you look carefully the block sync interval is also adjusted, so officially it's out of specs right now (cli doesn't have options for specifying max peers).
Also sadly I can't use verbosity 5 and run the node until it happens again, it's way too noisy as I have too many connected nodes here, and reproducing this might take weeks.
BTW, when I was investigating "another potential issue" I found that the sync module sometimes would leave some stray block responses in the responses map in BlockSync struct in unknown circumstances. Not sure if it's related.
Steps to Reproduce
Not exactly sure.
Expected Behavior
The node should sync.
Your Environment
- snarkOS Version: 4896a1200a4605d1de6fe6cb53e1efa9ccdb6152, but I believe I've seen this behavior well before this
This looks like a bug I fixed here: https://github.com/AleoHQ/snarkOS/pull/2903. Can you try with that branch?
oops, didn't notice the incomplete title.
I'm not even sure if I can reproduce this issue before that pr is merged as it only happens randomly (and really sporadically), and because of #2917 I'm reluctant to do further modifications to my running node now as it directly affects my explorer.
That said, probably the stray responses map entries are caused by that as well, so maybe observing the content of that could help to see if it's related?
I understand. That said, the PR should be up-to-date with testnet3 and only adds 1 relatively simple commit.
related syncing problem #2910
Is this still relevant? Should be related to https://github.com/AleoHQ/snarkOS/issues/2978
I'm not even sure if this is #2978 or #3063. Probably we can close this for now and see if it happens again.