snarkOS icon indicating copy to clipboard operation
snarkOS copied to clipboard

[Bug] snarkOS sometimes stops syncing

Open HarukaMa opened this issue 1 year ago • 6 comments

🐛 Bug Report

My node sometimes will stop syncing without any warning. Here's the log:

2023-12-12T21:34:51.848247Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795                                                                   
2023-12-12T21:34:51.849142Z TRACE snarkos_node_sync::block_sync: Prepared 0 block requests                     
2023-12-12T21:34:54.850241Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795                                                                   
2023-12-12T21:34:54.851316Z TRACE snarkos_node_sync::block_sync: Prepared 22 block requests                                                                                                                       
2023-12-12T21:34:58.103790Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795                                                                   
2023-12-12T21:34:58.104740Z TRACE snarkos_node_sync::block_sync: Prepared 9 block requests         
2023-12-12T21:35:01.211026Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795
2023-12-12T21:35:01.211727Z TRACE snarkos_node_sync::block_sync: Prepared 13 block requests
2023-12-12T21:35:04.357960Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795
2023-12-12T21:35:04.358547Z TRACE snarkos_node_sync::block_sync: Prepared 5 block requests
2023-12-12T21:35:07.416330Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795
2023-12-12T21:35:07.416927Z TRACE snarkos_node_sync::block_sync: Prepared 18 block requests
2023-12-12T21:35:10.621421Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 925872, canon_height = 925795
2023-12-12T21:35:10.622391Z TRACE snarkos_node_sync::block_sync: Prepared 0 block requests
2023-12-12T21:35:13.128856Z DEBUG snarkos_node_router::heartbeat: Connected to 80 peers [...]

Unfortunately I've just removed some additional debug logs when investigating another potential issue so I'm not sure why the node stopped to process the block responses (assuming I do have received responses here).

Note that my node is connected to 80+ other nodes and if you look carefully the block sync interval is also adjusted, so officially it's out of specs right now (cli doesn't have options for specifying max peers).

Also sadly I can't use verbosity 5 and run the node until it happens again, it's way too noisy as I have too many connected nodes here, and reproducing this might take weeks.

BTW, when I was investigating "another potential issue" I found that the sync module sometimes would leave some stray block responses in the responses map in BlockSync struct in unknown circumstances. Not sure if it's related.

Steps to Reproduce

Not exactly sure.

Expected Behavior

The node should sync.

Your Environment

  • snarkOS Version: 4896a1200a4605d1de6fe6cb53e1efa9ccdb6152, but I believe I've seen this behavior well before this

HarukaMa avatar Dec 12 '23 21:12 HarukaMa

This looks like a bug I fixed here: https://github.com/AleoHQ/snarkOS/pull/2903. Can you try with that branch?

joske avatar Dec 12 '23 21:12 joske

oops, didn't notice the incomplete title.

I'm not even sure if I can reproduce this issue before that pr is merged as it only happens randomly (and really sporadically), and because of #2917 I'm reluctant to do further modifications to my running node now as it directly affects my explorer.

That said, probably the stray responses map entries are caused by that as well, so maybe observing the content of that could help to see if it's related?

HarukaMa avatar Dec 12 '23 22:12 HarukaMa

I understand. That said, the PR should be up-to-date with testnet3 and only adds 1 relatively simple commit.

joske avatar Dec 12 '23 22:12 joske

related syncing problem #2910

2019jack avatar Dec 13 '23 02:12 2019jack

Is this still relevant? Should be related to https://github.com/AleoHQ/snarkOS/issues/2978

raychu86 avatar Feb 10 '24 02:02 raychu86

I'm not even sure if this is #2978 or #3063. Probably we can close this for now and see if it happens again.

HarukaMa avatar Feb 10 '24 03:02 HarukaMa