nearcore icon indicating copy to clipboard operation
nearcore copied to clipboard

State sync error message is confusing

Open frol opened this issue 3 years ago • 7 comments

Describe the bug

Running a node from master I see these messages after the headers sync:

 Aug 12 16:22:20.886 INFO stats: #10885359 Downloading headers 98% -/4 30/28/40 peers ⬇ 566.3kiB/s ⬆ 25.2kiB/s 0.00 bps 0 gas/s CPU: 51%, Mem: 4.3 GiB
 Aug 12 16:22:31.397 INFO stats: #10885359 Downloading headers 99% -/4 30/28/40 peers ⬇ 574.6kiB/s ⬆ 19.9kiB/s 0.00 bps 0 gas/s CPU: 75%, Mem: 4.3 GiB
 Aug 12 16:22:41.865 INFO stats: #10885359 Downloading headers 99% -/4 30/28/40 peers ⬇ 592.1kiB/s ⬆ 22.6kiB/s 0.00 bps 0 gas/s CPU: 76%, Mem: 4.3 GiB
 Aug 12 16:22:52.118 INFO stats: #10885359 Downloading headers 99% -/4 30/28/40 peers ⬇ 583.8kiB/s ⬆ 25.9kiB/s 0.00 bps 0 gas/s CPU: 60%, Mem: 4.3 GiB
 Aug 12 16:23:04.938 INFO stats: State <span class="error">[0: header]</span> -/4 30/28/40 peers ⬇ 625.5kiB/s ⬆ 26.3kiB/s 0.00 bps 0 gas/s CPU: 76%, Mem: 4.3 GiB
 Aug 12 16:23:15.175 WARN sync: State sync didn't download the state for shard 0 in 10 seconds, sending StateRequest again
 Aug 12 16:23:15.276 INFO stats: State <span class="error">[0: parts]</span> -/4 30/28/40 peers ⬇ 3.7MiB/s ⬆ 25.3kiB/s 0.00 bps 0 gas/s CPU: 177%, Mem: 4.5 GiB
 Aug 12 16:23:25.439 WARN sync: State sync didn't download the state for shard 0 in 10 seconds, sending StateRequest again
 Aug 12 16:23:26.027 INFO stats: State <span class="error">[0: parts]</span> -/4 30/27/40 peers ⬇ 8.5MiB/s ⬆ 23.6kiB/s 0.00 bps 0 gas/s CPU: 169%, Mem: 4.5 GiB
 Aug 12 16:23:35.439 WARN sync: State sync didn't download the state for shard 0 in 10 seconds, sending StateRequest again
 Aug 12 16:23:36.030 INFO stats: State <span class="error">[0: parts]</span> -/4 30/27/40 peers ⬇ 9.4MiB/s ⬆ 21.5kiB/s 0.00 bps 0 gas/s CPU: 30%, Mem: 4.5 GiB
 Aug 12 16:23:58.577 INFO stats: State <span class="error">[0: done]</span> -/4 30/27/40 peers ⬇ 9.4MiB/s ⬆ 21.4kiB/s 0.00 bps 0 gas/s CPU: 100%, Mem: 4.8 GiB
 Aug 12 16:23:58.678 ERROR sync: State sync received hash 9YCa51LtyuYVL7rqP3KgaSAGqtywTpmLHm6eEP7LavGE that we're not expecting, potential malicious peer
 Aug 12 16:23:58.687 ERROR sync: State sync received hash 9YCa51LtyuYVL7rqP3KgaSAGqtywTpmLHm6eEP7LavGE that we're not expecting, potential malicious peer
 Aug 12 16:23:58.689 ERROR sync: State sync received hash 9YCa51LtyuYVL7rqP3KgaSAGqtywTpmLHm6eEP7LavGE that we're not expecting, potential malicious peer

It proceeded with block sync, so I assume it is not critical, yet it might be an indicator of a potential issue.

To Reproduce

I am not sure how reproducible this behavior, but there was no data folder and I just let it sync up.

Expected behavior

I think it might be an indicator of some bug even though it did not crash.

Screenshots

![image](https://user-images.githubusercontent.com/304265/90041146-42fcdc80-dc9f-11ea-818b-2171b91f5b70.png)

Version (please complete the following information):

  • nearcore commit/branch: 8ddefbffab0b674ff5517cee4ce34272f10b1d53
  • rust version (if local): nightly-2020-05-15
  • testnet

frol avatar Aug 12 '20 16:08 frol

I am not sure what is the issue here. Are you referring to something in the logs? I agree that we should probably suppress the warning as it is unlikely to finish downloading state in 10 seconds.

bowenwang1996 avatar Aug 13 '20 00:08 bowenwang1996

The node was running fine and I expect it not to produce ERROR message unless those need to be addressed. I still think those messages are valiable, so we may just transform those to INFO level. I am not sure about the "ERROR sync: State sync received hash 9YCa51LtyuYVL7rqP3KgaSAGqtywTpmLHm6eEP7LavGE that we're not expecting, potential malicious peer" error as it appears to me as some sort of race condition given that is just disapeared a minute later.

frol avatar Aug 13 '20 10:08 frol

That happens when a peer is behind the head of the network or the head is close to the epoch boundary. I can change the log level.

bowenwang1996 avatar Aug 13 '20 14:08 bowenwang1996

That happens when a peer is behind the head of the network or the head is close to the epoch boundary. I can change the log level.

I think those should be DEBUG level logs if you can actually test those conditions you listed, and if those conditions are not met that needs to be reported as WARNING.

Let's reserve ERRORs to the items that require some action.

I think it is a low priority.

frol avatar Aug 13 '20 19:08 frol

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 27 '21 22:09 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Dec 27 '21 01:12 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 29 '22 00:03 stale[bot]

Hi, I am running a near localnet on 2 machines and getting the same error on the 2nd node. Is this known issue or it takes sometime to sync ?

2023-01-06T16:25:30.043449Z WARN sync: State sync didn't download the state for shard 0 in 60 seconds, sending StateRequest again

singhparshant avatar Jan 06 '23 16:01 singhparshant

CC: @nikurt: could you take a look please?

akhi3030 avatar Jan 09 '23 15:01 akhi3030