substrate icon indicating copy to clipboard operation
substrate copied to clipboard

Actually fix major sync detection

Open nazar-pc opened this issue 3 years ago • 22 comments

https://github.com/paritytech/substrate/pull/11547 didn't actually resolve https://github.com/paritytech/substrate/issues/11510.

median > self.best_queued_number caused best_seen to return None way before there is just 5 blocks left to import, causing SyncStatus::status flip to SyncState::Idle prematurely and changing is_major_sync to false as the result even though there might still be over 2k blocks left to import (queue for blocks to import has size 2048).

The solution is to check median for SyncStatus::status calculation, but leave best_seen_block as is (as I'm not 100% of implications of changing it).

Before:

2022-08-26 01:28:40 [PrimaryChain] Is major syncing: true    
2022-08-26 01:28:41 [PrimaryChain] Is major syncing: true    
2022-08-26 01:28:41 [PrimaryChain] ⚙️  Syncing 186.2 bps, target=#16494 (2 peers), best: #14266 (0x1f1f…43db), finalized #14166 (0x9dc2…9893), ⬇ 2.1MiB/s ⬆ 1.4kiB/s    
2022-08-26 01:28:42 [PrimaryChain] Is major syncing: true    
2022-08-26 01:28:43 [PrimaryChain] Is major syncing: false    
2022-08-26 01:28:44 [PrimaryChain] Is major syncing: false    
2022-08-26 01:28:45 [PrimaryChain] Is major syncing: false    
2022-08-26 01:28:46 [PrimaryChain] Is major syncing: false    
2022-08-26 01:28:46 [PrimaryChain] 💤 Idle (2 peers), best: #15065 (0x882a…bac5), finalized #14964 (0x4f5e…4989), ⬇ 1.3MiB/s ⬆ 2.2kiB/s    
2022-08-26 01:28:47 [PrimaryChain] Is major syncing: false    
2022-08-26 01:28:48 [PrimaryChain] Is major syncing: false    
2022-08-26 01:28:49 [PrimaryChain] Is major syncing: false    
2022-08-26 01:28:49 [PrimaryChain] ✨ Imported #15544 (0x1726…3a69)    
2022-08-26 01:28:49 [PrimaryChain] ✨ Imported #15545 (0xadd3…f7c2)    

After:

2022-08-26 02:23:00 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:01 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:01 [PrimaryChain] ⚙️  Preparing 62.0 bps (2 peers), best: #17668 (0xbf6a…f5a4), finalized #17567 (0xd79e…fe55), ⬇ 55.9kiB/s ⬆ 0.2kiB/s    
2022-08-26 02:23:02 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:03 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:04 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:05 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:06 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:06 [PrimaryChain] ⚙️  Syncing 63.2 bps, target=#18363 (2 peers), best: #17984 (0x8377…7c7e), finalized #17884 (0x8d2b…693c), ⬇ 2.2MiB/s ⬆ 1.4kiB/s    
2022-08-26 02:23:07 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:08 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:09 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:10 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:11 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:11 [PrimaryChain] 💤 Idle (0 peers), best: #18246 (0x7d61…fd5b), finalized #18145 (0x25c3…97b2), ⬇ 211.6kiB/s ⬆ 0.2kiB/s    
2022-08-26 02:23:12 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:13 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:14 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:15 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:16 [PrimaryChain] Is major syncing: true    
2022-08-26 02:23:16 [PrimaryChain] 💤 Idle (0 peers), best: #18362 (0x5f93…0ff3), finalized #18262 (0xb835…2688), ⬇ 0.2kiB/s ⬆ 0.2kiB/s    
2022-08-26 02:23:16 [PrimaryChain] ✨ Imported #18363 (0xc01f…8821)    
2022-08-26 02:23:16 [PrimaryChain] ✨ Imported #18364 (0xfe71…bbda)    
2022-08-26 02:23:16 [PrimaryChain] ✨ Imported #18365 (0xf875…4ca1)    
2022-08-26 02:23:17 [PrimaryChain] Is major syncing: false    
2022-08-26 02:23:17 [PrimaryChain] ✨ Imported #18366 (0xe97d…d925)    
2022-08-26 02:23:18 [PrimaryChain] Is major syncing: false    

Polkadot address: 1vSxzbyz2cJREAuVWjhXUT1ds8vBzoxn2w4asNpusQKwjJd

nazar-pc avatar Aug 25 '22 23:08 nazar-pc

cc @bkchr @andresilva and @arkpar as authors/reviewers of mentioned PR.

nazar-pc avatar Aug 25 '22 23:08 nazar-pc

Hm... turns out it was actually un-fixed by https://github.com/paritytech/substrate/pull/11817

nazar-pc avatar Aug 25 '22 23:08 nazar-pc

Maybe we should just introduce a third state Importing when we are finished with downloading, but still need to import the blocks?

bkchr avatar Aug 26 '22 05:08 bkchr

Sounds reasonable to me, I can do that too

nazar-pc avatar Aug 26 '22 13:08 nazar-pc

Sounds reasonable to me, I can do that too

Would be nice. Block production would not start until we are in IDLE state. However, internally we can distinguish between the different states.

bkchr avatar Aug 26 '22 21:08 bkchr

Added SyncState::Importing, but it now looks awkward with best_seen_block being separate. I feel like SyncState enum should contain that number inside of it, then it would be easier to handle things in informant. WDYT?

nazar-pc avatar Aug 29 '22 13:08 nazar-pc

Added SyncState::Importing, but it now looks awkward with best_seen_block being separate. I feel like SyncState enum should contain that number inside of it, then it would be easier to handle things in informant. WDYT?

Agree. Both Importing and Downloading should have it though.

arkpar avatar Sep 01 '22 13:09 arkpar

Pushed corresponding changes, but now I'm wondering if best_seen_block should be Some(target) in case of SyncState::Importing. It wasn't before, but maybe it should?

nazar-pc avatar Sep 06 '22 10:09 nazar-pc

Do you mean NetworkStatus::best_seen_block? I suggest it should be removed alltogeter, since we already have the target block in NetworkStatus::sync_state

arkpar avatar Sep 06 '22 11:09 arkpar

Did some more tweaks, let me know what you think

nazar-pc avatar Sep 06 '22 11:09 nazar-pc

bot merge

bkchr avatar Sep 16 '22 21:09 bkchr

Waiting for commit status.

Merge cancelled due to error. Error: Checks failed for 8fa9a7793b3108841d4facdfe3892a430ffe2d3b

bot merge

bkchr avatar Sep 16 '22 21:09 bkchr

Waiting for commit status.

Merge cancelled due to error. Error: Statuses failed for 8fa9a7793b3108841d4facdfe3892a430ffe2d3b

I see a sync-related test failing, I'll look into it

nazar-pc avatar Sep 17 '22 00:09 nazar-pc

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 17 '22 06:10 stale[bot]

I see a sync-related test failing, I'll look into it

Any update on this? @nazar-pc

bkchr avatar Oct 17 '22 07:10 bkchr

I'll get back to this soon, sorry it has been a while

nazar-pc avatar Oct 17 '22 07:10 nazar-pc

No problem!

bkchr avatar Oct 17 '22 20:10 bkchr

Last commit with optimization wasn't correct, so I just reverted it. Merge conflict was due to import, which is easy. Last commit is because otherwise sc-network doesn't compile (probably works in CI due to feature unification with some other crate).

nazar-pc avatar Oct 18 '22 08:10 nazar-pc

bot merge

bkchr avatar Oct 21 '22 13:10 bkchr

Error: Statuses failed for 1e4e550dfa3f7c3851a86a9133989d27f8167e30

bot rebase

bkchr avatar Oct 21 '22 13:10 bkchr

Rebased

bot merge

bkchr avatar Oct 21 '22 13:10 bkchr

Waiting for commit status.