
State-db refactoring

Open arkpar opened this issue 3 years ago • 4 comments

Remove the "pending" state in statedb, which greatly simplifies the implementation. Now, if there is a backend error, the in-memory state is reverted by simply reloading it from disk.

Also fixes an issue with #11980. After warp sync, if the node is restarted before any block is pruned, it would not be able to start again in a consistent state.
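
For readers unfamiliar with the idea, here is a minimal sketch of "revert by reloading from disk", assuming a toy key-value store. `Disk`, `ToyStateDb`, and `fail_next_write` are hypothetical names invented for illustration; this is not the actual statedb code, only the `try_commit` name is borrowed from the discussion.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the on-disk database; not a Substrate type.
#[derive(Default)]
pub struct Disk {
    pub data: HashMap<String, u64>,
    /// When set, the next write fails, simulating a backend error.
    pub fail_next_write: bool,
}

impl Disk {
    fn write(&mut self, changes: &[(String, u64)]) -> Result<(), String> {
        if self.fail_next_write {
            self.fail_next_write = false;
            return Err("simulated backend error".to_string());
        }
        for (k, v) in changes {
            self.data.insert(k.clone(), *v);
        }
        Ok(())
    }
}

/// Toy state db: the in-memory view is updated eagerly, with no separate
/// "pending" layer to track or roll back.
pub struct ToyStateDb {
    mem: HashMap<String, u64>,
}

impl ToyStateDb {
    /// Build the in-memory view from whatever is on disk.
    pub fn load(disk: &Disk) -> Self {
        ToyStateDb { mem: disk.data.clone() }
    }

    pub fn get(&self, key: &str) -> Option<u64> {
        self.mem.get(key).copied()
    }

    /// Apply changes in memory, then persist them. On a backend error the
    /// in-memory view is discarded and rebuilt from disk.
    pub fn try_commit(
        &mut self,
        disk: &mut Disk,
        changes: Vec<(String, u64)>,
    ) -> Result<(), String> {
        for (k, v) in &changes {
            self.mem.insert(k.clone(), *v);
        }
        if let Err(e) = disk.write(&changes) {
            // Revert by reloading: the disk is the single source of truth.
            *self = ToyStateDb::load(disk);
            return Err(e);
        }
        Ok(())
    }
}
```

In this sketch a failed `try_commit` leaves `get` returning exactly what was on disk before the call, which is the consistency property the description is after; the trade-off is a full reload on the (rare) error path instead of bookkeeping on every commit.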

arkpar avatar Sep 12 '22 09:09 arkpar

try_commit failures result in node termination anyway, so the reset is only there to preserve data consistency guarantees at the statedb API level. The way it is currently used in Substrate, it does not really matter.

arkpar avatar Sep 13 '22 10:09 arkpar

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 13 '22 23:10 stale[bot]

Waiting for second review, please don't close.

cheme avatar Oct 14 '22 08:10 cheme

@bkchr Could you please take a look?

arkpar avatar Oct 14 '22 11:10 arkpar

Recently I got a few reports from Khala node operators that, when syncing a new node from 0, they randomly experience

[Block import error: Backend error: Can't canonicalize missing block number #{BLOCK_NUMBER} when importing {BLOCK_HASH}]

The log looks like this:

2022-11-01 07:08:54 [Parachain] Block import error: Backend error: Can't canonicalize missing block number #2482945 when importing 0xa0000aacfc918561284868cba29427613320594d3b129dc5981b0946f8368ead (#2487041)
2022-11-01 07:08:54 [Parachain] 💔 Error importing block 0xa0000aacfc918561284868cba29427613320594d3b129dc5981b0946f8368ead: consensus error: Import failed: Backend error: Can't canonicalize missing block number #2482945 when importing 0xa0000aacfc918561284868cba29427613320594d3b129dc5981b0946f8368ead (#2487041)
2022-11-01 07:08:55 [Parachain] ⚙️  Syncing  0.0 bps, target=#2629792 (11 peers), best: #2482944 (0x7272…aac6), finalized #405357 (0x2ba4…5c69), ⬇ 1.8MiB/s ⬆ 2.7kiB/s
2022-11-01 07:08:56 [Relaychain] ⚙️  Syncing 48.7 bps, target=#15134037 (30 peers), best: #9247717 (0x6478…a2cc), finalized #9247232 (0xd3a2…417d), ⬇ 1.2MiB/s ⬆ 132.9kiB/s
2022-11-01 07:09:00 [Parachain] ⚙️  Syncing  0.0 bps, target=#2629792 (11 peers), best: #2482944 (0x7272…aac6), finalized #405357 (0x2ba4…5c69), ⬇ 8.8MiB/s ⬆ 3.3kiB/s
2022-11-01 07:09:01 [Relaychain] ⚙️  Syncing 43.5 bps, target=#15134037 (30 peers), best: #9247935 (0x9a21…f6fe), finalized #9247744 (0xc1c1…b50c), ⬇ 1.0MiB/s ⬆ 139.8kiB/s
2022-11-01 07:09:05 [Parachain] Block import error: Backend error: Can't canonicalize missing block number #2482945 when importing 0xa0000aacfc918561284868cba29427613320594d3b129dc5981b0946f8368ead (#2487041)
2022-11-01 07:09:05 [Parachain] 💔 Error importing block 0xa0000aacfc918561284868cba29427613320594d3b129dc5981b0946f8368ead: consensus error: Import failed: Backend error: Can't canonicalize missing block number #2482945 when importing 0xa0000aacfc918561284868cba29427613320594d3b129dc5981b0946f8368ead (#2487041)
2022-11-01 07:09:05 [Parachain] ⚙️  Syncing  0.0 bps, target=#2629792 (11 peers), best: #2482944 (0x7272…aac6), finalized #405593 (0x89a5…6ff9), ⬇ 9.0MiB/s ⬆ 4.2kiB/s
2022-11-01 07:09:06 [Relaychain] ⚙️  Syncing 48.3 bps, target=#15134037 (30 peers), best: #9248177 (0xb440…3ff2), finalized #9247744 (0xc1c1…b50c), ⬇ 1.0MiB/s ⬆ 121.6kiB/s
2022-11-01 07:09:10 [Parachain] ⚙️  Syncing  0.0 bps, target=#2629792 (13 peers), best: #2482944 (0x7272…aac6), finalized #405593 (0x89a5…6ff9), ⬇ 9.7MiB/s ⬆ 4.6kiB/s
2022-11-01 07:09:11 [Relaychain] ⚙️  Syncing 42.5 bps, target=#15134037 (30 peers), best: #9248390 (0xdbc3…aad9), finalized #9248256 (0xb48f…6a96), ⬇ 953.1kiB/s ⬆ 120.9kiB/s
2022-11-01 07:09:15 [Parachain] ⚙️  Syncing  0.0 bps, target=#2629793 (11 peers), best: #2482944 (0x7272…aac6), finalized #405593 (0x89a5…6ff9), ⬇ 9.0MiB/s ⬆ 4.2kiB/s
2022-11-01 07:09:15 [Parachain] Block import error: Backend error: Can't canonicalize missing block number #2482945 when importing 0xa0000aacfc918561284868cba29427613320594d3b129dc5981b0946f8368ead (#2487041)
2022-11-01 07:09:15 [Parachain] 💔 Error importing block 0xa0000aacfc918561284868cba29427613320594d3b129dc5981b0946f8368ead: consensus error: Import failed: Backend error: Can't canonicalize missing block number #2482945 when importing 0xa0000aacfc918561284868cba29427613320594d3b129dc5981b0946f8368ead (#2487041)
2022-11-01 07:09:16 [Relaychain] ⚙️  Syncing 47.7 bps, target=#15134045 (30 peers), best: #9248629 (0x0da6…9e37), finalized #9248256 (0xb48f…6a96), ⬇ 1.0MiB/s ⬆ 130.5kiB/s
2022-11-01 07:09:20 [Parachain] ⚙️  Syncing  0.0 bps, target=#2629793 (13 peers), best: #2482944 (0x7272…aac6), finalized #405829 (0x92bb…2cd7), ⬇ 5.7MiB/s ⬆ 7.7kiB/s
2022-11-01 07:09:21 [Relaychain] ⚙️  Syncing 43.3 bps, target=#15134046 (30 peers), best: #9248846 (0x4dff…ab5b), finalized #9248769 (0xeb85…0f4a), ⬇ 956.7kiB/s ⬆ 123.9kiB/s
2022-11-01 07:09:25 [Parachain] ⚙️  Syncing  0.0 bps, target=#2629795 (13 peers), best: #2482944 (0x7272…aac6), finalized #405829 (0x92bb…2cd7), ⬇ 6.9MiB/s ⬆ 5.7kiB/s
2022-11-01 07:09:26 [Relaychain] ⚙️  Syncing 43.1 bps, target=#15134046 (30 peers), best: #9249062 (0xf167…12a9), finalized #9248769 (0xeb85…0f4a), ⬇ 889.9kiB/s ⬆ 140.6kiB/s
2022-11-01 07:09:30 [Parachain] ⚙️  Syncing  0.0 bps, target=#2629798 (14 peers), best: #2482944 (0x7272…aac6), finalized #406070 (0x49b1…c151), ⬇ 9.4MiB/s ⬆ 5.5kiB/s
2022-11-01 07:09:31 [Relaychain] ⚙️  Syncing 45.6 bps, target=#15134046 (30 peers), best: #9249290 (0x88d5…5cf9), finalized #9249281 (0x2c85…36a1), ⬇ 1.1MiB/s ⬆ 146.0kiB/s
2022-11-01 07:09:35 [Parachain] ⚙️  Syncing  0.0 bps, target=#2629798 (15 peers), best: #2482944 (0x7272…aac6), finalized #406070 (0x49b1…c151), ⬇ 5.8MiB/s ⬆ 3.6kiB/s
2022-11-01 07:09:36 [Relaychain] ⚙️  Syncing 45.9 bps, target=#15134046 (30 peers), best: #9249520 (0x9522…56f7), finalized #9249281 (0x2c85…36a1), ⬇ 1000.6kiB/s ⬆ 112.3kiB/s
2022-11-01 07:09:38 [Parachain] Block import error: Backend error: Can't canonicalize missing block number #2482945 when importing 0xa0000aacfc918561284868cba29427613320594d3b129dc5981b0946f8368ead (#2487041)
2022-11-01 07:09:38 [Parachain] 💔 Error importing block 0xa0000aacfc918561284868cba29427613320594d3b129dc5981b0946f8368ead: consensus error: Import failed: Backend error: Can't canonicalize missing block number #2482945 when importing 0xa0000aacfc918561284868cba29427613320594d3b129dc5981b0946f8368ead (#2487041)
2022-11-01 07:09:38 [Parachain] 💔 Error importing block 0x27939286e6722c962d6c964e4bfc4a3371b2ac6d78f6b2ee534a6a1ad7543786: block has an unknown parent

The node gets stuck there and does not seem to advance anymore. Restarting the node does not help; the only fix is to delete the DB and resync, but the error may occur again.

I saw that the error message comes from force_delayed_canonicalize, which is called by try_commit_operation.

Do you think this refactor can help?

jasl avatar Nov 01 '22 23:11 jasl

@jasl does not seem to be related. Please file a separate issue. Collecting logs with -l db=trace would help there.

arkpar avatar Nov 07 '22 22:11 arkpar

> @jasl does not seem to be related. Please file a separate issue. Collecting logs with -l db=trace would help there.

Filed as https://github.com/paritytech/substrate/issues/12613. I also attached a log.

jasl avatar Nov 08 '22 06:11 jasl

And sorry for the delay. :see_no_evil:

bkchr avatar Nov 08 '22 10:11 bkchr

> One general question: could we not simplify DeathRowQueue even more if we made the Mem backend also keep only the inserted keys in memory? Otherwise there is no further difference, or is there? We could also use batch loading, etc. Or am I missing something?

The main difference for Mem vs DbBacked is that Mem does its own tracking of key references (death_index). Mem is currently only used when the backend database does not support reference counting (i.e. rocksdb). Once we remove rocksdb, we can remove Mem as well.
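
To illustrate what "its own tracking of key references" could look like, here is a rough sketch of an in-memory death-row queue. It assumes that a key re-inserted by a newer block must be taken off the older block's death row so that pruning does not delete live data. `MemDeathRowQueue`, `import`, and `prune_one` are hypothetical names for this example; only `death_index` comes from the comment above, and the real `Mem` variant may differ.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type Key = String;

/// Keys scheduled for deletion when one block is pruned.
struct DeathRow {
    deleted: HashSet<Key>,
}

/// Hypothetical in-memory queue, loosely modelled on the `Mem` variant
/// discussed above; not the actual Substrate type.
pub struct MemDeathRowQueue {
    /// Block number of the first row in `death_rows`.
    base: u64,
    death_rows: VecDeque<DeathRow>,
    /// Key -> number of the block whose death row currently holds it.
    death_index: HashMap<Key, u64>,
}

impl MemDeathRowQueue {
    pub fn new(base: u64) -> Self {
        MemDeathRowQueue {
            base,
            death_rows: VecDeque::new(),
            death_index: HashMap::new(),
        }
    }

    /// Queue the next block: `inserted` keys are live again, `deleted`
    /// keys wait on this block's death row until it is pruned.
    pub fn import(&mut self, inserted: &[Key], deleted: &[Key]) {
        // A re-inserted key must not be deleted by an older row.
        for k in inserted {
            if let Some(block) = self.death_index.remove(k) {
                let idx = (block - self.base) as usize;
                self.death_rows[idx].deleted.remove(k);
            }
        }
        let block = self.base + self.death_rows.len() as u64;
        for k in deleted {
            self.death_index.insert(k.clone(), block);
        }
        self.death_rows.push_back(DeathRow {
            deleted: deleted.iter().cloned().collect(),
        });
    }

    /// Prune the oldest block and return the keys that can now be
    /// removed from the backing database (sorted for determinism).
    pub fn prune_one(&mut self) -> Option<Vec<Key>> {
        let row = self.death_rows.pop_front()?;
        let pruned_block = self.base;
        self.base += 1;
        let mut keys: Vec<Key> = row.deleted.into_iter().collect();
        for k in &keys {
            // Only drop index entries that still point at the pruned row.
            if self.death_index.get(k) == Some(&pruned_block) {
                self.death_index.remove(k);
            }
        }
        keys.sort();
        Some(keys)
    }
}
```

A database with built-in reference counting would make this index unnecessary, since a re-inserted key would simply have its count incremented; that is presumably why the in-memory variant is only needed for backends without that feature.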

arkpar avatar Nov 08 '22 10:11 arkpar

bot merge

arkpar avatar Nov 08 '22 10:11 arkpar