Speed up account migration by making use of the result of the previous run.
Problem Definition
The account-based migration first groups registers by account, and then migrates different accounts in parallel. One optimization to consider is to make use of the result of the previous run to skip accounts that have had no updates since that run.
For instance, let's say we would like to migrate the state at block T. If we have a migrated state at block T - N, then we can scan through the checkpoint WAL files for blocks [T - N + 1, T] and build a ChangedAccountSet, since the WAL files contain the trie updates for each block, and those updates contain the account information. With the ChangedAccountSet, we can skip the migration at block T for accounts that are not included in the set.
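A minimal sketch of building the set, assuming the trie updates have already been decoded from the WAL and can be attributed to a block height. The types `Address`, `RegisterUpdate`, and `TrieUpdate` and the function `BuildChangedAccountSet` are hypothetical simplifications for illustration, not the actual WAL or ledger API:

```go
package main

import "fmt"

// Address is a hypothetical 8-byte account address.
type Address [8]byte

// RegisterUpdate is a hypothetical, simplified view of one register write
// decoded from a trie update.
type RegisterUpdate struct {
	Owner Address
	Key   string
}

// TrieUpdate is a hypothetical stand-in for the trie update recorded for one
// block (assumption: updates can be attributed to a block height).
type TrieUpdate struct {
	Height  uint64
	Updates []RegisterUpdate
}

// ChangedAccountSet holds every account touched in the scanned range.
type ChangedAccountSet map[Address]struct{}

// BuildChangedAccountSet collects the owners of all registers written in
// blocks (prev, target], i.e. [T - N + 1, T] with prev = T - N and target = T.
func BuildChangedAccountSet(updates []TrieUpdate, prev, target uint64) ChangedAccountSet {
	changed := make(ChangedAccountSet)
	for _, u := range updates {
		if u.Height <= prev || u.Height > target {
			continue // outside the range between the two runs
		}
		for _, r := range u.Updates {
			changed[r.Owner] = struct{}{}
		}
	}
	return changed
}

// CanSkip reports whether the account was untouched since the previous run.
func (s ChangedAccountSet) CanSkip(a Address) bool {
	_, touched := s[a]
	return !touched
}

func main() {
	accountA, accountB := Address{1}, Address{2}
	updates := []TrieUpdate{
		// Height 97 is the previously migrated block, so it is ignored.
		{Height: 97, Updates: []RegisterUpdate{{Owner: accountA, Key: "storage"}}},
		{Height: 99, Updates: []RegisterUpdate{{Owner: accountB, Key: "storage"}}},
	}
	changed := BuildChangedAccountSet(updates, 97, 100)
	fmt.Println(changed.CanSkip(accountA), changed.CanSkip(accountB))
}
```

A set keyed by address keeps the skip check constant-time, which matters when the check runs once per account during the parallel migration.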
This optimization would speed up the migration if:
- most of the accounts were not updated since the previous run
- the most time consuming accounts were not updated since the previous run
How to validate: We can validate this approach by comparing the results of a run that skips accounts with a run that does not. If both migrated states produce the same root hash, then skipping accounts works.
@fxamacker @turbolent @janezpodhostnik @j1010001
I recently built a find-trie-root util that scans through each trie update record in the WAL files. We can do a similar scan and read the account info from the decoded trie updates.
Thank you for this great idea @zhangchiqing!
Do we have to scan WAL files? If we migrate a state and know that an account has the same contents as in a previously migrated state, then we can reuse the previous migration result for that account -- it doesn't matter if it was potentially modified in between and "back".
We could maybe use the recently added code of the util diff-states command, which compares registers of accounts, or just create "content hashes" of the account contents and compare those to determine if an account was modified (we are not interested in all differences, just whether there is a difference or not).
I think creating a content hash would be slower than simply decoding the account info from the trie update. But I could be wrong, so it is better to benchmark.
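A minimal sketch of the content-hash idea, assuming an account's contents can be represented as a list of key/value registers. The `Register` type and `ContentHash` function are hypothetical and SHA-256 is chosen only for illustration; this is not the diff-states implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// Register is a hypothetical key/value pair owned by a single account.
type Register struct {
	Key   string
	Value []byte
}

// ContentHash returns a digest of all registers of one account. Registers are
// sorted by key first so the result does not depend on iteration order, and
// each field is length-prefixed so adjacent fields cannot be confused.
func ContentHash(registers []Register) [sha256.Size]byte {
	sorted := make([]Register, len(registers))
	copy(sorted, registers)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Key < sorted[j].Key })

	h := sha256.New()
	var lenBuf [8]byte
	writeField := func(b []byte) {
		binary.BigEndian.PutUint64(lenBuf[:], uint64(len(b)))
		h.Write(lenBuf[:])
		h.Write(b)
	}
	for _, r := range sorted {
		writeField([]byte(r.Key))
		writeField(r.Value)
	}

	var digest [sha256.Size]byte
	copy(digest[:], h.Sum(nil))
	return digest
}

func main() {
	previous := []Register{{Key: "a", Value: []byte{1}}, {Key: "b", Value: []byte{2}}}
	current := []Register{{Key: "b", Value: []byte{2}}, {Key: "a", Value: []byte{1}}}

	// Equal digests mean the previous migration result can be reused.
	fmt.Println(ContentHash(previous) == ContentHash(current))
}
```

Comparing digests only answers whether an account differs, which is all this optimization needs. It also treats an account that was modified and then changed back as unmodified.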
Good idea.
I would be concerned that small accounts have a lower probability of being touched, so they will probably be the same, but we are not saving a lot of time on those accounts. Accounts with more stuff in storage also have a higher probability of being touched, so there are fewer savings to be had where we need them.
Probably best to just try or to at least figure out how many accounts would be the same, and what is their size, and try to get a good estimate.
There are some shortcuts here: for example, a change to the account status register (especially storageID) means the account has changed, so there is no need to hash it anymore.
most of the accounts were not updated since the previous run
This is 100% true; the number of modified accounts is minimal.
the most time consuming accounts were not updated since the previous run
This is probably not the case (at least for mainnet).
PS: there is also the case of newly staged contracts in between runs.
Good point Janez and bluesign. The bottleneck for migration duration has been the top few largest accounts.
If the largest account is modified, we might not see much difference compared to full state migration.
To simplify, if the largest account is modified, we can just do a full state migration like we are already doing. Otherwise, previously migrated and unmodified accounts can be copied while modified accounts are migrated.
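The rule above could be sketched as follows, assuming each account's storage size and modified status are already known. The `Account` and `Plan` types and the `PlanMigration` function are hypothetical names used only for illustration, and storage size is used here as a stand-in for migration cost:

```go
package main

import "fmt"

// Account is a hypothetical summary of one account between two runs.
type Account struct {
	Address  string
	Size     uint64 // storage size in bytes, used as a proxy for migration cost
	Modified bool   // whether the account changed since the previous run
}

// Plan describes which accounts to copy from the previous result and which
// to migrate again.
type Plan struct {
	FullMigration bool
	Copy          []string
	Migrate       []string
}

// PlanMigration falls back to a full state migration when the largest account
// was modified. Otherwise unmodified accounts are copied and only modified
// accounts are migrated.
func PlanMigration(accounts []Account) Plan {
	var plan Plan
	if len(accounts) == 0 {
		return plan
	}

	largest := accounts[0]
	for _, a := range accounts[1:] {
		if a.Size > largest.Size {
			largest = a
		}
	}
	plan.FullMigration = largest.Modified

	for _, a := range accounts {
		if plan.FullMigration || a.Modified {
			plan.Migrate = append(plan.Migrate, a.Address)
		} else {
			plan.Copy = append(plan.Copy, a.Address)
		}
	}
	return plan
}

func main() {
	plan := PlanMigration([]Account{
		{Address: "0x01", Size: 900, Modified: false},
		{Address: "0x02", Size: 10, Modified: true},
		{Address: "0x03", Size: 20, Modified: false},
	})
	fmt.Printf("full=%v copy=%v migrate=%v\n", plan.FullMigration, plan.Copy, plan.Migrate)
}
```

Falling back to the existing full migration keeps the incremental path from ever being slower than the current behavior in the worst case, at the cost of no savings when the largest account changes.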
To begin with, we can maybe add a flag to the diff-states command which reports modified accounts, and run this on a weekly basis.
The size distributions for TN and MN look quite different: a few large accounts versus many mostly medium-sized accounts. So even if it won't make much of a difference for TN, it might still be worth it for MN.
Checking the ratio of the biggest account's migration duration to the total duration could probably give some insight too.
PS: last I checked biggest account duration was not so reliable though, but still gives some insight
We believe the benefit of this will be negligible and it adds risk to Crescendo MN migration - won't do.