walletdb: ChainState.startHeight and `scan()`
startHeight is a strange property of the chainstate in walletDB and we should check how it is used.
It can affect users when they rescan, because it becomes the default scan-from height:
https://github.com/handshake-org/hsd/blob/98a6491cdbfb173b4834a892b9bd55b6839cadbf/lib/wallet/walletdb.js#L512-L522
The misbehavior of this can be reproduced:
- start hsd with a fresh data directory in regtest
- generate 100 blocks
- stop hsd, delete regtest/wallet directory & restart hsd (this will create a new walletDB "at start height 100")
- `hsw-cli rescan` does nothing
- `hsw-cli rescan 0` actually rescans (from genesis block)
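The steps above can be pictured with a small model. This is only a sketch, not hsd's actual code: it assumes that a rescan without an explicit height falls back to the stored `startHeight`, and that a scan starting at the current tip has nothing left to process. `resolveRescanHeight` and `blocksToRescan` are hypothetical names used for illustration.

```javascript
// Simplified model of the default scan-from height; not hsd's actual code.
function resolveRescanHeight(state, height) {
  // Assumption: omitting the height falls back to the stored startHeight.
  if (height == null)
    return state.startHeight;
  return height;
}

// Hypothetical helper: number of blocks strictly above the scan-from height.
function blocksToRescan(state, height) {
  const from = resolveRescanHeight(state, height);
  return Math.max(0, state.height - from);
}

// WalletDB created after 100 regtest blocks were already mined.
const freshState = { startHeight: 100, height: 100 };

console.log(blocksToRescan(freshState));    // 0
console.log(blocksToRescan(freshState, 0)); // 100
```

In this model the default rescan covers nothing because `startHeight` already equals the tip, while passing `0` covers the whole chain.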
I think this behavior may have at one point made sense (why bother rescanning from blocks before the wallet was created?) but anyone who's ever imported a wallet from seed phrase knows you need to rescan the entire chain, not just since the database was initialized.
Another part of this mystery is the purpose of markState():
https://github.com/handshake-org/hsd/blob/98a6491cdbfb173b4834a892b9bd55b6839cadbf/lib/wallet/walletdb.js#L1870-L1880
The chain state's startHeight and startHash both mark the point at which the first transaction known to the walletdb occurred. This is not specific to one wallet; it is the earliest such point across all wallets. It gives an easy height to rescan from when some transactions were missed somewhere in history.
marked just tracks whether we have ever received a tx; it is checked when deciding whether to move startHeight forward.
On a reorganization, if we move back past startHeight in the history, it won't be set to 0/ZERO_HASH; instead the state gets unmarked and the height/hash will follow. This is still consistent with the design goal for startHeight/startHash: whenever a new transaction is found, the state gets marked and startHash/startHeight are frozen at that point.
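The bookkeeping described above can be sketched as a toy state machine. This is an illustrative model under stated assumptions, not the walletdb implementation: `ChainStateModel`, `connect`, `disconnect`, and the `blk` helper are hypothetical names, and the exact condition for unmarking (new tip below `startHeight`) is my reading of the description.

```javascript
// Toy model of the marked/startHeight bookkeeping; not hsd's actual code.
const ZERO_HASH = '00'.repeat(32);

class ChainStateModel {
  constructor() {
    this.height = 0;
    this.startHeight = 0;
    this.startHash = ZERO_HASH;
    this.marked = false;
  }

  // Connect a block; hasWalletTx says whether it contains a wallet tx.
  connect(block, hasWalletTx) {
    this.height = block.height;
    if (this.marked)
      return; // Frozen: the first transaction was already seen.
    this.startHeight = block.height;
    this.startHash = block.hash;
    if (hasWalletTx)
      this.marked = true;
  }

  // Roll back to `tip` during a reorganization.
  disconnect(tip) {
    this.height = tip.height;
    // Assumption: unmark only when the new tip is below startHeight.
    if (tip.height < this.startHeight) {
      this.startHeight = tip.height;
      this.startHash = tip.hash;
      this.marked = false;
    }
  }
}

// Hypothetical block factory for the example.
const blk = (height, tag = 'a') => ({ height, hash: `${tag}${height}` });

const cs = new ChainStateModel();
cs.connect(blk(1), false);
cs.connect(blk(2), true);               // first wallet tx: mark and freeze
cs.connect(blk(3), false);
console.log(cs.startHeight, cs.marked); // 2 true

cs.disconnect(blk(1));                  // reorg below startHeight: unmark
console.log(cs.startHeight, cs.marked); // 1 false

cs.connect(blk(2, 'b'), false);         // height/hash follow the chain again
console.log(cs.startHeight, cs.marked); // 2 false
```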
A better name for these would probably be firstTransactionBlockHeight and firstTransactionBlockHash. After the pagination port it should also be possible to use the height-based transaction indexes to recover this data by checking the minimum heights of all wallets, but that becomes costlier for walletDBs with a lot of wallets.
This is not helpful for new wallets, because it is unknown where that wallet received its first tx. Is it fresh or imported?
If @rithvik has some data on which is used most, we could change the rescan default to 0 instead of startHeight/startHash. startHeight/startHash would then serve only as information. A new rescan will then recover to the proper height. Adding per-wallet information for this using the pagination indexes would also be good. If you are not using a backup and want to export a single wallet, this information is not useful.
I think we should keep this but expose it somewhere. I am thinking of adding a new endpoint, GET /, similar to the node's GET /. It would return general information about the walletDB (maybe behind admin flags).
The only issue related to startHeight, startHash, and marked is the corruption case mentioned in the test coverage, where the test is skipped.
If we receive a tx at block X and it is the first one, but the wallet crashes before inserting it, startHeight/startHash will still be stored at that height. If, after recovery, the block at that height no longer contains the transaction, the state will stay marked, startHeight/startHash will be incorrect, and they will no longer move forward.
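A rough way to picture this corruption case, assuming the marked state is persisted before the transaction insert (the ordering the description implies). Everything here is a toy model with hypothetical names (`newDb`, `addBlock`, `isConsistent`), not hsd's storage code.

```javascript
// Toy model of the crash case; not hsd's actual storage code.
function newDb() {
  return {
    state: { startHeight: 0, marked: false },
    txHeights: [] // Heights of the transactions actually stored.
  };
}

// Assumption: marking is persisted before the transaction insert.
function addBlock(db, height, hasWalletTx, crashBeforeInsert = false) {
  if (!db.state.marked) {
    db.state.startHeight = height;
    if (hasWalletTx)
      db.state.marked = true;
  }
  if (hasWalletTx && !crashBeforeInsert)
    db.txHeights.push(height);
}

// Hypothetical check: a marked state should point at the earliest stored tx.
function isConsistent(db) {
  if (!db.state.marked)
    return db.txHeights.length === 0;
  return db.txHeights.length > 0
    && Math.min(...db.txHeights) === db.state.startHeight;
}

const db = newDb();
addBlock(db, 9, false);
addBlock(db, 10, true, true); // crash: marked at 10, nothing inserted
addBlock(db, 10, false);      // after recovery, block 10 has no wallet tx
addBlock(db, 15, true);       // the real first transaction

console.log(db.state);         // { startHeight: 10, marked: true }
console.log(isConsistent(db)); // false
```

In this model the state stays frozen at height 10 even though the earliest stored transaction is at height 15, which is the stale marker described above.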