
Aleth's rebuild functionality depends on extras database

Open • halfalicious opened this issue 5 years ago • 4 comments

Aleth contains support for rebuilding its extras and state databases. The rebuild functionality consists of Aleth deleting the existing state database and renaming the extras database (to extras.old), then reimporting all blocks from genesis through to the latest local block. It does this in BlockChain::rebuild: https://github.com/ethereum/aleth/blob/86aa9a3998ee700693fe16fcce37c527af3fb5f7/libethereum/BlockChain.cpp#L327-L408

Note that rebuild is called when the database minor version changes, which implies that the schema of the extras database has changed and the old data must be discarded. However, the rebuild function uses the block number -> block hash data in the old extras DB to iterate over the blocks from genesis to the (local) chain tip.
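To make the dependency concrete, here is a toy, in-memory C++ model (not aleth's code). The reimport order comes entirely from the old extras number -> hash index, so a gap or format change in that index stops the rebuild. The types and the function rebuildOrderFromOldExtras are hypothetical stand-ins, and the genesis block's parent hash is modeled as an empty string.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory stand-ins; these are NOT aleth's real types.
using Hash = std::string;

struct Block {
    uint64_t number;
    Hash hash;
    Hash parentHash;  // empty for genesis in this toy model
};

// Blocks DB: blocks can only be looked up by hash.
using BlocksDb = std::map<Hash, Block>;
// Old extras number -> hash index, the data the rebuild relies on.
using NumberIndex = std::map<uint64_t, Hash>;

// Toy model of the behaviour described above: the order of reimport is
// driven by the *old* extras index. Returns the hashes that would be
// reimported (genesis first), or std::nullopt if the old index is unusable.
std::optional<std::vector<Hash>> rebuildOrderFromOldExtras(
    const BlocksDb& blocks, const NumberIndex& oldExtras, uint64_t latestNumber) {
    std::vector<Hash> order;
    for (uint64_t n = 0; n <= latestNumber; ++n) {
        auto it = oldExtras.find(n);
        if (it == oldExtras.end() || blocks.count(it->second) == 0)
            return std::nullopt;  // missing or mismatched entry: cannot continue
        order.push_back(it->second);
    }
    return order;
}
```

If a schema change ever altered how this index is stored, the loop above would have nothing valid to read, which is the risk the issue describes.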

In practice, changes to the extras database would probably not affect this data since it's very simple, but I don't think it's good to assume this.

halfalicious • Nov 12 '19 04:11

My proposed solution, which depends only on the blocks database (where blocks can only be looked up by hash):

  1. Write the hash of the most recent block to disk every N blocks, where N is no less than 1 minute's worth of synced blocks (about 4). This will be used to establish the chain tip on rebuild.
  2. On rebuild, read the latest block hash and use it to find the latest block in the blocks database.
  3. Delete the state and extras databases.
  4. Starting with the latest block, rebuild the extras number -> hash data by inserting the hash and number (from each block header) and proceeding to the previous block (via Block::parentHash()).
  5. Once the extras number -> hash data has been repopulated, iterate over the blocks from genesis -> present and reimport each block into the blockchain. This will rebuild both the other extras data and the state database.
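A minimal sketch of steps 1, 4 and 5 of the proposal, assuming simplified in-memory stand-ins for the blocks DB (a std::map keyed by hash) and a genesis block whose parent hash is empty. TipCheckpoint, rebuildNumberIndex and reimportAll are hypothetical names, not aleth APIs; steps 2 and 3 (reading the checkpoint and deleting the state and extras databases) are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory stand-ins; NOT aleth's real types.
using Hash = std::string;

struct Block {
    uint64_t number;
    Hash hash;
    Hash parentHash;  // empty for genesis in this toy model
};

using BlocksDb = std::map<Hash, Block>;

// Step 1 (hypothetical): persist the tip hash only every `interval` blocks.
// Assumes interval >= 1; `savedTip` stands in for the on-disk record.
struct TipCheckpoint {
    uint64_t interval;
    Hash savedTip;
    void onNewBlock(const Block& b) {
        if (b.number % interval == 0)
            savedTip = b.hash;
    }
};

// Step 4: from the checkpointed tip, follow parent hashes back to genesis,
// rebuilding the number -> hash index from the blocks DB alone.
std::map<uint64_t, Hash> rebuildNumberIndex(const BlocksDb& blocks, const Hash& tip) {
    std::map<uint64_t, Hash> index;
    Hash current = tip;
    while (!current.empty()) {
        auto it = blocks.find(current);
        if (it == blocks.end())
            throw std::runtime_error("block missing from blocks DB: " + current);
        index[it->second.number] = it->second.hash;
        current = it->second.parentHash;
    }
    return index;
}

// Step 5: iterate genesis -> tip (std::map iterates keys in ascending order).
// `reimport` stands in for re-executing each block, which would repopulate
// the remaining extras data and the state database.
template <typename Reimport>
void reimportAll(const BlocksDb& blocks, const std::map<uint64_t, Hash>& index,
                 Reimport reimport) {
    for (const auto& entry : index)
        reimport(blocks.at(entry.second));
}
```

Because the tip is only checkpointed every N blocks, the rebuilt chain can end up as many as N - 1 blocks behind the real tip; those blocks would presumably just be synced again. Walking parent hashes also skips any fork blocks still sitting in the blocks DB.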

@gumb0: Thoughts?

halfalicious • Nov 12 '19 04:11

Well, in general I don't think it's worth the trouble if the current code works. I think it would be better to spend time documenting (1) what developers should and shouldn't do when they update the extras layout, and (2) what a user should do when the DB version is updated (ideally nothing?). Also maybe improving logging during the rebuild process, if it's currently not great.

Additional thoughts:

  • The top block hash of the canonical chain is already saved to extras (under the "best" key) every time it changes.
  • This algorithm iterates only over the canonical chain, so some blocks left in the blocks DB will end up without indices (e.g. blockHash => blockDetails). This might not be a big problem, and I think the current code does the same.
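The second point can be illustrated with a small model: starting from the best hash and following parent hashes reaches only canonical blocks, so anything else in the blocks DB is left without indices. This is a hypothetical sketch with in-memory stand-ins; canonicalHashes and unindexedAfterRebuild are illustrative names, not part of aleth.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-ins; NOT aleth's real types.
using Hash = std::string;

struct Block {
    uint64_t number;
    Hash hash;
    Hash parentHash;  // empty for genesis in this toy model
};

using BlocksDb = std::map<Hash, Block>;

// Follow parent hashes from the best (canonical tip) hash back to genesis.
std::set<Hash> canonicalHashes(const BlocksDb& blocks, Hash best) {
    std::set<Hash> out;
    while (!best.empty()) {
        const Block& b = blocks.at(best);
        out.insert(b.hash);
        best = b.parentHash;
    }
    return out;
}

// Blocks a canonical-only rebuild would leave without indices (e.g. no
// hash -> details entry): present in the blocks DB but off the best chain.
std::vector<Hash> unindexedAfterRebuild(const BlocksDb& blocks, const Hash& best) {
    const std::set<Hash> canon = canonicalHashes(blocks, best);
    std::vector<Hash> orphans;
    for (const auto& entry : blocks)
        if (canon.count(entry.first) == 0)
            orphans.push_back(entry.first);
    return orphans;
}
```

In this model the orphaned entries are still stored, just unreachable through the rebuilt indices, which matches the "wasted space" framing later in the thread.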

gumb0 • Nov 12 '19 12:11

@gumb0 Thanks for the tip about the chain tip being stored in extras; I had seen that. Regarding iterating only over the canonical chain: correct, the current algorithm does this as well. A more complete algorithm would also iterate over uncles, which would help us if there's a reorg after a rebuild. Is there any other reason to do this? We presumably need the uncles to identify the longest chain and compute block rewards?

I'll examine how the current code works and document it and update the logs as necessary. If that ends up taking too much time, I think I'll just rewrite this, since the logic should be pretty straightforward.

halfalicious • Nov 12 '19 16:11

> We presumably need the uncles to identify the longest chain and compute block rewards?

Uncle headers are stored in the bodies of the canonical chain's blocks anyway, so we won't lose them completely. Other non-canonical blocks and uncle bodies are not important, I think, so losing them is not critical; it just wastes some space in the blocks DB.
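A toy sketch of why uncle headers survive a canonical-only rebuild: walking the canonical chain and reading each block's body recovers every uncle header it references. The Header/Block layout here is a simplified, hypothetical model (real block bodies also carry transactions), and recoverUncleHeaders is not an aleth function.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified layout; NOT aleth's real types.
using Hash = std::string;

struct Header {
    uint64_t number;
    Hash hash;
    Hash parentHash;  // empty for genesis in this toy model
};

// Body reduced to the uncle headers it carries (transactions omitted).
struct Block {
    Header header;
    std::vector<Header> uncles;
};

using BlocksDb = std::map<Hash, Block>;

// Walk from the tip back to genesis, collecting the uncle headers stored in
// each canonical block's body (tip-first order).
std::vector<Header> recoverUncleHeaders(const BlocksDb& blocks, Hash tip) {
    std::vector<Header> uncles;
    while (!tip.empty()) {
        const Block& b = blocks.at(tip);
        uncles.insert(uncles.end(), b.uncles.begin(), b.uncles.end());
        tip = b.header.parentHash;
    }
    return uncles;
}
```

Only the headers come back this way; the rest of an uncle block, like any other non-canonical block, remains as unindexed data in the blocks DB.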

gumb0 • Nov 12 '19 17:11