blockchain-core
blockchain-core copied to clipboard
Parallelize `build_hash_chain`
For a full chain resync, this reduces the build_hash_chain
running time from ~65 minutes to ~11 minutes on my machine (16-core Ryzen 9 with 32 GB RAM).
Overview
High
The main idea is to unbundle the following bundled operations:
- retrieval of blocks (can be asynchronized)
- deserialization of blocks (can be parallelized)
- walk through block child->parent relations (the only essentially serial part of the job)
Mid
- asynchronously:
1.1. enumerate all
{K, V}
pairs in the blocks CF - in-parallel:
2.1. consume above
{K, V}
pairs 2.2. deserialize blocks (theV
s above) 2.3. lookup parents 2.4. accumulate{Child, Parent}
pairs - in-serial: 3.1. build a child-to-parent relations map from above pairs 3.2. walk the relations from the youngest given hash and trace its longest possible lineage up to the oldest given hash
The last step, 3.2, is essentially what the previous, serial implementation used to do, but this time all the expensive deserialization has already been done, in parallel.
Low
See code :)