blockchain-core

Parallelize `build_hash_chain`

Open xandkar opened this issue 2 years ago • 0 comments

For a full chain resync, this reduces the running time of `build_hash_chain` from ~65 minutes to ~11 minutes on my machine (16-core Ryzen 9 with 32 GB RAM).

Overview

High

The main idea is to unbundle the following bundled operations:

  1. retrieval of blocks (can be made asynchronous)
  2. deserialization of blocks (can be parallelized)
  3. walk through block child->parent relations (the only essentially serial part of the job)

Mid

  1. asynchronously:
     1.1. enumerate all {K, V} pairs in the blocks CF
  2. in-parallel:
     2.1. consume above {K, V} pairs
     2.2. deserialize blocks (the Vs above)
     2.3. lookup parents
     2.4. accumulate {Child, Parent} pairs
  3. in-serial:
     3.1. build a child-to-parent relations map from above pairs
     3.2. walk the relations from the youngest given hash and trace its longest possible lineage up to the oldest given hash
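For readers who want the shape of steps 1 through 3.1 without opening the diff, here is a rough sketch. It is written in Python purely for illustration and is not the code in this change. The dict standing in for the blocks CF, the JSON block encoding, the `prev_hash` field name, and every function name are assumptions made for the sketch.

```python
import json
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for the blocks CF: block hash -> serialized block.
BlockStore = Dict[str, bytes]


def enumerate_blocks(store: BlockStore) -> Iterator[Tuple[str, bytes]]:
    """Step 1.1: lazily yield every {K, V} pair from the store."""
    yield from store.items()


def child_and_parent(kv: Tuple[str, bytes]) -> Tuple[str, str]:
    """Steps 2.1-2.3: deserialize one block and look up its parent hash."""
    child_hash, serialized = kv
    block = json.loads(serialized)  # JSON encoding is assumed for this sketch
    return child_hash, block["prev_hash"]


def collect_pairs(store: BlockStore, pool: Executor) -> List[Tuple[str, str]]:
    """Step 2.4: fan the pairs out to the pool and accumulate {Child, Parent}."""
    return list(pool.map(child_and_parent, enumerate_blocks(store)))


def build_relations(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Step 3.1: build the child-to-parent map serially."""
    return dict(pairs)


def demo_relations() -> Dict[str, str]:
    """Run the sketch on a tiny made-up chain a <- b <- c <- d."""
    links = [("b", "a"), ("c", "b"), ("d", "c")]
    store = {h: json.dumps({"prev_hash": p}).encode() for h, p in links}
    with ThreadPoolExecutor(max_workers=4) as pool:
        return build_relations(collect_pairs(store, pool))
```

`collect_pairs` takes any `Executor`, so a `ProcessPoolExecutor` can be passed instead when deserialization is CPU-bound. In CPython a thread pool only illustrates the structure and will not give a real speedup for that kind of work.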

The last step, 3.2, is essentially what the previous, serial implementation used to do, but this time all the expensive deserialization has already been done, in parallel.
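The serial walk itself is cheap once the relations map exists. A minimal sketch of it, again in Python for illustration only: the name `walk_lineage`, the youngest-first return order, and the choice to include both endpoints are assumptions, not the behavior of the real function.

```python
from typing import Dict, List


def walk_lineage(parents: Dict[str, str], youngest: str, oldest: str) -> List[str]:
    """Follow child -> parent links from `youngest` toward `oldest`.

    Stops when `oldest` is reached or when the current hash has no known
    parent, whichever comes first. Returns the hashes youngest-first.
    """
    lineage = [youngest]
    seen = {youngest}  # guards against a malformed, cyclic map
    current = youngest
    while current != oldest:
        parent = parents.get(current)
        if parent is None or parent in seen:
            break  # lineage ends here: parent unknown or already visited
        lineage.append(parent)
        seen.add(parent)
        current = parent
    return lineage


# Made-up chain a <- b <- c <- d, as a child-to-parent map.
example_parents = {"b": "a", "c": "b", "d": "c"}
print(walk_lineage(example_parents, "d", "b"))
```

Each step is a single map lookup, so the walk is linear in the length of the lineage and does no deserialization.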

Low

See code :)

xandkar, May 10 '22 14:05