ssz icon indicating copy to clipboard operation
ssz copied to clipboard

Investigate inconsistent performance of hashTreeRoot()

Open twoeths opened this issue 4 years ago • 5 comments

part of https://github.com/ChainSafe/lodestar/issues/2046

this is a simple test to calculate hashTreeRoot

it.only("set validator valances", function () {
    this.timeout(0);
    const originalState = state.getOriginalState();
    // cache hashTreeRoot
    config.types.BeaconState.hashTreeRoot(originalState);
    const balances = Array.from({length: originalState.validators.length}, () => BigInt(31217089836));
    let minTime = Number.MAX_SAFE_INTEGER;
    let maxTime = 0;
    let average = {duration: 0, count: 0};
    const MAX_TRY = 10000;
    for (let i = 0; i < MAX_TRY; i++) {
      const state = config.types.BeaconState.clone(originalState);
      state.balances = balances as List<Gwei>;
      const start = Date.now();
      config.types.BeaconState.hashTreeRoot(state);
      const duration = Date.now() - start;
      const totalDuration = average.duration * average.count + duration;
      const totalCount = average.count + 1;
      average.count = totalCount;
      average.duration = totalDuration / totalCount;
      if (duration < minTime) minTime = duration;
      if (duration > maxTime) maxTime = duration;
    }
    console.log("hashTreeRoot minTime:", minTime, "maxTime:", maxTime, "average:", average.duration, "MAX_TRY:", MAX_TRY);
  });

this prints out hashTreeRoot minTime: 65 maxTime: 507 average: 80.46129999999977 MAX_TRY: 10000. I notice the max time is very similar to the one we have during a long epoch transition.

we need to investigate why the performance is inconsistent and if we can improve it.

twoeths avatar Feb 15 '21 13:02 twoeths

Performance is inconsistent because we only re-hash new nodes in the tree since the last hashTreeRoot. If there aren't many new nodes, it takes less time. If there are many new new nodes since a hashTreeRoot, then it takes longer.

In the case of your test, the balances, which is a large subtree, get replaced. So those subtrees need complete rehashing, and hashTreeRoot takes longer.

wemeetagain avatar Feb 15 '21 15:02 wemeetagain

Performance is inconsistent because we only re-hash new nodes in the tree since the last hashTreeRoot. If there aren't many new nodes, it takes less time. If there are many new new nodes since a hashTreeRoot, then it takes longer.

In the case of your test, the balances, which is a large subtree, get replaced. So those subtrees need complete rehashing, and hashTreeRoot takes longer.

in each for loop, I clone a new BeaconState to make it all equivalent so I think the cache is all the same (initially I did a hashTreeRoot call before the for loop to make the cache). Also, if I change MAX_TRY to 20, the maxTime is only 92 instead of 507 ( in case of MAX_TRY=10000): hashTreeRoot minTime: 67 maxTime: 92 average: 74.25 MAX_TRY: 20

so we need to investigate why sometimes it takes up to more than 500ms for MAX_TRY 10000

twoeths avatar Feb 16 '21 08:02 twoeths

this is due to our hash implementation in as-sha256 https://github.com/ChainSafe/as-sha256/pull/45, I don't see anything that can be improved in ssz or persistent-merkle-tree regarding this.

twoeths avatar Feb 25 '21 03:02 twoeths

I've done some extra analysis of the performance of hashTreeRoot. Commenting to link the issues https://github.com/ChainSafe/lodestar/issues/2206

The approach is to compare us with Lighthouse, and apparently our hashing function is a fast a theirs but for some reason our performance is still slower.

Also, to explain why we spent so much time hashing the state in the epoch transitions, maybe they pre-compute more roots beforehand somehow?

dapplion avatar Mar 21 '21 20:03 dapplion

we switched to ssz v2 a long time ago https://github.com/ChainSafe/ssz/pull/223 also the batch hash work going to save a lot of memory allocation which is the main contribution to the unstability

twoeths avatar Jul 09 '24 04:07 twoeths