lodestar
Optimize backfill sync to efficiently use reqresp fetched block data
Motivation
As @tuyennhv pointed out after profiler runs, backfill sync could be optimized to use the tree-backed block data already fetched by range sync and block-by-root requests.
Description
This PR changes the following:
- Use the hash tree root of the synced tree-backed block as its block root when verifying the parent/child relationship in verify block sequence
- Use blockArchive's batch put binary with the tree-backed block's data
- Add a hidden CLI option --sync.backfillBatchSize to specify the batch size for backfill sync
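The first two changes can be illustrated with a minimal sketch. It is not the code from this PR: the `SyncedBlock` shape, `verifyBlockSequence`, and `toBatchPutEntries` are hypothetical names chosen for illustration, roots are plain hex strings, and the `root` field stands in for the hash tree root read from the tree-backed block.

```typescript
// Hypothetical, simplified block shape; not Lodestar's actual types.
interface SyncedBlock {
  slot: number;
  parentRoot: string; // hex root of the parent block
  root: string; // hex root, assumed to come from the tree-backed block's hash tree root
  bytes: Uint8Array; // serialized block data as received over reqresp
}

/**
 * Walks `blocks` (ordered by ascending slot) from newest to oldest and checks
 * that each block's root matches the parent root expected by its child.
 * `expectedLastRoot` is the root that the newest block in the batch must have,
 * for example the parent root of the oldest block already backfilled.
 */
function verifyBlockSequence(blocks: SyncedBlock[], expectedLastRoot: string): boolean {
  let expectedRoot = expectedLastRoot;
  for (let i = blocks.length - 1; i >= 0; i--) {
    const block = blocks[i];
    if (block.root !== expectedRoot) {
      return false; // parent/child link is broken at this block
    }
    expectedRoot = block.parentRoot;
  }
  return true;
}

/**
 * Builds key/value pairs for a single batched write, reusing the bytes that
 * were already fetched instead of re-serializing each block.
 */
function toBatchPutEntries(blocks: SyncedBlock[]): Array<{key: number; value: Uint8Array}> {
  return blocks.map((block) => ({key: block.slot, value: block.bytes}));
}
```

In this sketch the root is computed once per block and reused for both the sequence check and any later lookups, which is the kind of saving the list above describes.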
Closes #3657
Code Climate has analyzed commit 7928f92e and detected 0 issues on this pull request.
Performance Report
✔️ no performance regression detected
Full benchmark results
| Benchmark suite | Current: 3ecd574d4b3f34c6e6adf143eaf450edc3d1fbc1 | Previous: 52032f803226e5e6aa490e3f49d6b12c8e7ca714 | Ratio (current / previous) |
|---|---|---|---|
| BeaconState.hashTreeRoot - No change | 632.00 ns/op | 459.00 ns/op | 1.38 |
| BeaconState.hashTreeRoot - 1 full validator | 146.65 us/op | 119.54 us/op | 1.23 |
| BeaconState.hashTreeRoot - 32 full validator | 2.2193 ms/op | 1.6142 ms/op | 1.37 |
| BeaconState.hashTreeRoot - 512 full validator | 29.189 ms/op | 21.199 ms/op | 1.38 |
| BeaconState.hashTreeRoot - 1 validator.effectiveBalance | 148.58 us/op | 114.92 us/op | 1.29 |
| BeaconState.hashTreeRoot - 32 validator.effectiveBalance | 2.4174 ms/op | 1.8303 ms/op | 1.32 |
| BeaconState.hashTreeRoot - 512 validator.effectiveBalance | 31.894 ms/op | 24.288 ms/op | 1.31 |
| BeaconState.hashTreeRoot - 1 balances | 108.14 us/op | 90.823 us/op | 1.19 |
| BeaconState.hashTreeRoot - 32 balances | 874.42 us/op | 699.99 us/op | 1.25 |
| BeaconState.hashTreeRoot - 512 balances | 8.3269 ms/op | 6.4099 ms/op | 1.30 |
| BeaconState.hashTreeRoot - 250000 balances | 154.29 ms/op | 130.19 ms/op | 1.19 |
| processSlot - 1 slots | 55.128 us/op | 42.485 us/op | 1.30 |
| processSlot - 32 slots | 3.3190 ms/op | 2.7116 ms/op | 1.22 |
| getCommitteeAssignments - req 1 vs - 250000 vc | 6.2915 ms/op | 4.5003 ms/op | 1.40 |
| getCommitteeAssignments - req 100 vs - 250000 vc | 8.7312 ms/op | 6.3586 ms/op | 1.37 |
| getCommitteeAssignments - req 1000 vs - 250000 vc | 9.4162 ms/op | 6.6776 ms/op | 1.41 |
| computeProposers - vc 250000 | 24.151 ms/op | 17.569 ms/op | 1.37 |
| computeEpochShuffling - vc 250000 | 216.77 ms/op | 155.20 ms/op | 1.40 |
| getNextSyncCommittee - vc 250000 | 398.75 ms/op | 287.40 ms/op | 1.39 |
| altair processAttestation - 250000 vs - 7PWei normalcase | 62.361 ms/op | 41.547 ms/op | 1.50 |
| altair processAttestation - 250000 vs - 7PWei worstcase | 51.513 ms/op | 41.234 ms/op | 1.25 |
| altair processAttestation - setStatus - 1/6 committees join | 11.157 ms/op | 8.2089 ms/op | 1.36 |
| altair processAttestation - setStatus - 1/3 committees join | 23.840 ms/op | 18.702 ms/op | 1.27 |
| altair processAttestation - setStatus - 1/2 committees join | 36.733 ms/op | 28.477 ms/op | 1.29 |
| altair processAttestation - setStatus - 2/3 committees join | 48.837 ms/op | 38.715 ms/op | 1.26 |
| altair processAttestation - setStatus - 4/5 committees join | 57.705 ms/op | 45.828 ms/op | 1.26 |
| altair processAttestation - setStatus - 100% committees join | 73.015 ms/op | 61.495 ms/op | 1.19 |
| altair processAttestation - updateEpochParticipants - 1/6 committees join | 12.001 ms/op | 9.6705 ms/op | 1.24 |
| altair processAttestation - updateEpochParticipants - 1/3 committees join | 25.396 ms/op | 20.499 ms/op | 1.24 |
| altair processAttestation - updateEpochParticipants - 1/2 committees join | 33.657 ms/op | 69.434 ms/op | 0.48 |
| altair processAttestation - updateEpochParticipants - 2/3 committees join | 39.757 ms/op | 24.073 ms/op | 1.65 |
| altair processAttestation - updateEpochParticipants - 4/5 committees join | 34.734 ms/op | 24.759 ms/op | 1.40 |
| altair processAttestation - updateEpochParticipants - 100% committees join | 36.780 ms/op | 28.344 ms/op | 1.30 |
| altair processAttestation - updateAllStatus | 26.778 ms/op | 19.884 ms/op | 1.35 |
| altair processBlock - 250000 vs - 7PWei normalcase | 52.484 ms/op | 45.081 ms/op | 1.16 |
| altair processBlock - 250000 vs - 7PWei worstcase | 135.07 ms/op | 113.09 ms/op | 1.19 |
| altair processEpoch - mainnet_e81889 | 1.3341 s/op | 1.0446 s/op | 1.28 |
| mainnet_e81889 - altair beforeProcessEpoch | 311.04 ms/op | 270.21 ms/op | 1.15 |
| mainnet_e81889 - altair processJustificationAndFinalization | 100.50 us/op | 114.38 us/op | 0.88 |
| mainnet_e81889 - altair processInactivityUpdates | 20.626 ms/op | 15.352 ms/op | 1.34 |
| mainnet_e81889 - altair processRewardsAndPenalties | 280.16 ms/op | 245.52 ms/op | 1.14 |
| mainnet_e81889 - altair processRegistryUpdates | 15.891 us/op | 11.040 us/op | 1.44 |
| mainnet_e81889 - altair processSlashings | 4.5470 us/op | 2.5330 us/op | 1.80 |
| mainnet_e81889 - altair processEth1DataReset | 3.7860 us/op | 2.1130 us/op | 1.79 |
| mainnet_e81889 - altair processEffectiveBalanceUpdates | 13.703 ms/op | 12.072 ms/op | 1.14 |
| mainnet_e81889 - altair processSlashingsReset | 19.564 us/op | 17.163 us/op | 1.14 |
| mainnet_e81889 - altair processRandaoMixesReset | 21.061 us/op | 24.986 us/op | 0.84 |
| mainnet_e81889 - altair processHistoricalRootsUpdate | 5.3090 us/op | 2.4540 us/op | 2.16 |
| mainnet_e81889 - altair processParticipationFlagUpdates | 115.28 ms/op | 129.29 ms/op | 0.89 |
| mainnet_e81889 - altair processSyncCommitteeUpdates | 3.4330 us/op | 1.7710 us/op | 1.94 |
| mainnet_e81889 - altair afterProcessEpoch | 258.03 ms/op | 218.70 ms/op | 1.18 |
| altair processInactivityUpdates - 250000 normalcase | 75.366 ms/op | 59.047 ms/op | 1.28 |
| altair processInactivityUpdates - 250000 worstcase | 76.656 ms/op | 61.189 ms/op | 1.25 |
| altair processParticipationFlagUpdates - 250000 anycase | 105.92 ms/op | 87.069 ms/op | 1.22 |
| altair processRewardsAndPenalties - 250000 normalcase | 282.07 ms/op | 213.94 ms/op | 1.32 |
| altair processRewardsAndPenalties - 250000 worstcase | 246.84 ms/op | 242.88 ms/op | 1.02 |
| altair processSyncCommitteeUpdates - 250000 | 418.10 ms/op | 303.21 ms/op | 1.38 |
| Tree 40 250000 create | 916.60 ms/op | 557.26 ms/op | 1.64 |
| Tree 40 250000 get(125000) | 393.40 ns/op | 272.30 ns/op | 1.44 |
| Tree 40 250000 set(125000) | 2.4816 us/op | 1.7373 us/op | 1.43 |
| Tree 40 250000 toArray() | 46.567 ms/op | 32.249 ms/op | 1.44 |
| Tree 40 250000 iterate all - toArray() + loop | 53.635 ms/op | 33.170 ms/op | 1.62 |
| Tree 40 250000 iterate all - get(i) | 145.72 ms/op | 99.432 ms/op | 1.47 |
| MutableVector 250000 create | 24.008 ms/op | 22.480 ms/op | 1.07 |
| MutableVector 250000 get(125000) | 17.081 ns/op | 11.539 ns/op | 1.48 |
| MutableVector 250000 set(125000) | 538.90 ns/op | 460.66 ns/op | 1.17 |
| MutableVector 250000 toArray() | 10.052 ms/op | 7.8415 ms/op | 1.28 |
| MutableVector 250000 iterate all - toArray() + loop | 23.223 ms/op | 7.0594 ms/op | 3.29 |
| MutableVector 250000 iterate all - get(i) | 4.1264 ms/op | 2.9306 ms/op | 1.41 |
| Array 250000 create | 6.2166 ms/op | 4.7171 ms/op | 1.32 |
| Array 250000 clone - spread | 2.0731 ms/op | 2.1283 ms/op | 0.97 |
| Array 250000 get(125000) | 1.0460 ns/op | 1.0280 ns/op | 1.02 |
| Array 250000 set(125000) | 1.0400 ns/op | 1.0180 ns/op | 1.02 |
| Array 250000 iterate all - loop | 200.67 us/op | 168.79 us/op | 1.19 |
| aggregationBits - 2048 els - readonlyValues | 244.92 us/op | 233.26 us/op | 1.05 |
| aggregationBits - 2048 els - zipIndexesInBitList | 41.836 us/op | 39.256 us/op | 1.07 |
| regular array get 100000 times | 80.775 us/op | 67.401 us/op | 1.20 |
| wrappedArray get 100000 times | 80.687 us/op | 67.405 us/op | 1.20 |
| arrayWithProxy get 100000 times | 32.526 ms/op | 28.384 ms/op | 1.15 |
| ssz.Root.equals | 1.3990 us/op | 1.0450 us/op | 1.34 |
| ssz.Root.equals with valueOf() | 1.4960 us/op | 1.2210 us/op | 1.23 |
| byteArrayEquals with valueOf() | 1.4820 us/op | 1.2450 us/op | 1.19 |
| phase0 processBlock - 250000 vs - 7PWei normalcase | 12.137 ms/op | 10.161 ms/op | 1.19 |
| phase0 processBlock - 250000 vs - 7PWei worstcase | 88.955 ms/op | 73.096 ms/op | 1.22 |
| phase0 afterProcessEpoch - 250000 vs - 7PWei | 243.79 ms/op | 202.06 ms/op | 1.21 |
| phase0 beforeProcessEpoch - 250000 vs - 7PWei | 625.07 ms/op | 552.49 ms/op | 1.13 |
| phase0 processEpoch - mainnet_e58758 | 974.57 ms/op | 796.05 ms/op | 1.22 |
| mainnet_e58758 - phase0 beforeProcessEpoch | 487.61 ms/op | 400.57 ms/op | 1.22 |
| mainnet_e58758 - phase0 processJustificationAndFinalization | 76.449 us/op | 109.10 us/op | 0.70 |
| mainnet_e58758 - phase0 processRewardsAndPenalties | 162.06 ms/op | 139.01 ms/op | 1.17 |
| mainnet_e58758 - phase0 processRegistryUpdates | 42.166 us/op | 71.956 us/op | 0.59 |
| mainnet_e58758 - phase0 processSlashings | 3.1350 us/op | 2.6440 us/op | 1.19 |
| mainnet_e58758 - phase0 processEth1DataReset | 2.9860 us/op | 1.7650 us/op | 1.69 |
| mainnet_e58758 - phase0 processEffectiveBalanceUpdates | 12.142 ms/op | 9.9046 ms/op | 1.23 |
| mainnet_e58758 - phase0 processSlashingsReset | 12.134 us/op | 15.071 us/op | 0.81 |
| mainnet_e58758 - phase0 processRandaoMixesReset | 15.691 us/op | 25.337 us/op | 0.62 |
| mainnet_e58758 - phase0 processHistoricalRootsUpdate | 3.9010 us/op | 2.8800 us/op | 1.35 |
| mainnet_e58758 - phase0 processParticipationRecordUpdates | 12.983 us/op | 17.851 us/op | 0.73 |
| mainnet_e58758 - phase0 afterProcessEpoch | 214.99 ms/op | 176.54 ms/op | 1.22 |
| phase0 processEffectiveBalanceUpdates - 250000 normalcase | 12.763 ms/op | 10.929 ms/op | 1.17 |
| phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 | 1.5112 s/op | 1.2208 s/op | 1.24 |
| phase0 processRegistryUpdates - 250000 normalcase | 58.184 us/op | 73.436 us/op | 0.79 |
| phase0 processRegistryUpdates - 250000 badcase_full_deposits | 3.2927 ms/op | 2.9192 ms/op | 1.13 |
| phase0 processRegistryUpdates - 250000 worstcase 0.5 | 1.9872 s/op | 1.6337 s/op | 1.22 |
| phase0 getAttestationDeltas - 250000 normalcase | 93.124 ms/op | 85.672 ms/op | 1.09 |
| phase0 getAttestationDeltas - 250000 worstcase | 94.772 ms/op | 86.084 ms/op | 1.10 |
| phase0 processSlashings - 250000 worstcase | 45.429 ms/op | 33.164 ms/op | 1.37 |
| shuffle list - 16384 els | 15.163 ms/op | 12.414 ms/op | 1.22 |
| shuffle list - 250000 els | 216.75 ms/op | 178.82 ms/op | 1.21 |
| getEffectiveBalances - 250000 vs - 7PWei | 12.021 ms/op | 9.8422 ms/op | 1.22 |
| pass gossip attestations to forkchoice per slot | 17.035 ms/op | 18.253 ms/op | 0.93 |
| computeDeltas | 4.5872 ms/op | 3.2430 ms/op | 1.41 |
| computeProposerBoostScoreFromBalances | 403.29 us/op | 337.43 us/op | 1.20 |
| getPubkeys - index2pubkey - req 1000 vs - 250000 vc | 2.3421 ms/op | 1.8950 ms/op | 1.24 |
| getPubkeys - validatorsArr - req 1000 vs - 250000 vc | 823.91 us/op | 689.93 us/op | 1.19 |
| BLS verify - blst-native | 2.2173 ms/op | 1.8596 ms/op | 1.19 |
| BLS verifyMultipleSignatures 3 - blst-native | 4.5675 ms/op | 3.8180 ms/op | 1.20 |
| BLS verifyMultipleSignatures 8 - blst-native | 9.8641 ms/op | 8.2345 ms/op | 1.20 |
| BLS verifyMultipleSignatures 32 - blst-native | 35.666 ms/op | 29.884 ms/op | 1.19 |
| BLS aggregatePubkeys 32 - blst-native | 46.988 us/op | 39.940 us/op | 1.18 |
| BLS aggregatePubkeys 128 - blst-native | 182.89 us/op | 153.93 us/op | 1.19 |
| getAttestationsForBlock | 105.70 ms/op | 77.669 ms/op | 1.36 |
| CheckpointStateCache - add get delete | 21.665 us/op | 17.047 us/op | 1.27 |
| validate gossip signedAggregateAndProof - struct | 5.3564 ms/op | 4.4499 ms/op | 1.20 |
| validate gossip signedAggregateAndProof - treeBacked | 5.2764 ms/op | 4.3957 ms/op | 1.20 |
| validate gossip attestation - struct | 2.4972 ms/op | 2.0965 ms/op | 1.19 |
| validate gossip attestation - treeBacked | 2.5167 ms/op | 2.1146 ms/op | 1.19 |
| bytes32 toHexString | 1.9250 us/op | 1.5300 us/op | 1.26 |
| bytes32 Buffer.toString(hex) | 849.00 ns/op | 683.00 ns/op | 1.24 |
| bytes32 Buffer.toString(hex) from Uint8Array | 1.1280 us/op | 887.00 ns/op | 1.27 |
| bytes32 Buffer.toString(hex) + 0x | 849.00 ns/op | 677.00 ns/op | 1.25 |
| Object access 1 prop | 0.37900 ns/op | 0.31200 ns/op | 1.21 |
| Map access 1 prop | 0.33000 ns/op | 0.28800 ns/op | 1.15 |
| Object get x1000 | 20.805 ns/op | 17.891 ns/op | 1.16 |
| Map get x1000 | 1.1450 ns/op | 0.98300 ns/op | 1.16 |
| Object set x1000 | 128.17 ns/op | 98.363 ns/op | 1.30 |
| Map set x1000 | 74.763 ns/op | 58.384 ns/op | 1.28 |
| Return object 10000 times | 0.44480 ns/op | 0.36960 ns/op | 1.20 |
| Throw Error 10000 times | 7.0932 us/op | 5.9089 us/op | 1.20 |
| enrSubnets - fastDeserialize 64 bits | 1.3960 us/op | 1.1420 us/op | 1.22 |
| enrSubnets - ssz BitVector 64 bits | 19.975 us/op | 16.416 us/op | 1.22 |
| enrSubnets - fastDeserialize 4 bits | 556.00 ns/op | 402.00 ns/op | 1.38 |
| enrSubnets - ssz BitVector 4 bits | 3.6320 us/op | 2.7980 us/op | 1.30 |
| RateTracker 1000000 limit, 1 obj count per request | 215.54 ns/op | 172.55 ns/op | 1.25 |
| RateTracker 1000000 limit, 2 obj count per request | 159.08 ns/op | 127.50 ns/op | 1.25 |
| RateTracker 1000000 limit, 4 obj count per request | 133.08 ns/op | 104.74 ns/op | 1.27 |
| RateTracker 1000000 limit, 8 obj count per request | 119.90 ns/op | 93.562 ns/op | 1.28 |
| RateTracker with prune | 4.4650 us/op | 3.3710 us/op | 1.32 |
by benchmarkbot/action
Codecov Report
Merging #3669 (7928f92) into master (a00ec5c) will increase coverage by 0.25%. The diff coverage is n/a.
@@            Coverage Diff             @@
##           master    #3669      +/-   ##
==========================================
+ Coverage   37.13%   37.39%   +0.25%
==========================================
  Files         321      322       +1
  Lines        8706     8796      +90
  Branches     1350     1369      +19
==========================================
+ Hits         3233     3289      +56
- Misses       5330     5365      +35
+ Partials      143      142       -1
Closing for now since the network code has diverged. This optimization is good and should definitely be included in a future review of backfill sync.