
feat: async shuffling refactor

Open matthewkeil opened this issue 11 months ago • 2 comments

** NOTE: Not ready for review, but I want to trigger CI **

Motivation

Move the calculation of the next shuffling to an async process to get it off the critical path during epoch transition. A full second of the epoch transition is spent calculating epochCtx.nextShuffling, and that work can be moved to an async process. I refactored a few pieces of the EpochCache to make this work, and will continue by moving this calculation to a worker thread. Running that worker at a lower priority (via `nice`) lets the long calculation be interleaved into idle time, which is ideal. To be continued...
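As a rough illustration of what moving the calculation "off the critical path" means, here is a minimal TypeScript sketch. All names (`transitionEpoch`, `computeShufflingSync`, `EpochTransitionResult`) are hypothetical, and the shuffle is a seeded Fisher-Yates stand-in, not the spec's swap-or-not shuffle that Lodestar uses. Deferring with `setTimeout` only moves the work out of the synchronous transition; it still runs on the main thread, which is why a worker thread is the planned follow-up.

```typescript
// Hypothetical stand-in for the real (expensive) shuffling computation.
// This is a seeded Fisher-Yates permutation, NOT the spec's swap-or-not shuffle.
function computeShufflingSync(activeCount: number, seed: number): Uint32Array {
  const indices = new Uint32Array(activeCount);
  for (let i = 0; i < activeCount; i++) indices[i] = i;
  let state = seed >>> 0;
  for (let i = activeCount - 1; i > 0; i--) {
    // xorshift32 step: cheap deterministic pseudo-randomness for the demo
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    state >>>= 0;
    const j = state % (i + 1);
    const tmp = indices[i];
    indices[i] = indices[j];
    indices[j] = tmp;
  }
  return indices;
}

interface EpochTransitionResult {
  nextEpoch: number;
  // Resolves later, once the deferred computation has run.
  nextShuffling: Promise<Uint32Array>;
}

// Hypothetical epoch transition: the synchronous part returns immediately and
// the next-epoch shuffling is scheduled for a later event-loop turn.
function transitionEpoch(epoch: number, nextActiveCount: number, seed: number): EpochTransitionResult {
  // ... the rest of the (synchronous) epoch processing would happen here ...
  const nextShuffling = new Promise<Uint32Array>((resolve) => {
    // setTimeout(0) defers the work; it still runs on the main thread.
    setTimeout(() => resolve(computeShufflingSync(nextActiveCount, seed)), 0);
  });
  return { nextEpoch: epoch + 1, nextShuffling };
}
```

Callers that need the shuffling `await result.nextShuffling`; everything else in the transition no longer waits for it.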

Description

  • Changed how shufflings are built/cached. The original method built them on the epochCtx and then relied on processState to move them to the ShufflingCache. Cleaned up that flow to build/store the shufflings directly in the ShufflingCache
  • Moved full shufflings off the ShufflingCache and stored only the pieces we were using directly (the length of activeValidators and the epoch numbers)
  • Moved ShufflingCache from beacon-node to state-transition
  • Passed a logger into the EpochCache so it's available for debugging issues with shuffling builds
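A minimal sketch of building and storing shufflings directly in a cache, assuming the cache is keyed by epoch and each entry holds either a finished shuffling or the promise of one still being built, so concurrent callers share a single build. `SimpleShufflingCache`, `getOrBuild`, `getSync` and the `Shuffling` shape are hypothetical and do not reflect Lodestar's actual ShufflingCache API.

```typescript
// Hypothetical, simplified shuffling shape (Lodestar's real type has more fields).
interface Shuffling {
  epoch: number;
  activeIndices: Uint32Array;
}

// Cache keyed by epoch. Each entry is either a finished shuffling or the
// promise of one that is still being built.
class SimpleShufflingCache {
  private readonly items = new Map<number, Shuffling | Promise<Shuffling>>();

  constructor(private readonly maxEpochs = 4) {}

  // Start a build for `epoch`, or reuse the existing entry so that
  // `build` runs at most once per cached epoch.
  getOrBuild(epoch: number, build: () => Promise<Shuffling>): Promise<Shuffling> {
    const existing = this.items.get(epoch);
    if (existing !== undefined) return Promise.resolve(existing);

    const pending: Promise<Shuffling> = build().then(
      (shuffling) => {
        // Swap the promise for the finished value, unless it was pruned meanwhile.
        if (this.items.get(epoch) === pending) this.items.set(epoch, shuffling);
        return shuffling;
      },
      (err) => {
        // Drop failed builds so a later call can retry.
        if (this.items.get(epoch) === pending) this.items.delete(epoch);
        throw err;
      }
    );
    this.items.set(epoch, pending);
    this.prune();
    return pending;
  }

  // Synchronous lookup: only returns a shuffling that is already built.
  getSync(epoch: number): Shuffling | null {
    const item = this.items.get(epoch);
    return item === undefined || item instanceof Promise ? null : item;
  }

  // Evict the oldest epochs to keep memory bounded.
  private prune(): void {
    while (this.items.size > this.maxEpochs) {
      const oldest = Math.min(...Array.from(this.items.keys()));
      this.items.delete(oldest);
    }
  }
}
```

Returning `null` from `getSync` for an in-flight build lets synchronous code keep running without `await`; what such a caller does next (wait, compute inline, or fail) is left open by this sketch.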

matthewkeil · Mar 08 '24 09:03

Performance Report

✔️ no performance regression detected

🚀🚀 Significant benchmark improvement detected

Benchmark suite Current: b6139610b73c2e5e949fa6dbc5d1bc56ea6ccc53 Previous: adc0534782436ee45614968c090915f0724121e1 Ratio
phase0 afterProcessEpoch - 250000 vs - 7PWei 9.5258 ms/op 112.14 ms/op 0.08
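Judging from the rows (for example 166.51 / 83.873 ≈ 1.99 for `getPubkeys - validatorsArr`), the Ratio column is Current divided by Previous, so values below 1 mean the new commit is faster. Worked through for the headline row:

$$
\text{Ratio} = \frac{\text{Current}}{\text{Previous}} = \frac{9.5258\ \text{ms/op}}{112.14\ \text{ms/op}} \approx 0.0849 \ (\text{reported as } 0.08),
\qquad 112.14 - 9.5258 \approx 102.6\ \text{ms saved per call}
$$

Presumably that time is moved rather than eliminated: per the motivation, the next-shuffling calculation is taken off the epoch transition, not removed.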
Full benchmark results
Benchmark suite Current: b6139610b73c2e5e949fa6dbc5d1bc56ea6ccc53 Previous: adc0534782436ee45614968c090915f0724121e1 Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 808.69 us/op 792.50 us/op 1.02
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 166.51 us/op 83.873 us/op 1.99
BLS verify - blst-native 1.4338 ms/op 1.3337 ms/op 1.08
BLS verifyMultipleSignatures 3 - blst-native 3.4845 ms/op 2.7201 ms/op 1.28
BLS verifyMultipleSignatures 8 - blst-native 6.7507 ms/op 6.0052 ms/op 1.12
BLS verifyMultipleSignatures 32 - blst-native 28.130 ms/op 21.969 ms/op 1.28
BLS verifyMultipleSignatures 64 - blst-native 62.987 ms/op 43.166 ms/op 1.46
BLS verifyMultipleSignatures 128 - blst-native 108.76 ms/op 86.357 ms/op 1.26
BLS deserializing 10000 signatures 998.97 ms/op 924.28 ms/op 1.08
BLS deserializing 100000 signatures 10.418 s/op 9.4577 s/op 1.10
BLS verifyMultipleSignatures - same message - 3 - blst-native 1.4400 ms/op 1.3320 ms/op 1.08
BLS verifyMultipleSignatures - same message - 8 - blst-native 1.6733 ms/op 1.6456 ms/op 1.02
BLS verifyMultipleSignatures - same message - 32 - blst-native 2.4824 ms/op 2.9228 ms/op 0.85
BLS verifyMultipleSignatures - same message - 64 - blst-native 3.7327 ms/op 4.4102 ms/op 0.85
BLS verifyMultipleSignatures - same message - 128 - blst-native 6.1825 ms/op 7.9726 ms/op 0.78
BLS aggregatePubkeys 32 - blst-native 28.600 us/op 25.918 us/op 1.10
BLS aggregatePubkeys 128 - blst-native 109.62 us/op 100.81 us/op 1.09
notSeenSlots=1 numMissedVotes=1 numBadVotes=10 111.68 ms/op 67.440 ms/op 1.66
notSeenSlots=1 numMissedVotes=0 numBadVotes=4 110.87 ms/op 63.667 ms/op 1.74
notSeenSlots=2 numMissedVotes=1 numBadVotes=10 61.930 ms/op 36.440 ms/op 1.70
getSlashingsAndExits - default max 235.53 us/op 203.78 us/op 1.16
getSlashingsAndExits - 2k 643.29 us/op 651.26 us/op 0.99
proposeBlockBody type=full, size=empty 5.8215 ms/op 5.3843 ms/op 1.08
isKnown best case - 1 super set check 404.00 ns/op 379.00 ns/op 1.07
isKnown normal case - 2 super set checks 330.00 ns/op 532.00 ns/op 0.62
isKnown worse case - 16 super set checks 326.00 ns/op 599.00 ns/op 0.54
CheckpointStateCache - add get delete 6.8540 us/op 7.6150 us/op 0.90
validate api signedAggregateAndProof - struct 2.8864 ms/op 3.0116 ms/op 0.96
validate gossip signedAggregateAndProof - struct 2.8918 ms/op 2.8203 ms/op 1.03
validate gossip attestation - vc 640000 1.3730 ms/op 1.3874 ms/op 0.99
batch validate gossip attestation - vc 640000 - chunk 32 159.38 us/op 168.47 us/op 0.95
batch validate gossip attestation - vc 640000 - chunk 64 140.72 us/op 146.72 us/op 0.96
batch validate gossip attestation - vc 640000 - chunk 128 143.82 us/op 141.39 us/op 1.02
batch validate gossip attestation - vc 640000 - chunk 256 148.50 us/op 130.31 us/op 1.14
pickEth1Vote - no votes 1.4879 ms/op 1.1663 ms/op 1.28
pickEth1Vote - max votes 14.622 ms/op 9.8931 ms/op 1.48
pickEth1Vote - Eth1Data hashTreeRoot value x2048 22.139 ms/op 16.455 ms/op 1.35
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 27.896 ms/op 23.089 ms/op 1.21
pickEth1Vote - Eth1Data fastSerialize value x2048 759.74 us/op 620.45 us/op 1.22
pickEth1Vote - Eth1Data fastSerialize tree x2048 5.8967 ms/op 4.3950 ms/op 1.34
bytes32 toHexString 785.00 ns/op 532.00 ns/op 1.48
bytes32 Buffer.toString(hex) 349.00 ns/op 295.00 ns/op 1.18
bytes32 Buffer.toString(hex) from Uint8Array 568.00 ns/op 428.00 ns/op 1.33
bytes32 Buffer.toString(hex) + 0x 330.00 ns/op 292.00 ns/op 1.13
Object access 1 prop 0.21300 ns/op 0.16800 ns/op 1.27
Map access 1 prop 0.15800 ns/op 0.15400 ns/op 1.03
Object get x1000 8.1140 ns/op 7.3500 ns/op 1.10
Map get x1000 0.86400 ns/op 0.76700 ns/op 1.13
Object set x1000 64.244 ns/op 52.256 ns/op 1.23
Map set x1000 45.434 ns/op 41.214 ns/op 1.10
Return object 10000 times 0.25190 ns/op 0.24490 ns/op 1.03
Throw Error 10000 times 4.0930 us/op 3.9163 us/op 1.05
fastMsgIdFn sha256 / 200 bytes 3.4930 us/op 3.3970 us/op 1.03
fastMsgIdFn h32 xxhash / 200 bytes 379.00 ns/op 317.00 ns/op 1.20
fastMsgIdFn h64 xxhash / 200 bytes 379.00 ns/op 348.00 ns/op 1.09
fastMsgIdFn sha256 / 1000 bytes 11.863 us/op 11.370 us/op 1.04
fastMsgIdFn h32 xxhash / 1000 bytes 497.00 ns/op 417.00 ns/op 1.19
fastMsgIdFn h64 xxhash / 1000 bytes 483.00 ns/op 458.00 ns/op 1.05
fastMsgIdFn sha256 / 10000 bytes 107.76 us/op 104.97 us/op 1.03
fastMsgIdFn h32 xxhash / 10000 bytes 2.1350 us/op 1.9730 us/op 1.08
fastMsgIdFn h64 xxhash / 10000 bytes 1.4660 us/op 1.3830 us/op 1.06
send data - 1000 256B messages 22.046 ms/op 19.901 ms/op 1.11
send data - 1000 512B messages 33.079 ms/op 28.055 ms/op 1.18
send data - 1000 1024B messages 43.995 ms/op 41.059 ms/op 1.07
send data - 1000 1200B messages 43.890 ms/op 37.236 ms/op 1.18
send data - 1000 2048B messages 55.579 ms/op 48.863 ms/op 1.14
send data - 1000 4096B messages 44.914 ms/op 44.281 ms/op 1.01
send data - 1000 16384B messages 133.09 ms/op 117.00 ms/op 1.14
send data - 1000 65536B messages 531.63 ms/op 471.40 ms/op 1.13
enrSubnets - fastDeserialize 64 bits 1.5610 us/op 1.3310 us/op 1.17
enrSubnets - ssz BitVector 64 bits 665.00 ns/op 445.00 ns/op 1.49
enrSubnets - fastDeserialize 4 bits 263.00 ns/op 196.00 ns/op 1.34
enrSubnets - ssz BitVector 4 bits 656.00 ns/op 466.00 ns/op 1.41
prioritizePeers score -10:0 att 32-0.1 sync 2-0 123.50 us/op 104.86 us/op 1.18
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 157.90 us/op 132.87 us/op 1.19
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 218.62 us/op 175.69 us/op 1.24
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 354.18 us/op 297.62 us/op 1.19
prioritizePeers score 0:0 att 64-1 sync 4-1 401.29 us/op 368.83 us/op 1.09
array of 16000 items push then shift 1.8536 us/op 1.6256 us/op 1.14
LinkedList of 16000 items push then shift 10.465 ns/op 9.0490 ns/op 1.16
array of 16000 items push then pop 118.69 ns/op 59.223 ns/op 2.00
LinkedList of 16000 items push then pop 9.5370 ns/op 8.8570 ns/op 1.08
array of 24000 items push then shift 2.6889 us/op 2.4041 us/op 1.12
LinkedList of 24000 items push then shift 10.393 ns/op 8.8960 ns/op 1.17
array of 24000 items push then pop 165.55 ns/op 114.00 ns/op 1.45
LinkedList of 24000 items push then pop 9.9040 ns/op 8.7010 ns/op 1.14
intersect bitArray bitLen 8 6.4680 ns/op 5.7850 ns/op 1.12
intersect array and set length 8 83.520 ns/op 64.743 ns/op 1.29
intersect bitArray bitLen 128 38.499 ns/op 35.272 ns/op 1.09
intersect array and set length 128 1.1950 us/op 948.41 ns/op 1.26
bitArray.getTrueBitIndexes() bitLen 128 1.8030 us/op 1.5620 us/op 1.15
bitArray.getTrueBitIndexes() bitLen 248 3.2730 us/op 2.8920 us/op 1.13
bitArray.getTrueBitIndexes() bitLen 512 6.5630 us/op 5.2420 us/op 1.25
Buffer.concat 32 items 1.0480 us/op 1.0970 us/op 0.96
Uint8Array.set 32 items 2.4320 us/op 2.6580 us/op 0.91
Set add up to 64 items then delete first 5.4080 us/op 4.3039 us/op 1.26
OrderedSet add up to 64 items then delete first 7.1814 us/op 5.3629 us/op 1.34
Set add up to 64 items then delete last 5.7434 us/op 4.5334 us/op 1.27
OrderedSet add up to 64 items then delete last 7.5478 us/op 5.6186 us/op 1.34
Set add up to 64 items then delete middle 5.6896 us/op 4.4868 us/op 1.27
OrderedSet add up to 64 items then delete middle 9.1165 us/op 6.9857 us/op 1.31
Set add up to 128 items then delete first 11.788 us/op 9.4838 us/op 1.24
OrderedSet add up to 128 items then delete first 16.257 us/op 12.161 us/op 1.34
Set add up to 128 items then delete last 11.709 us/op 9.1715 us/op 1.28
OrderedSet add up to 128 items then delete last 15.139 us/op 11.275 us/op 1.34
Set add up to 128 items then delete middle 11.587 us/op 9.0236 us/op 1.28
OrderedSet add up to 128 items then delete middle 21.629 us/op 16.624 us/op 1.30
Set add up to 256 items then delete first 23.837 us/op 18.709 us/op 1.27
OrderedSet add up to 256 items then delete first 32.559 us/op 24.900 us/op 1.31
Set add up to 256 items then delete last 22.664 us/op 17.899 us/op 1.27
OrderedSet add up to 256 items then delete last 31.402 us/op 22.785 us/op 1.38
Set add up to 256 items then delete middle 22.998 us/op 18.059 us/op 1.27
OrderedSet add up to 256 items then delete middle 55.379 us/op 44.531 us/op 1.24
transfer serialized Status (84 B) 2.2230 us/op 1.6270 us/op 1.37
copy serialized Status (84 B) 1.5360 us/op 1.1970 us/op 1.28
transfer serialized SignedVoluntaryExit (112 B) 2.3030 us/op 1.8140 us/op 1.27
copy serialized SignedVoluntaryExit (112 B) 1.5360 us/op 1.2770 us/op 1.20
transfer serialized ProposerSlashing (416 B) 2.4800 us/op 2.8560 us/op 0.87
copy serialized ProposerSlashing (416 B) 2.4210 us/op 2.6770 us/op 0.90
transfer serialized Attestation (485 B) 3.1420 us/op 2.6360 us/op 1.19
copy serialized Attestation (485 B) 2.5720 us/op 2.4990 us/op 1.03
transfer serialized AttesterSlashing (33232 B) 3.7200 us/op 2.4860 us/op 1.50
copy serialized AttesterSlashing (33232 B) 10.398 us/op 6.3920 us/op 1.63
transfer serialized Small SignedBeaconBlock (128000 B) 4.8740 us/op 2.8310 us/op 1.72
copy serialized Small SignedBeaconBlock (128000 B) 28.665 us/op 15.038 us/op 1.91
transfer serialized Avg SignedBeaconBlock (200000 B) 5.1470 us/op 3.3940 us/op 1.52
copy serialized Avg SignedBeaconBlock (200000 B) 43.891 us/op 20.602 us/op 2.13
transfer serialized BlobsSidecar (524380 B) 5.2130 us/op 3.4700 us/op 1.50
copy serialized BlobsSidecar (524380 B) 114.01 us/op 120.55 us/op 0.95
transfer serialized Big SignedBeaconBlock (1000000 B) 5.4320 us/op 3.0400 us/op 1.79
copy serialized Big SignedBeaconBlock (1000000 B) 236.26 us/op 380.64 us/op 0.62
pass gossip attestations to forkchoice per slot 7.1293 ms/op 3.7688 ms/op 1.89
forkChoice updateHead vc 100000 bc 64 eq 0 763.00 us/op 672.05 us/op 1.14
forkChoice updateHead vc 600000 bc 64 eq 0 6.2779 ms/op 4.0664 ms/op 1.54
forkChoice updateHead vc 1000000 bc 64 eq 0 8.6348 ms/op 6.8946 ms/op 1.25
forkChoice updateHead vc 600000 bc 320 eq 0 4.8957 ms/op 4.1540 ms/op 1.18
forkChoice updateHead vc 600000 bc 1200 eq 0 5.1534 ms/op 4.2837 ms/op 1.20
forkChoice updateHead vc 600000 bc 7200 eq 0 6.1694 ms/op 5.3699 ms/op 1.15
forkChoice updateHead vc 600000 bc 64 eq 1000 12.005 ms/op 10.915 ms/op 1.10
forkChoice updateHead vc 600000 bc 64 eq 10000 13.295 ms/op 11.636 ms/op 1.14
forkChoice updateHead vc 600000 bc 64 eq 300000 22.280 ms/op 15.467 ms/op 1.44
computeDeltas 500000 validators 300 proto nodes 6.8713 ms/op 6.6073 ms/op 1.04
computeDeltas 500000 validators 1200 proto nodes 6.5748 ms/op 6.3694 ms/op 1.03
computeDeltas 500000 validators 7200 proto nodes 6.5105 ms/op 6.4834 ms/op 1.00
computeDeltas 750000 validators 300 proto nodes 10.230 ms/op 9.7722 ms/op 1.05
computeDeltas 750000 validators 1200 proto nodes 9.7887 ms/op 9.7933 ms/op 1.00
computeDeltas 750000 validators 7200 proto nodes 9.9933 ms/op 9.7276 ms/op 1.03
computeDeltas 1400000 validators 300 proto nodes 19.397 ms/op 17.968 ms/op 1.08
computeDeltas 1400000 validators 1200 proto nodes 19.377 ms/op 17.844 ms/op 1.09
computeDeltas 1400000 validators 7200 proto nodes 19.628 ms/op 17.858 ms/op 1.10
computeDeltas 2100000 validators 300 proto nodes 29.726 ms/op 26.869 ms/op 1.11
computeDeltas 2100000 validators 1200 proto nodes 28.949 ms/op 27.255 ms/op 1.06
computeDeltas 2100000 validators 7200 proto nodes 29.698 ms/op 26.350 ms/op 1.13
altair processAttestation - 250000 vs - 7PWei normalcase 2.7614 ms/op 2.9283 ms/op 0.94
altair processAttestation - 250000 vs - 7PWei worstcase 3.7242 ms/op 4.0120 ms/op 0.93
altair processAttestation - setStatus - 1/6 committees join 160.96 us/op 213.67 us/op 0.75
altair processAttestation - setStatus - 1/3 committees join 304.91 us/op 429.12 us/op 0.71
altair processAttestation - setStatus - 1/2 committees join 411.92 us/op 581.93 us/op 0.71
altair processAttestation - setStatus - 2/3 committees join 512.90 us/op 652.26 us/op 0.79
altair processAttestation - setStatus - 4/5 committees join 723.44 us/op 995.01 us/op 0.73
altair processAttestation - setStatus - 100% committees join 859.50 us/op 1.1058 ms/op 0.78
altair processBlock - 250000 vs - 7PWei normalcase 10.692 ms/op 7.9389 ms/op 1.35
altair processBlock - 250000 vs - 7PWei normalcase hashState 36.413 ms/op 34.314 ms/op 1.06
altair processBlock - 250000 vs - 7PWei worstcase 41.037 ms/op 38.807 ms/op 1.06
altair processBlock - 250000 vs - 7PWei worstcase hashState 119.33 ms/op 90.515 ms/op 1.32
phase0 processBlock - 250000 vs - 7PWei normalcase 3.2677 ms/op 2.8609 ms/op 1.14
phase0 processBlock - 250000 vs - 7PWei worstcase 34.232 ms/op 28.893 ms/op 1.18
altair processEth1Data - 250000 vs - 7PWei normalcase 709.33 us/op 476.37 us/op 1.49
getExpectedWithdrawals 250000 eb:1,eth1:1,we:0,wn:0,smpl:15 16.786 us/op 7.4280 us/op 2.26
getExpectedWithdrawals 250000 eb:0.95,eth1:0.1,we:0.05,wn:0,smpl:219 66.303 us/op 32.848 us/op 2.02
getExpectedWithdrawals 250000 eb:0.95,eth1:0.3,we:0.05,wn:0,smpl:42 28.461 us/op 10.765 us/op 2.64
getExpectedWithdrawals 250000 eb:0.95,eth1:0.7,we:0.05,wn:0,smpl:18 20.178 us/op 10.203 us/op 1.98
getExpectedWithdrawals 250000 eb:0.1,eth1:0.1,we:0,wn:0,smpl:1020 213.33 us/op 119.64 us/op 1.78
getExpectedWithdrawals 250000 eb:0.03,eth1:0.03,we:0,wn:0,smpl:11777 1.6168 ms/op 1.0326 ms/op 1.57
getExpectedWithdrawals 250000 eb:0.01,eth1:0.01,we:0,wn:0,smpl:16384 2.3482 ms/op 1.4912 ms/op 1.57
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,smpl:16384 1.9789 ms/op 1.5262 ms/op 1.30
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,nocache,smpl:16384 4.4760 ms/op 3.3999 ms/op 1.32
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,smpl:16384 3.1252 ms/op 2.3292 ms/op 1.34
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,nocache,smpl:16384 7.6084 ms/op 5.2056 ms/op 1.46
Tree 40 250000 create 406.37 ms/op 343.02 ms/op 1.18
Tree 40 250000 get(125000) 219.00 ns/op 193.49 ns/op 1.13
Tree 40 250000 set(125000) 1.0892 us/op 1.0295 us/op 1.06
Tree 40 250000 toArray() 23.353 ms/op 20.186 ms/op 1.16
Tree 40 250000 iterate all - toArray() + loop 25.100 ms/op 17.659 ms/op 1.42
Tree 40 250000 iterate all - get(i) 77.552 ms/op 64.553 ms/op 1.20
MutableVector 250000 create 17.905 ms/op 12.070 ms/op 1.48
MutableVector 250000 get(125000) 6.9500 ns/op 6.3850 ns/op 1.09
MutableVector 250000 set(125000) 320.65 ns/op 250.73 ns/op 1.28
MutableVector 250000 toArray() 4.0278 ms/op 2.7717 ms/op 1.45
MutableVector 250000 iterate all - toArray() + loop 4.1412 ms/op 2.8871 ms/op 1.43
MutableVector 250000 iterate all - get(i) 1.6019 ms/op 1.5245 ms/op 1.05
Array 250000 create 3.6988 ms/op 2.5386 ms/op 1.46
Array 250000 clone - spread 1.4953 ms/op 1.1837 ms/op 1.26
Array 250000 get(125000) 1.2350 ns/op 1.0230 ns/op 1.21
Array 250000 set(125000) 5.2550 ns/op 4.0410 ns/op 1.30
Array 250000 iterate all - loop 173.50 us/op 165.44 us/op 1.05
effectiveBalanceIncrements clone Uint8Array 300000 42.482 us/op 28.045 us/op 1.51
effectiveBalanceIncrements clone MutableVector 300000 455.00 ns/op 360.00 ns/op 1.26
effectiveBalanceIncrements rw all Uint8Array 300000 208.67 us/op 199.10 us/op 1.05
effectiveBalanceIncrements rw all MutableVector 300000 101.14 ms/op 81.252 ms/op 1.24
phase0 afterProcessEpoch - 250000 vs - 7PWei 9.5258 ms/op 112.14 ms/op 0.08
phase0 beforeProcessEpoch - 250000 vs - 7PWei 42.113 ms/op 50.768 ms/op 0.83
altair processEpoch - mainnet_e81889 394.36 ms/op 484.06 ms/op 0.81
mainnet_e81889 - altair beforeProcessEpoch 82.212 ms/op 81.149 ms/op 1.01
mainnet_e81889 - altair processJustificationAndFinalization 23.770 us/op 15.167 us/op 1.57
mainnet_e81889 - altair processInactivityUpdates 5.8206 ms/op 5.6592 ms/op 1.03
mainnet_e81889 - altair processRewardsAndPenalties 63.721 ms/op 39.039 ms/op 1.63
mainnet_e81889 - altair processRegistryUpdates 2.6900 us/op 2.3670 us/op 1.14
mainnet_e81889 - altair processSlashings 505.00 ns/op 490.00 ns/op 1.03
mainnet_e81889 - altair processEth1DataReset 603.00 ns/op 467.00 ns/op 1.29
mainnet_e81889 - altair processEffectiveBalanceUpdates 2.0540 ms/op 1.4377 ms/op 1.43
mainnet_e81889 - altair processSlashingsReset 7.1900 us/op 3.3510 us/op 2.15
mainnet_e81889 - altair processRandaoMixesReset 7.3850 us/op 4.6060 us/op 1.60
mainnet_e81889 - altair processHistoricalRootsUpdate 1.7070 us/op 675.00 ns/op 2.53
mainnet_e81889 - altair processParticipationFlagUpdates 2.7490 us/op 3.2150 us/op 0.86
mainnet_e81889 - altair processSyncCommitteeUpdates 594.00 ns/op 668.00 ns/op 0.89
mainnet_e81889 - altair afterProcessEpoch 9.1597 ms/op 115.69 ms/op 0.08
capella processEpoch - mainnet_e217614 1.8865 s/op 1.7714 s/op 1.06
mainnet_e217614 - capella beforeProcessEpoch 480.08 ms/op 452.12 ms/op 1.06
mainnet_e217614 - capella processJustificationAndFinalization 19.675 us/op 17.060 us/op 1.15
mainnet_e217614 - capella processInactivityUpdates 22.381 ms/op 22.958 ms/op 0.97
mainnet_e217614 - capella processRewardsAndPenalties 467.38 ms/op 476.85 ms/op 0.98
mainnet_e217614 - capella processRegistryUpdates 42.298 us/op 22.317 us/op 1.90
mainnet_e217614 - capella processSlashings 1.0850 us/op 451.00 ns/op 2.41
mainnet_e217614 - capella processEth1DataReset 775.00 ns/op 537.00 ns/op 1.44
mainnet_e217614 - capella processEffectiveBalanceUpdates 4.7199 ms/op 5.4559 ms/op 0.87
mainnet_e217614 - capella processSlashingsReset 6.5550 us/op 3.2850 us/op 2.00
mainnet_e217614 - capella processRandaoMixesReset 7.9070 us/op 5.2710 us/op 1.50
mainnet_e217614 - capella processHistoricalRootsUpdate 1.2360 us/op 602.00 ns/op 2.05
mainnet_e217614 - capella processParticipationFlagUpdates 2.2410 us/op 4.5950 us/op 0.49
mainnet_e217614 - capella afterProcessEpoch 8.5972 ms/op 307.65 ms/op 0.03
phase0 processEpoch - mainnet_e58758 444.59 ms/op 516.91 ms/op 0.86
mainnet_e58758 - phase0 beforeProcessEpoch 137.45 ms/op 144.26 ms/op 0.95
mainnet_e58758 - phase0 processJustificationAndFinalization 25.371 us/op 16.297 us/op 1.56
mainnet_e58758 - phase0 processRewardsAndPenalties 64.070 ms/op 53.819 ms/op 1.19
mainnet_e58758 - phase0 processRegistryUpdates 15.836 us/op 9.7940 us/op 1.62
mainnet_e58758 - phase0 processSlashings 654.00 ns/op 635.00 ns/op 1.03
mainnet_e58758 - phase0 processEth1DataReset 691.00 ns/op 816.00 ns/op 0.85
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 2.1403 ms/op 1.1929 ms/op 1.79
mainnet_e58758 - phase0 processSlashingsReset 4.1980 us/op 2.4360 us/op 1.72
mainnet_e58758 - phase0 processRandaoMixesReset 6.2250 us/op 4.2400 us/op 1.47
mainnet_e58758 - phase0 processHistoricalRootsUpdate 659.00 ns/op 606.00 ns/op 1.09
mainnet_e58758 - phase0 processParticipationRecordUpdates 6.0560 us/op 4.9480 us/op 1.22
mainnet_e58758 - phase0 afterProcessEpoch 8.3624 ms/op 101.74 ms/op 0.08
phase0 processEffectiveBalanceUpdates - 250000 normalcase 2.5963 ms/op 1.3767 ms/op 1.89
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 1.4664 ms/op 1.5362 ms/op 0.95
altair processInactivityUpdates - 250000 normalcase 34.293 ms/op 32.523 ms/op 1.05
altair processInactivityUpdates - 250000 worstcase 34.463 ms/op 24.252 ms/op 1.42
phase0 processRegistryUpdates - 250000 normalcase 12.870 us/op 14.537 us/op 0.89
phase0 processRegistryUpdates - 250000 badcase_full_deposits 630.07 us/op 462.05 us/op 1.36
phase0 processRegistryUpdates - 250000 worstcase 0.5 144.28 ms/op 142.86 ms/op 1.01
altair processRewardsAndPenalties - 250000 normalcase 67.482 ms/op 65.509 ms/op 1.03
altair processRewardsAndPenalties - 250000 worstcase 67.273 ms/op 61.436 ms/op 1.10
phase0 getAttestationDeltas - 250000 normalcase 9.0912 ms/op 10.792 ms/op 0.84
phase0 getAttestationDeltas - 250000 worstcase 8.8824 ms/op 9.9359 ms/op 0.89
phase0 processSlashings - 250000 worstcase 131.90 us/op 97.945 us/op 1.35
altair processSyncCommitteeUpdates - 250000 149.24 ms/op 161.07 ms/op 0.93
BeaconState.hashTreeRoot - No change 247.00 ns/op 371.00 ns/op 0.67
BeaconState.hashTreeRoot - 1 full validator 147.89 us/op 122.53 us/op 1.21
BeaconState.hashTreeRoot - 32 full validator 1.6378 ms/op 1.1514 ms/op 1.42
BeaconState.hashTreeRoot - 512 full validator 16.635 ms/op 14.558 ms/op 1.14
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 153.47 us/op 183.32 us/op 0.84
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 2.2762 ms/op 2.1488 ms/op 1.06
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 35.254 ms/op 33.190 ms/op 1.06
BeaconState.hashTreeRoot - 1 balances 146.39 us/op 135.09 us/op 1.08
BeaconState.hashTreeRoot - 32 balances 1.2142 ms/op 1.2129 ms/op 1.00
BeaconState.hashTreeRoot - 512 balances 13.828 ms/op 13.793 ms/op 1.00
BeaconState.hashTreeRoot - 250000 balances 227.79 ms/op 225.18 ms/op 1.01
aggregationBits - 2048 els - zipIndexesInBitList 25.700 us/op 70.920 us/op 0.36
byteArrayEquals 32 74.486 ns/op 75.090 ns/op 0.99
Buffer.compare 32 55.504 ns/op 55.817 ns/op 0.99
byteArrayEquals 1024 2.0428 us/op 2.0457 us/op 1.00
Buffer.compare 1024 72.694 ns/op 70.502 ns/op 1.03
byteArrayEquals 16384 32.563 us/op 32.557 us/op 1.00
Buffer.compare 16384 252.88 ns/op 270.28 ns/op 0.94
byteArrayEquals 123687377 242.78 ms/op 252.72 ms/op 0.96
Buffer.compare 123687377 6.3092 ms/op 8.5285 ms/op 0.74
byteArrayEquals 32 - diff last byte 72.437 ns/op 74.156 ns/op 0.98
Buffer.compare 32 - diff last byte 56.371 ns/op 57.229 ns/op 0.99
byteArrayEquals 1024 - diff last byte 2.0634 us/op 2.6518 us/op 0.78
Buffer.compare 1024 - diff last byte 73.310 ns/op 81.031 ns/op 0.90
byteArrayEquals 16384 - diff last byte 33.345 us/op 33.825 us/op 0.99
Buffer.compare 16384 - diff last byte 281.38 ns/op 254.75 ns/op 1.10
byteArrayEquals 123687377 - diff last byte 249.42 ms/op 257.30 ms/op 0.97
Buffer.compare 123687377 - diff last byte 6.8335 ms/op 6.9196 ms/op 0.99
byteArrayEquals 32 - random bytes 5.5360 ns/op 5.3910 ns/op 1.03
Buffer.compare 32 - random bytes 62.638 ns/op 62.462 ns/op 1.00
byteArrayEquals 1024 - random bytes 5.2600 ns/op 5.2140 ns/op 1.01
Buffer.compare 1024 - random bytes 61.120 ns/op 60.655 ns/op 1.01
byteArrayEquals 16384 - random bytes 5.2440 ns/op 5.1710 ns/op 1.01
Buffer.compare 16384 - random bytes 62.760 ns/op 60.324 ns/op 1.04
byteArrayEquals 123687377 - random bytes 8.6000 ns/op 8.4300 ns/op 1.02
Buffer.compare 123687377 - random bytes 67.050 ns/op 63.500 ns/op 1.06
regular array get 100000 times 45.530 us/op 43.936 us/op 1.04
wrappedArray get 100000 times 45.099 us/op 44.778 us/op 1.01
arrayWithProxy get 100000 times 15.670 ms/op 14.936 ms/op 1.05
ssz.Root.equals 55.199 ns/op 54.392 ns/op 1.01
byteArrayEquals 54.358 ns/op 54.348 ns/op 1.00
Buffer.compare 11.047 ns/op 11.401 ns/op 0.97
shuffle list - 16384 els 8.6635 ms/op 8.6133 ms/op 1.01
shuffle list - 250000 els 130.68 ms/op 124.99 ms/op 1.05
processSlot - 1 slots 16.520 us/op 17.420 us/op 0.95
processSlot - 32 slots 4.1946 ms/op 3.3153 ms/op 1.27
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 64.553 ms/op 58.732 ms/op 1.10
getCommitteeAssignments - req 1 vs - 250000 vc 2.7080 ms/op 2.6537 ms/op 1.02
getCommitteeAssignments - req 100 vs - 250000 vc 3.9087 ms/op 3.8348 ms/op 1.02
getCommitteeAssignments - req 1000 vs - 250000 vc 4.2717 ms/op 4.1876 ms/op 1.02
findModifiedValidators - 10000 modified validators 520.55 ms/op 556.14 ms/op 0.94
findModifiedValidators - 1000 modified validators 426.48 ms/op 385.66 ms/op 1.11
findModifiedValidators - 100 modified validators 395.75 ms/op 415.96 ms/op 0.95
findModifiedValidators - 10 modified validators 395.42 ms/op 394.53 ms/op 1.00
findModifiedValidators - 1 modified validators 415.29 ms/op 399.70 ms/op 1.04
findModifiedValidators - no difference 412.30 ms/op 410.56 ms/op 1.00
compare ViewDUs 4.9412 s/op 4.2832 s/op 1.15
compare each validator Uint8Array 1.7897 s/op 1.5276 s/op 1.17
compare ViewDU to Uint8Array 1.4221 s/op 1.0780 s/op 1.32
migrate state 1000000 validators, 24 modified, 0 new 883.00 ms/op 787.74 ms/op 1.12
migrate state 1000000 validators, 1700 modified, 1000 new 1.1843 s/op 1.0623 s/op 1.11
migrate state 1000000 validators, 3400 modified, 2000 new 1.5765 s/op 1.2952 s/op 1.22
migrate state 1500000 validators, 24 modified, 0 new 1.0120 s/op 776.07 ms/op 1.30
migrate state 1500000 validators, 1700 modified, 1000 new 1.2797 s/op 1.0834 s/op 1.18
migrate state 1500000 validators, 3400 modified, 2000 new 1.6850 s/op 1.3105 s/op 1.29
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 5.5100 ns/op 4.2100 ns/op 1.31
state getBlockRootAtSlot - 250000 vs - 7PWei 791.59 ns/op 615.59 ns/op 1.29
computeProposers - vc 250000 11.234 ms/op 8.6634 ms/op 1.30
computeEpochShuffling - vc 250000 141.61 ms/op 122.81 ms/op 1.15
getNextSyncCommittee - vc 250000 178.07 ms/op 159.66 ms/op 1.12
computeSigningRoot for AttestationData 31.590 us/op 28.031 us/op 1.13
hash AttestationData serialized data then Buffer.toString(base64) 2.5710 us/op 2.2450 us/op 1.15
toHexString serialized data 1.6991 us/op 1.0674 us/op 1.59
Buffer.toString(base64) 289.51 ns/op 212.93 ns/op 1.36

by benchmarkbot/action

github-actions[bot] · Mar 08 '24 10:03

This PR is not aligned with the high-level design stated in #6386, which recommends moving shuffling from state-transition to beacon-node. Some benefits of that approach:

  • beacon-node is the consumer of shuffling, so it should just use the current ShufflingCache there and enhance it if needed
  • we want to keep state-transition simple, with no async/await
  • it's also more convenient to implement offloading of the next shuffling computation in beacon-node, since there are already a couple of worker implementations there
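To make the alternative from #6386 concrete, here is a hedged sketch of the split these bullets describe: state-transition only sees a synchronous lookup, while beacon-node owns the cache and any async (possibly worker-based) computation. `ShufflingLike`, `ShufflingGetter`, `sliceShufflingSync` and `BeaconNodeShufflings` are hypothetical names, not Lodestar APIs.

```typescript
// ---- state-transition side: synchronous only, no async/await ----
interface ShufflingLike {
  epoch: number;
  activeIndices: Uint32Array;
}

// Hypothetical lookup injected by beacon-node; null means "not built yet".
type ShufflingGetter = (epoch: number) => ShufflingLike | null;

// Hypothetical sync helper: read a contiguous slice of the shuffled indices.
function sliceShufflingSync(getShuffling: ShufflingGetter, epoch: number, start: number, count: number): Uint32Array {
  const shuffling = getShuffling(epoch);
  if (shuffling === null) {
    throw new Error(`Shuffling for epoch ${epoch} is not available`);
  }
  return shuffling.activeIndices.subarray(start, start + count);
}

// ---- beacon-node side: owns the cache and any async or offloaded work ----
class BeaconNodeShufflings {
  private readonly built = new Map<number, ShufflingLike>();

  // `compute` could post the job to a worker; here it is any async function.
  async precompute(epoch: number, compute: () => Promise<ShufflingLike>): Promise<void> {
    this.built.set(epoch, await compute());
  }

  // Synchronous accessor handed to state-transition.
  readonly getter: ShufflingGetter = (epoch) => this.built.get(epoch) ?? null;
}
```

Keeping the getter synchronous is what lets state-transition stay free of async/await; the trade-off is that callers must handle the "not built yet" case explicitly.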

twoeths · Mar 27 '24 04:03