
feat: SIMD implementation for as-sha256

Open twoeths opened this issue 1 year ago • 2 comments

Motivation

SIMD is available in AssemblyScript. It supports the 128-bit v128 data type, which holds four 32-bit lanes, so we can hash 4 inputs in parallel.
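To make the "4 inputs in parallel" idea concrete, here is a minimal sketch in plain TypeScript that emulates four 32-bit lanes with a `Uint32Array` and applies one SHA-256 round function (big sigma 0) to all lanes at once. The helper names `rotr4` and `bigSigma0x4` are hypothetical and only for illustration; this is not the code in assembly/simd.ts, which would use v128 operations instead of a loop.

```typescript
// Emulated 128-bit vector: four independent 32-bit lanes.
// Hypothetical illustration only; not the actual assembly/simd.ts code.
type Lanes4 = Uint32Array; // always length 4

// Rotate every lane right by n bits (0 < n < 32).
function rotr4(x: Lanes4, n: number): Lanes4 {
  const out = new Uint32Array(4);
  for (let i = 0; i < 4; i++) {
    // Storing into a Uint32Array truncates the result to an unsigned 32-bit value.
    out[i] = (x[i] >>> n) | (x[i] << (32 - n));
  }
  return out;
}

// SHA-256 big sigma 0 applied lane-wise: ROTR(2) ^ ROTR(13) ^ ROTR(22).
// Each lane belongs to a different input, so lanes never interact.
function bigSigma0x4(x: Lanes4): Lanes4 {
  const a = rotr4(x, 2);
  const b = rotr4(x, 13);
  const c = rotr4(x, 22);
  const out = new Uint32Array(4);
  for (let i = 0; i < 4; i++) {
    out[i] = a[i] ^ b[i] ^ c[i];
  }
  return out;
}

// Scalar version of the same function, used to cross-check the lane-wise one.
function bigSigma0(x: number): number {
  const rotr = (v: number, n: number): number => (v >>> n) | (v << (32 - n));
  return (rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)) >>> 0;
}
```

With real SIMD, each of the loops above collapses into a handful of vector instructions, which is where a speedup over four sequential hashes could come from.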

Description

  • New AssemblyScript SIMD implementation in assembly/simd.ts
  • New methods to support hashing 4 inputs (each 64 bytes) in parallel:
    • hash4Input64s(inputs: Uint8Array[]): Uint8Array[]
    • hash8HashObjects(inputs: HashObject[])
  • Add unit tests and benchmarks
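For reviewers who want to check the batch output, a scalar reference with the same shape as `hash4Input64s` might look like the sketch below. It hashes each 64-byte input one at a time with Node's built-in SHA-256. The name `referenceHash4Input64s` and the strict length checks are assumptions for illustration; the PR's own validation behavior is not described here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical scalar reference (not part of the PR): takes 4 inputs of
// 64 bytes each and returns 4 SHA-256 digests of 32 bytes each.
function referenceHash4Input64s(inputs: Uint8Array[]): Uint8Array[] {
  // Assumption: exactly 4 inputs are required, mirroring the 4 SIMD lanes.
  if (inputs.length !== 4) {
    throw new Error(`expected 4 inputs, got ${inputs.length}`);
  }
  return inputs.map((input, i) => {
    if (input.length !== 64) {
      throw new Error(`input ${i} must be 64 bytes, got ${input.length}`);
    }
    // Copy the Buffer returned by digest() into a plain Uint8Array.
    return new Uint8Array(createHash("sha256").update(input).digest());
  });
}
```

A unit test could then feed the same four inputs to both functions and compare the digests byte by byte.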

Closes #356

twoeths, Apr 15 '24 08:04

Performance Report

✔️ no performance regression detected

Full benchmark results
Columns: Benchmark suite, Current (81334811864067a6af798af4f015dbcb1b99779d), Previous (cf8f04905e8d93fd44e59a319e1431f5b932d0e3), Ratio (Current / Previous). Rows with a single value have no Previous measurement.
digestTwoHashObjects 50023 times 47.923 ms/op 47.926 ms/op 1.00
digest64 50023 times 50.469 ms/op 50.930 ms/op 0.99
digest 50023 times 52.153 ms/op 52.992 ms/op 0.98
input length 32 1.2030 us/op 1.1920 us/op 1.01
input length 64 1.3590 us/op 1.3970 us/op 0.97
input length 128 2.2660 us/op 2.3880 us/op 0.95
input length 256 3.3830 us/op 3.4430 us/op 0.98
input length 512 5.5630 us/op 5.6190 us/op 0.99
input length 1024 10.707 us/op 10.763 us/op 0.99
digest 1000000 times 824.19 ms/op 837.14 ms/op 0.98
hashObjectToByteArray 50023 times 1.4283 ms/op 1.4692 ms/op 0.97
byteArrayToHashObject 50023 times 2.4242 ms/op 2.4603 ms/op 0.99
digest64 200092 times 206.57 ms/op
hash 200092 times using batchHash4UintArray64s 212.05 ms/op
hash 200092 times using batchHash4HashObjectInputs 212.59 ms/op
getGindicesAtDepth 4.6080 us/op 4.6690 us/op 0.99
iterateAtDepth 7.2810 us/op 7.4530 us/op 0.98
getGindexBits 428.00 ns/op 430.00 ns/op 1.00
gindexIterator 1.0290 us/op 972.00 ns/op 1.06
hash 2 Uint8Array 2250026 times - as-sha256 2.3156 s/op 2.3533 s/op 0.98
hashTwoObjects 2250026 times - as-sha256 2.1663 s/op 2.2222 s/op 0.97
hash 2 Uint8Array 2250026 times - noble 5.0159 s/op 5.2452 s/op 0.96
hashTwoObjects 2250026 times - noble 6.8932 s/op 6.8410 s/op 1.01
getNodeH() x7812.5 avg hindex 12.143 us/op 12.969 us/op 0.94
getNodeH() x7812.5 index 0 6.3680 us/op 6.6040 us/op 0.96
getNodeH() x7812.5 index 7 6.4100 us/op 6.5780 us/op 0.97
getNodeH() x7812.5 index 7 with key array 6.3800 us/op 6.4950 us/op 0.98
new LeafNode() x7812.5 14.760 us/op 15.032 us/op 0.98
multiproof - depth 15, 1 requested leaves 8.6070 us/op 9.6410 us/op 0.89
tree offset multiproof - depth 15, 1 requested leaves 19.633 us/op 20.563 us/op 0.95
compact multiproof - depth 15, 1 requested leaves 3.7230 us/op 5.4290 us/op 0.69
multiproof - depth 15, 2 requested leaves 11.534 us/op 12.903 us/op 0.89
tree offset multiproof - depth 15, 2 requested leaves 21.439 us/op 23.655 us/op 0.91
compact multiproof - depth 15, 2 requested leaves 3.4330 us/op 4.4640 us/op 0.77
multiproof - depth 15, 3 requested leaves 16.153 us/op 18.176 us/op 0.89
tree offset multiproof - depth 15, 3 requested leaves 27.953 us/op 29.919 us/op 0.93
compact multiproof - depth 15, 3 requested leaves 4.1860 us/op 6.4790 us/op 0.65
multiproof - depth 15, 4 requested leaves 21.466 us/op 23.370 us/op 0.92
tree offset multiproof - depth 15, 4 requested leaves 33.883 us/op 36.995 us/op 0.92
compact multiproof - depth 15, 4 requested leaves 5.0580 us/op 5.3080 us/op 0.95
packedRootsBytesToLeafNodes bytes 4000 offset 0 1.9560 us/op 1.9930 us/op 0.98
packedRootsBytesToLeafNodes bytes 4000 offset 1 1.9810 us/op 2.0020 us/op 0.99
packedRootsBytesToLeafNodes bytes 4000 offset 2 1.9630 us/op 2.0000 us/op 0.98
packedRootsBytesToLeafNodes bytes 4000 offset 3 1.8760 us/op 1.9940 us/op 0.94
subtreeFillToContents depth 40 count 250000 46.530 ms/op 45.958 ms/op 1.01
setRoot - gindexBitstring 8.1636 ms/op 8.4206 ms/op 0.97
setRoot - gindex 8.5065 ms/op 8.7619 ms/op 0.97
getRoot - gindexBitstring 2.4350 ms/op 2.4504 ms/op 0.99
getRoot - gindex 3.3562 ms/op 3.3620 ms/op 1.00
getHashObject then setHashObject 10.247 ms/op 10.481 ms/op 0.98
setNodeWithFn 7.9182 ms/op 8.0530 ms/op 0.98
getNodeAtDepth depth 0 x100000 1.0832 ms/op 1.0852 ms/op 1.00
setNodeAtDepth depth 0 x100000 2.3466 ms/op 2.4234 ms/op 0.97
getNodesAtDepth depth 0 x100000 1.0524 ms/op 1.0538 ms/op 1.00
setNodesAtDepth depth 0 x100000 1.4245 ms/op 1.4528 ms/op 0.98
getNodeAtDepth depth 1 x100000 1.1464 ms/op 1.1686 ms/op 0.98
setNodeAtDepth depth 1 x100000 5.1183 ms/op 5.1398 ms/op 1.00
getNodesAtDepth depth 1 x100000 1.1763 ms/op 1.1909 ms/op 0.99
setNodesAtDepth depth 1 x100000 4.3033 ms/op 4.3132 ms/op 1.00
getNodeAtDepth depth 2 x100000 1.4276 ms/op 1.4221 ms/op 1.00
setNodeAtDepth depth 2 x100000 8.7806 ms/op 10.417 ms/op 0.84
getNodesAtDepth depth 2 x100000 16.869 ms/op 18.389 ms/op 0.92
setNodesAtDepth depth 2 x100000 12.381 ms/op 12.926 ms/op 0.96
tree.getNodesAtDepth - gindexes 7.7827 ms/op 8.0320 ms/op 0.97
tree.getNodesAtDepth - push all nodes 1.9585 ms/op 1.9345 ms/op 1.01
tree.getNodesAtDepth - navigation 233.92 us/op 235.57 us/op 0.99
tree.setNodesAtDepth - indexes 349.98 us/op 308.89 us/op 1.13
set at depth 8 443.00 ns/op 450.00 ns/op 0.98
set at depth 16 588.00 ns/op 596.00 ns/op 0.99
set at depth 32 951.00 ns/op 958.00 ns/op 0.99
iterateNodesAtDepth 8 256 13.080 us/op 13.212 us/op 0.99
getNodesAtDepth 8 256 3.4390 us/op 3.3790 us/op 1.02
iterateNodesAtDepth 16 65536 4.2388 ms/op 4.3308 ms/op 0.98
getNodesAtDepth 16 65536 1.5835 ms/op 1.6273 ms/op 0.97
iterateNodesAtDepth 32 250000 15.410 ms/op 15.634 ms/op 0.99
getNodesAtDepth 32 250000 4.3000 ms/op 4.3522 ms/op 0.99
iterateNodesAtDepth 40 250000 15.540 ms/op 15.708 ms/op 0.99
getNodesAtDepth 40 250000 4.3836 ms/op 4.4330 ms/op 0.99
250k validators 7.1398 s/op 7.1114 s/op 1.00
bitlist bytes to struct (120,90) 482.00 ns/op 484.00 ns/op 1.00
bitlist bytes to tree (120,90) 2.1360 us/op 2.1460 us/op 1.00
bitlist bytes to struct (2048,2048) 911.00 ns/op 922.00 ns/op 0.99
bitlist bytes to tree (2048,2048) 3.3240 us/op 3.3630 us/op 0.99
ByteListType - deserialize 7.8165 ms/op 7.3046 ms/op 1.07
BasicListType - deserialize 11.857 ms/op 11.915 ms/op 1.00
ByteListType - serialize 7.8777 ms/op 7.9004 ms/op 1.00
BasicListType - serialize 9.6364 ms/op 10.023 ms/op 0.96
BasicListType - tree_convertToStruct 22.355 ms/op 22.655 ms/op 0.99
List[uint8, 68719476736] len 300000 ViewDU.getAll() + iterate 4.3003 ms/op 4.4147 ms/op 0.97
List[uint8, 68719476736] len 300000 ViewDU.get(i) 4.1212 ms/op 2.9512 ms/op 1.40
Array.push len 300000 empty Array - number 6.3746 ms/op 6.2896 ms/op 1.01
Array.set len 300000 from new Array - number 1.6630 ms/op 1.7071 ms/op 0.97
Array.set len 300000 - number 5.2218 ms/op 5.2257 ms/op 1.00
Uint8Array.set len 300000 373.14 us/op 372.38 us/op 1.00
Uint32Array.set len 300000 443.43 us/op 445.15 us/op 1.00
Container({a: uint8, b: uint8}) getViewDU x300000 52.403 ms/op 49.804 ms/op 1.05
ContainerNodeStruct({a: uint8, b: uint8}) getViewDU x300000 10.700 ms/op 10.834 ms/op 0.99
List(Container) len 300000 ViewDU.getAllReadonly() + iterate 208.75 ms/op 209.73 ms/op 1.00
List(Container) len 300000 ViewDU.getAllReadonlyValues() + iterate 316.36 ms/op 273.31 ms/op 1.16
List(Container) len 300000 ViewDU.get(i) 8.7640 ms/op 6.3717 ms/op 1.38
List(Container) len 300000 ViewDU.getReadonly(i) 8.1774 ms/op 6.3376 ms/op 1.29
List(ContainerNodeStruct) len 300000 ViewDU.getAllReadonly() + iterate 40.470 ms/op 41.496 ms/op 0.98
List(ContainerNodeStruct) len 300000 ViewDU.getAllReadonlyValues() + iterate 5.6273 ms/op 5.1590 ms/op 1.09
List(ContainerNodeStruct) len 300000 ViewDU.get(i) 7.2073 ms/op 5.9948 ms/op 1.20
List(ContainerNodeStruct) len 300000 ViewDU.getReadonly(i) 7.1238 ms/op 5.9572 ms/op 1.20
Array.push len 300000 empty Array - object 6.8128 ms/op 5.9218 ms/op 1.15
Array.set len 300000 from new Array - object 2.2630 ms/op 1.9831 ms/op 1.14
Array.set len 300000 - object 6.7586 ms/op 5.7016 ms/op 1.19
cachePermanentRootStruct no cache 9.2840 us/op 8.5850 us/op 1.08
cachePermanentRootStruct with cache 237.00 ns/op 188.00 ns/op 1.26
epochParticipation len 250000 rws 7813 2.3041 ms/op 1.8994 ms/op 1.21
deserialize Attestation - tree 4.5990 us/op 4.0490 us/op 1.14
deserialize Attestation - struct 2.0270 us/op 1.7750 us/op 1.14
deserialize SignedAggregateAndProof - tree 3.7370 us/op 3.6180 us/op 1.03
deserialize SignedAggregateAndProof - struct 3.1580 us/op 2.9150 us/op 1.08
deserialize SyncCommitteeMessage - tree 1.0770 us/op 1.0360 us/op 1.04
deserialize SyncCommitteeMessage - struct 1.1750 us/op 980.00 ns/op 1.20
deserialize SignedContributionAndProof - tree 2.1180 us/op 1.9690 us/op 1.08
deserialize SignedContributionAndProof - struct 2.5370 us/op 2.3590 us/op 1.08
deserialize SignedBeaconBlock - tree 238.34 us/op 208.32 us/op 1.14
deserialize SignedBeaconBlock - struct 126.23 us/op 120.84 us/op 1.04
BeaconState vc 300000 - deserialize tree 598.10 ms/op 593.02 ms/op 1.01
BeaconState vc 300000 - serialize tree 147.94 ms/op 148.19 ms/op 1.00
BeaconState.historicalRoots vc 300000 - deserialize tree 876.00 ns/op 821.00 ns/op 1.07
BeaconState.historicalRoots vc 300000 - serialize tree 800.00 ns/op 765.00 ns/op 1.05
BeaconState.validators vc 300000 - deserialize tree 550.23 ms/op 521.80 ms/op 1.05
BeaconState.validators vc 300000 - serialize tree 98.321 ms/op 102.19 ms/op 0.96
BeaconState.balances vc 300000 - deserialize tree 20.496 ms/op 20.686 ms/op 0.99
BeaconState.balances vc 300000 - serialize tree 4.0125 ms/op 3.9926 ms/op 1.00
BeaconState.previousEpochParticipation vc 300000 - deserialize tree 548.56 us/op 684.49 us/op 0.80
BeaconState.previousEpochParticipation vc 300000 - serialize tree 291.01 us/op 288.96 us/op 1.01
BeaconState.currentEpochParticipation vc 300000 - deserialize tree 563.17 us/op 450.13 us/op 1.25
BeaconState.currentEpochParticipation vc 300000 - serialize tree 283.88 us/op 287.17 us/op 0.99
BeaconState.inactivityScores vc 300000 - deserialize tree 21.006 ms/op 20.081 ms/op 1.05
BeaconState.inactivityScores vc 300000 - serialize tree 4.1597 ms/op 3.6692 ms/op 1.13
hashTreeRoot Attestation - struct 33.643 us/op 27.463 us/op 1.23
hashTreeRoot Attestation - tree 21.286 us/op 18.111 us/op 1.18
hashTreeRoot SignedAggregateAndProof - struct 57.859 us/op 37.426 us/op 1.55
hashTreeRoot SignedAggregateAndProof - tree 29.846 us/op 27.126 us/op 1.10
hashTreeRoot SyncCommitteeMessage - struct 10.282 us/op 8.9650 us/op 1.15
hashTreeRoot SyncCommitteeMessage - tree 6.6760 us/op 6.3710 us/op 1.05
hashTreeRoot SignedContributionAndProof - struct 26.790 us/op 24.215 us/op 1.11
hashTreeRoot SignedContributionAndProof - tree 20.062 us/op 19.253 us/op 1.04
hashTreeRoot SignedBeaconBlock - struct 2.5356 ms/op 2.1739 ms/op 1.17
hashTreeRoot SignedBeaconBlock - tree 1.7796 ms/op 1.6946 ms/op 1.05
hashTreeRoot Validator - struct 12.951 us/op 12.096 us/op 1.07
hashTreeRoot Validator - tree 11.074 us/op 10.355 us/op 1.07
BeaconState vc 300000 - hashTreeRoot tree 3.6886 s/op 3.6525 s/op 1.01
BeaconState.historicalRoots vc 300000 - hashTreeRoot tree 1.3500 us/op 1.3400 us/op 1.01
BeaconState.validators vc 300000 - hashTreeRoot tree 3.4979 s/op 3.4974 s/op 1.00
BeaconState.balances vc 300000 - hashTreeRoot tree 86.933 ms/op 86.452 ms/op 1.01
BeaconState.previousEpochParticipation vc 300000 - hashTreeRoot tree 9.0174 ms/op 9.0131 ms/op 1.00
BeaconState.currentEpochParticipation vc 300000 - hashTreeRoot tree 9.0452 ms/op 9.0085 ms/op 1.00
BeaconState.inactivityScores vc 300000 - hashTreeRoot tree 88.884 ms/op 86.569 ms/op 1.03
hash64 x18 19.557 us/op 19.358 us/op 1.01
hashTwoObjects x18 18.413 us/op 17.861 us/op 1.03
hash64 x1740 1.8220 ms/op 1.8124 ms/op 1.01
hashTwoObjects x1740 1.7030 ms/op 1.7224 ms/op 0.99
hash64 x2700000 2.8527 s/op 2.8213 s/op 1.01
hashTwoObjects x2700000 2.6502 s/op 2.6376 s/op 1.00
get_exitEpoch - ContainerType 226.00 ns/op 190.00 ns/op 1.19
get_exitEpoch - ContainerNodeStructType 231.00 ns/op 190.00 ns/op 1.22
set_exitEpoch - ContainerType 239.00 ns/op 254.00 ns/op 0.94
set_exitEpoch - ContainerNodeStructType 237.00 ns/op 204.00 ns/op 1.16
get_pubkey - ContainerType 894.00 ns/op 854.00 ns/op 1.05
get_pubkey - ContainerNodeStructType 233.00 ns/op 201.00 ns/op 1.16
hashTreeRoot - ContainerType 371.00 ns/op 337.00 ns/op 1.10
hashTreeRoot - ContainerNodeStructType 446.00 ns/op 378.00 ns/op 1.18
createProof - ContainerType 4.2990 us/op 3.7110 us/op 1.16
createProof - ContainerNodeStructType 21.894 us/op 19.853 us/op 1.10
serialize - ContainerType 1.8750 us/op 1.7860 us/op 1.05
serialize - ContainerNodeStructType 1.5420 us/op 1.5830 us/op 0.97
set_exitEpoch_and_hashTreeRoot - ContainerType 4.2740 us/op 4.1860 us/op 1.02
set_exitEpoch_and_hashTreeRoot - ContainerNodeStructType 11.401 us/op 11.102 us/op 1.03
Array - for of 5.5600 us/op 5.6380 us/op 0.99
Array - for(;;) 5.5480 us/op 5.4620 us/op 1.02
basicListValue.readonlyValuesArray() 4.3692 ms/op 4.2076 ms/op 1.04
basicListValue.readonlyValuesArray() + loop all 5.2851 ms/op 4.1542 ms/op 1.27
compositeListValue.readonlyValuesArray() 29.942 ms/op 27.561 ms/op 1.09
compositeListValue.readonlyValuesArray() + loop all 29.698 ms/op 29.214 ms/op 1.02
Number64UintType - get balances list 4.2828 ms/op 4.3291 ms/op 0.99
Number64UintType - set balances list 9.5034 ms/op 10.021 ms/op 0.95
Number64UintType - get and increase 10 then set 39.115 ms/op 40.389 ms/op 0.97
Number64UintType - increase 10 using applyDelta 15.591 ms/op 17.193 ms/op 0.91
Number64UintType - increase 10 using applyDeltaInBatch 15.269 ms/op 17.224 ms/op 0.89
tree_newTreeFromUint64Deltas 16.533 ms/op 13.377 ms/op 1.24
unsafeUint8ArrayToTree 29.468 ms/op 26.745 ms/op 1.10
bitLength(50) 216.00 ns/op 203.00 ns/op 1.06
bitLengthStr(50) 209.00 ns/op 193.00 ns/op 1.08
bitLength(8000) 201.00 ns/op 197.00 ns/op 1.02
bitLengthStr(8000) 255.00 ns/op 245.00 ns/op 1.04
bitLength(250000) 223.00 ns/op 208.00 ns/op 1.07
bitLengthStr(250000) 314.00 ns/op 297.00 ns/op 1.06
floor - Math.floor (53) 1.2371 ns/op 1.2564 ns/op 0.98
floor - << 0 (53) 1.2366 ns/op 1.2374 ns/op 1.00
floor - Math.floor (512) 1.2370 ns/op 1.2365 ns/op 1.00
floor - << 0 (512) 1.2553 ns/op 1.2364 ns/op 1.02
fnIf(0) 1.5527 ns/op 1.5548 ns/op 1.00
fnSwitch(0) 2.1715 ns/op 2.1661 ns/op 1.00
fnObj(0) 1.5467 ns/op 1.5695 ns/op 0.99
fnArr(0) 1.5472 ns/op 1.5471 ns/op 1.00
fnIf(4) 2.1654 ns/op 2.1932 ns/op 0.99
fnSwitch(4) 2.1660 ns/op 2.1642 ns/op 1.00
fnObj(4) 1.5546 ns/op 1.5485 ns/op 1.00
fnArr(4) 1.5475 ns/op 1.5481 ns/op 1.00
fnIf(9) 3.1564 ns/op 3.0949 ns/op 1.02
fnSwitch(9) 2.1665 ns/op 2.1954 ns/op 0.99
fnObj(9) 1.5461 ns/op 1.5493 ns/op 1.00
fnArr(9) 1.5531 ns/op 1.5497 ns/op 1.00
Container {a,b,vec} - as struct x100000 124.07 us/op 123.91 us/op 1.00
Container {a,b,vec} - as tree x100000 340.37 us/op 340.30 us/op 1.00
Container {a,vec,b} - as struct x100000 157.79 us/op 154.77 us/op 1.02
Container {a,vec,b} - as tree x100000 371.42 us/op 372.12 us/op 1.00
get 2 props x1000000 - rawObject 309.44 us/op 310.81 us/op 1.00
get 2 props x1000000 - proxy 73.948 ms/op 72.741 ms/op 1.02
get 2 props x1000000 - customObj 309.77 us/op 309.33 us/op 1.00
Simple object binary -> struct 861.00 ns/op 795.00 ns/op 1.08
Simple object binary -> tree_backed 1.6640 us/op 1.5580 us/op 1.07
Simple object struct -> tree_backed 2.3310 us/op 2.1900 us/op 1.06
Simple object tree_backed -> struct 2.2450 us/op 2.1540 us/op 1.04
Simple object struct -> binary 1.0160 us/op 1.0830 us/op 0.94
Simple object tree_backed -> binary 1.5700 us/op 1.5820 us/op 0.99
aggregationBits binary -> struct 627.00 ns/op 589.00 ns/op 1.06
aggregationBits binary -> tree_backed 2.4090 us/op 2.3670 us/op 1.02
aggregationBits struct -> tree_backed 2.8380 us/op 2.8010 us/op 1.01
aggregationBits tree_backed -> struct 1.2140 us/op 1.1880 us/op 1.02
aggregationBits struct -> binary 797.00 ns/op 774.00 ns/op 1.03
aggregationBits tree_backed -> binary 1.0750 us/op 1.0300 us/op 1.04
List(uint8) 100000 binary -> struct 1.3397 ms/op 1.4490 ms/op 0.92
List(uint8) 100000 binary -> tree_backed 93.770 us/op 88.515 us/op 1.06
List(uint8) 100000 struct -> tree_backed 1.1678 ms/op 1.1905 ms/op 0.98
List(uint8) 100000 tree_backed -> struct 1.0327 ms/op 1.0591 ms/op 0.98
List(uint8) 100000 struct -> binary 988.12 us/op 1.0094 ms/op 0.98
List(uint8) 100000 tree_backed -> binary 88.551 us/op 87.930 us/op 1.01
List(uint64Number) 100000 binary -> struct 1.2350 ms/op 1.2081 ms/op 1.02
List(uint64Number) 100000 binary -> tree_backed 2.8315 ms/op 3.2269 ms/op 0.88
List(uint64Number) 100000 struct -> tree_backed 3.9792 ms/op 4.8569 ms/op 0.82
List(uint64Number) 100000 tree_backed -> struct 2.0545 ms/op 2.3570 ms/op 0.87
List(uint64Number) 100000 struct -> binary 1.3642 ms/op 1.5680 ms/op 0.87
List(uint64Number) 100000 tree_backed -> binary 810.64 us/op 905.40 us/op 0.90
List(Uint64Bigint) 100000 binary -> struct 3.5439 ms/op 3.6912 ms/op 0.96
List(Uint64Bigint) 100000 binary -> tree_backed 3.2928 ms/op 3.3661 ms/op 0.98
List(Uint64Bigint) 100000 struct -> tree_backed 5.2914 ms/op 5.5335 ms/op 0.96
List(Uint64Bigint) 100000 tree_backed -> struct 4.5456 ms/op 4.6956 ms/op 0.97
List(Uint64Bigint) 100000 struct -> binary 2.0308 ms/op 2.0423 ms/op 0.99
List(Uint64Bigint) 100000 tree_backed -> binary 982.22 us/op 1.1645 ms/op 0.84
Vector(Root) 100000 binary -> struct 28.981 ms/op 31.484 ms/op 0.92
Vector(Root) 100000 binary -> tree_backed 32.772 ms/op 33.719 ms/op 0.97
Vector(Root) 100000 struct -> tree_backed 37.789 ms/op 37.528 ms/op 1.01
Vector(Root) 100000 tree_backed -> struct 44.906 ms/op 45.449 ms/op 0.99
Vector(Root) 100000 struct -> binary 2.6262 ms/op 2.5929 ms/op 1.01
Vector(Root) 100000 tree_backed -> binary 9.5413 ms/op 10.302 ms/op 0.93
List(Validator) 100000 binary -> struct 105.60 ms/op 108.18 ms/op 0.98
List(Validator) 100000 binary -> tree_backed 288.03 ms/op 290.31 ms/op 0.99
List(Validator) 100000 struct -> tree_backed 295.83 ms/op 302.03 ms/op 0.98
List(Validator) 100000 tree_backed -> struct 190.95 ms/op 192.89 ms/op 0.99
List(Validator) 100000 struct -> binary 26.600 ms/op 27.086 ms/op 0.98
List(Validator) 100000 tree_backed -> binary 101.26 ms/op 101.01 ms/op 1.00
List(Validator-NS) 100000 binary -> struct 98.635 ms/op 105.24 ms/op 0.94
List(Validator-NS) 100000 binary -> tree_backed 146.63 ms/op 144.50 ms/op 1.01
List(Validator-NS) 100000 struct -> tree_backed 173.36 ms/op 173.97 ms/op 1.00
List(Validator-NS) 100000 tree_backed -> struct 144.68 ms/op 146.22 ms/op 0.99
List(Validator-NS) 100000 struct -> binary 26.798 ms/op 27.026 ms/op 0.99
List(Validator-NS) 100000 tree_backed -> binary 33.001 ms/op 32.982 ms/op 1.00
get epochStatuses - MutableVector 90.933 us/op 104.84 us/op 0.87
get epochStatuses - ViewDU 208.96 us/op 208.53 us/op 1.00
set epochStatuses - ListTreeView 1.4093 ms/op 1.6046 ms/op 0.88
set epochStatuses - ListTreeView - set() 440.21 us/op 457.65 us/op 0.96
set epochStatuses - ListTreeView - commit() 446.39 us/op 438.80 us/op 1.02
bitstring 641.44 ns/op 645.17 ns/op 0.99
bit mask 13.464 ns/op 14.232 ns/op 0.95
struct - increase slot to 1000000 928.47 us/op 927.45 us/op 1.00
UintNumberType - increase slot to 1000000 21.668 ms/op 23.901 ms/op 0.91
UintBigintType - increase slot to 1000000 166.59 ms/op 200.68 ms/op 0.83
UintBigint8 x 100000 tree_deserialize 4.5355 ms/op 5.2920 ms/op 0.86
UintBigint8 x 100000 tree_serialize 1.0914 ms/op 1.0923 ms/op 1.00
UintBigint16 x 100000 tree_deserialize 4.5547 ms/op 6.1811 ms/op 0.74
UintBigint16 x 100000 tree_serialize 1.1746 ms/op 1.5894 ms/op 0.74
UintBigint32 x 100000 tree_deserialize 4.7314 ms/op 5.8123 ms/op 0.81
UintBigint32 x 100000 tree_serialize 1.1852 ms/op 1.4116 ms/op 0.84
UintBigint64 x 100000 tree_deserialize 4.9360 ms/op 6.5494 ms/op 0.75
UintBigint64 x 100000 tree_serialize 1.5536 ms/op 1.9879 ms/op 0.78
UintBigint8 x 100000 value_deserialize 432.91 us/op 432.99 us/op 1.00
UintBigint8 x 100000 value_serialize 623.87 us/op 708.83 us/op 0.88
UintBigint16 x 100000 value_deserialize 466.47 us/op 464.54 us/op 1.00
UintBigint16 x 100000 value_serialize 709.62 us/op 788.61 us/op 0.90
UintBigint32 x 100000 value_deserialize 433.18 us/op 433.86 us/op 1.00
UintBigint32 x 100000 value_serialize 660.54 us/op 786.64 us/op 0.84
UintBigint64 x 100000 value_deserialize 495.88 us/op 510.50 us/op 0.97
UintBigint64 x 100000 value_serialize 850.03 us/op 1.0409 ms/op 0.82
UintBigint8 x 100000 deserialize 2.8597 ms/op 3.6057 ms/op 0.79
UintBigint8 x 100000 serialize 1.4574 ms/op 1.6029 ms/op 0.91
UintBigint16 x 100000 deserialize 2.8137 ms/op 3.1933 ms/op 0.88
UintBigint16 x 100000 serialize 1.4876 ms/op 1.5637 ms/op 0.95
UintBigint32 x 100000 deserialize 2.7950 ms/op 3.2083 ms/op 0.87
UintBigint32 x 100000 serialize 2.7531 ms/op 2.9506 ms/op 0.93
UintBigint64 x 100000 deserialize 3.7903 ms/op 3.8717 ms/op 0.98
UintBigint64 x 100000 serialize 1.5308 ms/op 1.5096 ms/op 1.01
UintBigint128 x 100000 deserialize 5.4717 ms/op 5.0612 ms/op 1.08
UintBigint128 x 100000 serialize 14.511 ms/op 14.205 ms/op 1.02
UintBigint256 x 100000 deserialize 7.7624 ms/op 8.0662 ms/op 0.96
UintBigint256 x 100000 serialize 42.970 ms/op 42.049 ms/op 1.02
Slice from Uint8Array x25000 1.1213 ms/op 1.1554 ms/op 0.97
Slice from ArrayBuffer x25000 16.798 ms/op 16.639 ms/op 1.01
Slice from ArrayBuffer x25000 + new Uint8Array 18.801 ms/op 18.124 ms/op 1.04
Copy Uint8Array 100000 iterate 1.6477 ms/op 1.6601 ms/op 0.99
Copy Uint8Array 100000 slice 104.80 us/op 130.82 us/op 0.80
Copy Uint8Array 100000 Uint8Array.prototype.slice.call 110.86 us/op 137.70 us/op 0.81
Copy Buffer 100000 Uint8Array.prototype.slice.call 110.70 us/op 130.41 us/op 0.85
Copy Uint8Array 100000 slice + set 176.37 us/op 238.49 us/op 0.74
Copy Uint8Array 100000 subarray + set 112.81 us/op 127.50 us/op 0.88
Copy Uint8Array 100000 slice arrayBuffer 116.61 us/op 130.35 us/op 0.89
Uint64 deserialize 100000 - iterate Uint8Array 1.7804 ms/op 1.8916 ms/op 0.94
Uint64 deserialize 100000 - by Uint32A 1.8257 ms/op 1.9184 ms/op 0.95
Uint64 deserialize 100000 - by DataView.getUint32 x2 1.8503 ms/op 1.9187 ms/op 0.96
Uint64 deserialize 100000 - by DataView.getBigUint64 5.0285 ms/op 5.0542 ms/op 0.99
Uint64 deserialize 100000 - by byte 40.106 ms/op 40.585 ms/op 0.99

by benchmarkbot/action

github-actions[bot], Apr 15 '24 08:04

The performance of the SIMD implementation really depends on the CPU. Below is SIMD vs digest64:

  • In CI (Ubuntu), SIMD is just a little bit faster (screenshot: 2024-04-19 at 10 17 20)

  • In my environment (Mac M1), SIMD is ~20% faster

  digest64 vs hash4Input64s vs hash8HashObjects
    ✓ digest64 200092 times                                               6.206878 ops/s    161.1116 ms/op        -         60 runs   10.3 s
    ✓ hash 200092 times using hash4Input64s                               7.460423 ops/s    134.0406 ms/op        -         72 runs   10.2 s
    ✓ hash 200092 times using hash8HashObjects                            7.834839 ops/s    127.6350 ms/op        -         76 runs   10.2 s
  • On another Ubuntu server (which is used for running a Lodestar beacon node), SIMD is almost 2x faster

  digest64 vs hash4Input64s vs hash8HashObjects
    ✓ digest64 200092 times                                               4.908615 ops/s    203.7235 ms/op        -         47 runs   10.2 s
    ✓ hash 200092 times using hash4Input64s                               9.644699 ops/s    103.6839 ms/op        -         94 runs   10.3 s
    ✓ hash 200092 times using hash8HashObjects                            9.390349 ops/s    106.4923 ms/op        -         90 runs   10.1 s
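
As a quick check of the "~20%" and "almost 2x" figures, the speedups can be recomputed from the ms/op values in the two runs above. This is only a sketch of the arithmetic; the `runs` object and `speedup` helper are illustrative names, not part of the benchmark suite.

```typescript
// ms/op values copied from the two benchmark runs above.
const runs = {
  "Mac M1": { digest64: 161.1116, hash4Input64s: 134.0406, hash8HashObjects: 127.635 },
  "Ubuntu server": { digest64: 203.7235, hash4Input64s: 103.6839, hash8HashObjects: 106.4923 },
};

// Speedup = baseline time / candidate time (higher means the candidate is faster).
function speedup(baselineMs: number, candidateMs: number): number {
  return baselineMs / candidateMs;
}

for (const [env, r] of Object.entries(runs)) {
  console.log(
    `${env}: hash4Input64s ${speedup(r.digest64, r.hash4Input64s).toFixed(2)}x, ` +
      `hash8HashObjects ${speedup(r.digest64, r.hash8HashObjects).toFixed(2)}x`
  );
}
```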

twoeths, Apr 19 '24 03:04