feat: SIMD implementation for as-sha256
Motivation
SIMD is available in AssemblyScript, which supports the v128 data type. This means we can hash 4 inputs in parallel.
Description
- New AssemblyScript SIMD implementation in `assembly/simd.ts`
- New methods to support hashing 4 inputs (each 64 bytes) in parallel:
  - `hash4Input64s(inputs: Uint8Array[]): Uint8Array[]`
  - `hash8HashObjects(inputs: HashObject[])`
- Add unit tests and benchmarks
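
As a rough illustration of what hashing 4 inputs in parallel means at the data-layout level: a v128 holds four 32-bit lanes, so word `i` of each of the four 64-byte inputs can share one vector, and a single lane-wise operation then advances all four hashes at once. The sketch below models only that layout in plain TypeScript. The names `interleaveWords` and `addLanes` are hypothetical and are not taken from `assembly/simd.ts`, whose actual structure may differ.

```typescript
// Hypothetical helpers: not part of as-sha256. They only model the data layout.
const LANES = 4; // a v128 holds four 32-bit lanes
const BLOCK_BYTES = 64; // each input is one 64-byte block
const WORDS_PER_BLOCK = BLOCK_BYTES / 4;

// Returns 16 "vectors"; vector w holds big-endian word w of each of the 4 inputs.
function interleaveWords(inputs: Uint8Array[]): Uint32Array[] {
  if (inputs.length !== LANES) throw new Error(`expected ${LANES} inputs`);
  const views = inputs.map((input) => {
    if (input.length !== BLOCK_BYTES) {
      throw new Error(`each input must be ${BLOCK_BYTES} bytes`);
    }
    return new DataView(input.buffer, input.byteOffset, BLOCK_BYTES);
  });
  const vectors: Uint32Array[] = [];
  for (let w = 0; w < WORDS_PER_BLOCK; w++) {
    const vec = new Uint32Array(LANES);
    views.forEach((view, lane) => {
      vec[lane] = view.getUint32(w * 4, false); // SHA-256 reads words big-endian
    });
    vectors.push(vec);
  }
  return vectors;
}

// One lane-wise operation stands in for four scalar ones, e.g. a wrapping 32-bit add.
function addLanes(a: Uint32Array, b: Uint32Array): Uint32Array {
  return a.map((x, i) => (x + b[i]) >>> 0);
}
```

With this layout, every SHA-256 round step (adds, rotates, xors) can be expressed once per vector instead of once per input, which is where the potential speedup comes from.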
Closes #356
Performance Report
✔️ no performance regression detected
Full benchmark results
| Benchmark suite | Current: 81334811864067a6af798af4f015dbcb1b99779d | Previous: cf8f04905e8d93fd44e59a319e1431f5b932d0e3 | Ratio |
|---|---|---|---|
| digestTwoHashObjects 50023 times | 47.923 ms/op | 47.926 ms/op | 1.00 |
| digest64 50023 times | 50.469 ms/op | 50.930 ms/op | 0.99 |
| digest 50023 times | 52.153 ms/op | 52.992 ms/op | 0.98 |
| input length 32 | 1.2030 us/op | 1.1920 us/op | 1.01 |
| input length 64 | 1.3590 us/op | 1.3970 us/op | 0.97 |
| input length 128 | 2.2660 us/op | 2.3880 us/op | 0.95 |
| input length 256 | 3.3830 us/op | 3.4430 us/op | 0.98 |
| input length 512 | 5.5630 us/op | 5.6190 us/op | 0.99 |
| input length 1024 | 10.707 us/op | 10.763 us/op | 0.99 |
| digest 1000000 times | 824.19 ms/op | 837.14 ms/op | 0.98 |
| hashObjectToByteArray 50023 times | 1.4283 ms/op | 1.4692 ms/op | 0.97 |
| byteArrayToHashObject 50023 times | 2.4242 ms/op | 2.4603 ms/op | 0.99 |
| digest64 200092 times | 206.57 ms/op | | |
| hash 200092 times using batchHash4UintArray64s | 212.05 ms/op | | |
| hash 200092 times using batchHash4HashObjectInputs | 212.59 ms/op | | |
| getGindicesAtDepth | 4.6080 us/op | 4.6690 us/op | 0.99 |
| iterateAtDepth | 7.2810 us/op | 7.4530 us/op | 0.98 |
| getGindexBits | 428.00 ns/op | 430.00 ns/op | 1.00 |
| gindexIterator | 1.0290 us/op | 972.00 ns/op | 1.06 |
| hash 2 Uint8Array 2250026 times - as-sha256 | 2.3156 s/op | 2.3533 s/op | 0.98 |
| hashTwoObjects 2250026 times - as-sha256 | 2.1663 s/op | 2.2222 s/op | 0.97 |
| hash 2 Uint8Array 2250026 times - noble | 5.0159 s/op | 5.2452 s/op | 0.96 |
| hashTwoObjects 2250026 times - noble | 6.8932 s/op | 6.8410 s/op | 1.01 |
| getNodeH() x7812.5 avg hindex | 12.143 us/op | 12.969 us/op | 0.94 |
| getNodeH() x7812.5 index 0 | 6.3680 us/op | 6.6040 us/op | 0.96 |
| getNodeH() x7812.5 index 7 | 6.4100 us/op | 6.5780 us/op | 0.97 |
| getNodeH() x7812.5 index 7 with key array | 6.3800 us/op | 6.4950 us/op | 0.98 |
| new LeafNode() x7812.5 | 14.760 us/op | 15.032 us/op | 0.98 |
| multiproof - depth 15, 1 requested leaves | 8.6070 us/op | 9.6410 us/op | 0.89 |
| tree offset multiproof - depth 15, 1 requested leaves | 19.633 us/op | 20.563 us/op | 0.95 |
| compact multiproof - depth 15, 1 requested leaves | 3.7230 us/op | 5.4290 us/op | 0.69 |
| multiproof - depth 15, 2 requested leaves | 11.534 us/op | 12.903 us/op | 0.89 |
| tree offset multiproof - depth 15, 2 requested leaves | 21.439 us/op | 23.655 us/op | 0.91 |
| compact multiproof - depth 15, 2 requested leaves | 3.4330 us/op | 4.4640 us/op | 0.77 |
| multiproof - depth 15, 3 requested leaves | 16.153 us/op | 18.176 us/op | 0.89 |
| tree offset multiproof - depth 15, 3 requested leaves | 27.953 us/op | 29.919 us/op | 0.93 |
| compact multiproof - depth 15, 3 requested leaves | 4.1860 us/op | 6.4790 us/op | 0.65 |
| multiproof - depth 15, 4 requested leaves | 21.466 us/op | 23.370 us/op | 0.92 |
| tree offset multiproof - depth 15, 4 requested leaves | 33.883 us/op | 36.995 us/op | 0.92 |
| compact multiproof - depth 15, 4 requested leaves | 5.0580 us/op | 5.3080 us/op | 0.95 |
| packedRootsBytesToLeafNodes bytes 4000 offset 0 | 1.9560 us/op | 1.9930 us/op | 0.98 |
| packedRootsBytesToLeafNodes bytes 4000 offset 1 | 1.9810 us/op | 2.0020 us/op | 0.99 |
| packedRootsBytesToLeafNodes bytes 4000 offset 2 | 1.9630 us/op | 2.0000 us/op | 0.98 |
| packedRootsBytesToLeafNodes bytes 4000 offset 3 | 1.8760 us/op | 1.9940 us/op | 0.94 |
| subtreeFillToContents depth 40 count 250000 | 46.530 ms/op | 45.958 ms/op | 1.01 |
| setRoot - gindexBitstring | 8.1636 ms/op | 8.4206 ms/op | 0.97 |
| setRoot - gindex | 8.5065 ms/op | 8.7619 ms/op | 0.97 |
| getRoot - gindexBitstring | 2.4350 ms/op | 2.4504 ms/op | 0.99 |
| getRoot - gindex | 3.3562 ms/op | 3.3620 ms/op | 1.00 |
| getHashObject then setHashObject | 10.247 ms/op | 10.481 ms/op | 0.98 |
| setNodeWithFn | 7.9182 ms/op | 8.0530 ms/op | 0.98 |
| getNodeAtDepth depth 0 x100000 | 1.0832 ms/op | 1.0852 ms/op | 1.00 |
| setNodeAtDepth depth 0 x100000 | 2.3466 ms/op | 2.4234 ms/op | 0.97 |
| getNodesAtDepth depth 0 x100000 | 1.0524 ms/op | 1.0538 ms/op | 1.00 |
| setNodesAtDepth depth 0 x100000 | 1.4245 ms/op | 1.4528 ms/op | 0.98 |
| getNodeAtDepth depth 1 x100000 | 1.1464 ms/op | 1.1686 ms/op | 0.98 |
| setNodeAtDepth depth 1 x100000 | 5.1183 ms/op | 5.1398 ms/op | 1.00 |
| getNodesAtDepth depth 1 x100000 | 1.1763 ms/op | 1.1909 ms/op | 0.99 |
| setNodesAtDepth depth 1 x100000 | 4.3033 ms/op | 4.3132 ms/op | 1.00 |
| getNodeAtDepth depth 2 x100000 | 1.4276 ms/op | 1.4221 ms/op | 1.00 |
| setNodeAtDepth depth 2 x100000 | 8.7806 ms/op | 10.417 ms/op | 0.84 |
| getNodesAtDepth depth 2 x100000 | 16.869 ms/op | 18.389 ms/op | 0.92 |
| setNodesAtDepth depth 2 x100000 | 12.381 ms/op | 12.926 ms/op | 0.96 |
| tree.getNodesAtDepth - gindexes | 7.7827 ms/op | 8.0320 ms/op | 0.97 |
| tree.getNodesAtDepth - push all nodes | 1.9585 ms/op | 1.9345 ms/op | 1.01 |
| tree.getNodesAtDepth - navigation | 233.92 us/op | 235.57 us/op | 0.99 |
| tree.setNodesAtDepth - indexes | 349.98 us/op | 308.89 us/op | 1.13 |
| set at depth 8 | 443.00 ns/op | 450.00 ns/op | 0.98 |
| set at depth 16 | 588.00 ns/op | 596.00 ns/op | 0.99 |
| set at depth 32 | 951.00 ns/op | 958.00 ns/op | 0.99 |
| iterateNodesAtDepth 8 256 | 13.080 us/op | 13.212 us/op | 0.99 |
| getNodesAtDepth 8 256 | 3.4390 us/op | 3.3790 us/op | 1.02 |
| iterateNodesAtDepth 16 65536 | 4.2388 ms/op | 4.3308 ms/op | 0.98 |
| getNodesAtDepth 16 65536 | 1.5835 ms/op | 1.6273 ms/op | 0.97 |
| iterateNodesAtDepth 32 250000 | 15.410 ms/op | 15.634 ms/op | 0.99 |
| getNodesAtDepth 32 250000 | 4.3000 ms/op | 4.3522 ms/op | 0.99 |
| iterateNodesAtDepth 40 250000 | 15.540 ms/op | 15.708 ms/op | 0.99 |
| getNodesAtDepth 40 250000 | 4.3836 ms/op | 4.4330 ms/op | 0.99 |
| 250k validators | 7.1398 s/op | 7.1114 s/op | 1.00 |
| bitlist bytes to struct (120,90) | 482.00 ns/op | 484.00 ns/op | 1.00 |
| bitlist bytes to tree (120,90) | 2.1360 us/op | 2.1460 us/op | 1.00 |
| bitlist bytes to struct (2048,2048) | 911.00 ns/op | 922.00 ns/op | 0.99 |
| bitlist bytes to tree (2048,2048) | 3.3240 us/op | 3.3630 us/op | 0.99 |
| ByteListType - deserialize | 7.8165 ms/op | 7.3046 ms/op | 1.07 |
| BasicListType | 11.857 ms/op | 11.915 ms/op | 1.00 |
| ByteListType - serialize | 7.8777 ms/op | 7.9004 ms/op | 1.00 |
| BasicListType | 9.6364 ms/op | 10.023 ms/op | 0.96 |
| BasicListType | 22.355 ms/op | 22.655 ms/op | 0.99 |
| List[uint8, 68719476736] len 300000 ViewDU.getAll() + iterate | 4.3003 ms/op | 4.4147 ms/op | 0.97 |
| List[uint8, 68719476736] len 300000 ViewDU.get(i) | 4.1212 ms/op | 2.9512 ms/op | 1.40 |
| Array.push len 300000 empty Array - number | 6.3746 ms/op | 6.2896 ms/op | 1.01 |
| Array.set len 300000 from new Array - number | 1.6630 ms/op | 1.7071 ms/op | 0.97 |
| Array.set len 300000 - number | 5.2218 ms/op | 5.2257 ms/op | 1.00 |
| Uint8Array.set len 300000 | 373.14 us/op | 372.38 us/op | 1.00 |
| Uint32Array.set len 300000 | 443.43 us/op | 445.15 us/op | 1.00 |
| Container({a: uint8, b: uint8}) getViewDU x300000 | 52.403 ms/op | 49.804 ms/op | 1.05 |
| ContainerNodeStruct({a: uint8, b: uint8}) getViewDU x300000 | 10.700 ms/op | 10.834 ms/op | 0.99 |
| List(Container) len 300000 ViewDU.getAllReadonly() + iterate | 208.75 ms/op | 209.73 ms/op | 1.00 |
| List(Container) len 300000 ViewDU.getAllReadonlyValues() + iterate | 316.36 ms/op | 273.31 ms/op | 1.16 |
| List(Container) len 300000 ViewDU.get(i) | 8.7640 ms/op | 6.3717 ms/op | 1.38 |
| List(Container) len 300000 ViewDU.getReadonly(i) | 8.1774 ms/op | 6.3376 ms/op | 1.29 |
| List(ContainerNodeStruct) len 300000 ViewDU.getAllReadonly() + iterate | 40.470 ms/op | 41.496 ms/op | 0.98 |
| List(ContainerNodeStruct) len 300000 ViewDU.getAllReadonlyValues() + iterate | 5.6273 ms/op | 5.1590 ms/op | 1.09 |
| List(ContainerNodeStruct) len 300000 ViewDU.get(i) | 7.2073 ms/op | 5.9948 ms/op | 1.20 |
| List(ContainerNodeStruct) len 300000 ViewDU.getReadonly(i) | 7.1238 ms/op | 5.9572 ms/op | 1.20 |
| Array.push len 300000 empty Array - object | 6.8128 ms/op | 5.9218 ms/op | 1.15 |
| Array.set len 300000 from new Array - object | 2.2630 ms/op | 1.9831 ms/op | 1.14 |
| Array.set len 300000 - object | 6.7586 ms/op | 5.7016 ms/op | 1.19 |
| cachePermanentRootStruct no cache | 9.2840 us/op | 8.5850 us/op | 1.08 |
| cachePermanentRootStruct with cache | 237.00 ns/op | 188.00 ns/op | 1.26 |
| epochParticipation len 250000 rws 7813 | 2.3041 ms/op | 1.8994 ms/op | 1.21 |
| deserialize Attestation - tree | 4.5990 us/op | 4.0490 us/op | 1.14 |
| deserialize Attestation - struct | 2.0270 us/op | 1.7750 us/op | 1.14 |
| deserialize SignedAggregateAndProof - tree | 3.7370 us/op | 3.6180 us/op | 1.03 |
| deserialize SignedAggregateAndProof - struct | 3.1580 us/op | 2.9150 us/op | 1.08 |
| deserialize SyncCommitteeMessage - tree | 1.0770 us/op | 1.0360 us/op | 1.04 |
| deserialize SyncCommitteeMessage - struct | 1.1750 us/op | 980.00 ns/op | 1.20 |
| deserialize SignedContributionAndProof - tree | 2.1180 us/op | 1.9690 us/op | 1.08 |
| deserialize SignedContributionAndProof - struct | 2.5370 us/op | 2.3590 us/op | 1.08 |
| deserialize SignedBeaconBlock - tree | 238.34 us/op | 208.32 us/op | 1.14 |
| deserialize SignedBeaconBlock - struct | 126.23 us/op | 120.84 us/op | 1.04 |
| BeaconState vc 300000 - deserialize tree | 598.10 ms/op | 593.02 ms/op | 1.01 |
| BeaconState vc 300000 - serialize tree | 147.94 ms/op | 148.19 ms/op | 1.00 |
| BeaconState.historicalRoots vc 300000 - deserialize tree | 876.00 ns/op | 821.00 ns/op | 1.07 |
| BeaconState.historicalRoots vc 300000 - serialize tree | 800.00 ns/op | 765.00 ns/op | 1.05 |
| BeaconState.validators vc 300000 - deserialize tree | 550.23 ms/op | 521.80 ms/op | 1.05 |
| BeaconState.validators vc 300000 - serialize tree | 98.321 ms/op | 102.19 ms/op | 0.96 |
| BeaconState.balances vc 300000 - deserialize tree | 20.496 ms/op | 20.686 ms/op | 0.99 |
| BeaconState.balances vc 300000 - serialize tree | 4.0125 ms/op | 3.9926 ms/op | 1.00 |
| BeaconState.previousEpochParticipation vc 300000 - deserialize tree | 548.56 us/op | 684.49 us/op | 0.80 |
| BeaconState.previousEpochParticipation vc 300000 - serialize tree | 291.01 us/op | 288.96 us/op | 1.01 |
| BeaconState.currentEpochParticipation vc 300000 - deserialize tree | 563.17 us/op | 450.13 us/op | 1.25 |
| BeaconState.currentEpochParticipation vc 300000 - serialize tree | 283.88 us/op | 287.17 us/op | 0.99 |
| BeaconState.inactivityScores vc 300000 - deserialize tree | 21.006 ms/op | 20.081 ms/op | 1.05 |
| BeaconState.inactivityScores vc 300000 - serialize tree | 4.1597 ms/op | 3.6692 ms/op | 1.13 |
| hashTreeRoot Attestation - struct | 33.643 us/op | 27.463 us/op | 1.23 |
| hashTreeRoot Attestation - tree | 21.286 us/op | 18.111 us/op | 1.18 |
| hashTreeRoot SignedAggregateAndProof - struct | 57.859 us/op | 37.426 us/op | 1.55 |
| hashTreeRoot SignedAggregateAndProof - tree | 29.846 us/op | 27.126 us/op | 1.10 |
| hashTreeRoot SyncCommitteeMessage - struct | 10.282 us/op | 8.9650 us/op | 1.15 |
| hashTreeRoot SyncCommitteeMessage - tree | 6.6760 us/op | 6.3710 us/op | 1.05 |
| hashTreeRoot SignedContributionAndProof - struct | 26.790 us/op | 24.215 us/op | 1.11 |
| hashTreeRoot SignedContributionAndProof - tree | 20.062 us/op | 19.253 us/op | 1.04 |
| hashTreeRoot SignedBeaconBlock - struct | 2.5356 ms/op | 2.1739 ms/op | 1.17 |
| hashTreeRoot SignedBeaconBlock - tree | 1.7796 ms/op | 1.6946 ms/op | 1.05 |
| hashTreeRoot Validator - struct | 12.951 us/op | 12.096 us/op | 1.07 |
| hashTreeRoot Validator - tree | 11.074 us/op | 10.355 us/op | 1.07 |
| BeaconState vc 300000 - hashTreeRoot tree | 3.6886 s/op | 3.6525 s/op | 1.01 |
| BeaconState.historicalRoots vc 300000 - hashTreeRoot tree | 1.3500 us/op | 1.3400 us/op | 1.01 |
| BeaconState.validators vc 300000 - hashTreeRoot tree | 3.4979 s/op | 3.4974 s/op | 1.00 |
| BeaconState.balances vc 300000 - hashTreeRoot tree | 86.933 ms/op | 86.452 ms/op | 1.01 |
| BeaconState.previousEpochParticipation vc 300000 - hashTreeRoot tree | 9.0174 ms/op | 9.0131 ms/op | 1.00 |
| BeaconState.currentEpochParticipation vc 300000 - hashTreeRoot tree | 9.0452 ms/op | 9.0085 ms/op | 1.00 |
| BeaconState.inactivityScores vc 300000 - hashTreeRoot tree | 88.884 ms/op | 86.569 ms/op | 1.03 |
| hash64 x18 | 19.557 us/op | 19.358 us/op | 1.01 |
| hashTwoObjects x18 | 18.413 us/op | 17.861 us/op | 1.03 |
| hash64 x1740 | 1.8220 ms/op | 1.8124 ms/op | 1.01 |
| hashTwoObjects x1740 | 1.7030 ms/op | 1.7224 ms/op | 0.99 |
| hash64 x2700000 | 2.8527 s/op | 2.8213 s/op | 1.01 |
| hashTwoObjects x2700000 | 2.6502 s/op | 2.6376 s/op | 1.00 |
| get_exitEpoch - ContainerType | 226.00 ns/op | 190.00 ns/op | 1.19 |
| get_exitEpoch - ContainerNodeStructType | 231.00 ns/op | 190.00 ns/op | 1.22 |
| set_exitEpoch - ContainerType | 239.00 ns/op | 254.00 ns/op | 0.94 |
| set_exitEpoch - ContainerNodeStructType | 237.00 ns/op | 204.00 ns/op | 1.16 |
| get_pubkey - ContainerType | 894.00 ns/op | 854.00 ns/op | 1.05 |
| get_pubkey - ContainerNodeStructType | 233.00 ns/op | 201.00 ns/op | 1.16 |
| hashTreeRoot - ContainerType | 371.00 ns/op | 337.00 ns/op | 1.10 |
| hashTreeRoot - ContainerNodeStructType | 446.00 ns/op | 378.00 ns/op | 1.18 |
| createProof - ContainerType | 4.2990 us/op | 3.7110 us/op | 1.16 |
| createProof - ContainerNodeStructType | 21.894 us/op | 19.853 us/op | 1.10 |
| serialize - ContainerType | 1.8750 us/op | 1.7860 us/op | 1.05 |
| serialize - ContainerNodeStructType | 1.5420 us/op | 1.5830 us/op | 0.97 |
| set_exitEpoch_and_hashTreeRoot - ContainerType | 4.2740 us/op | 4.1860 us/op | 1.02 |
| set_exitEpoch_and_hashTreeRoot - ContainerNodeStructType | 11.401 us/op | 11.102 us/op | 1.03 |
| Array - for of | 5.5600 us/op | 5.6380 us/op | 0.99 |
| Array - for(;;) | 5.5480 us/op | 5.4620 us/op | 1.02 |
| basicListValue.readonlyValuesArray() | 4.3692 ms/op | 4.2076 ms/op | 1.04 |
| basicListValue.readonlyValuesArray() + loop all | 5.2851 ms/op | 4.1542 ms/op | 1.27 |
| compositeListValue.readonlyValuesArray() | 29.942 ms/op | 27.561 ms/op | 1.09 |
| compositeListValue.readonlyValuesArray() + loop all | 29.698 ms/op | 29.214 ms/op | 1.02 |
| Number64UintType - get balances list | 4.2828 ms/op | 4.3291 ms/op | 0.99 |
| Number64UintType - set balances list | 9.5034 ms/op | 10.021 ms/op | 0.95 |
| Number64UintType - get and increase 10 then set | 39.115 ms/op | 40.389 ms/op | 0.97 |
| Number64UintType - increase 10 using applyDelta | 15.591 ms/op | 17.193 ms/op | 0.91 |
| Number64UintType - increase 10 using applyDeltaInBatch | 15.269 ms/op | 17.224 ms/op | 0.89 |
| tree_newTreeFromUint64Deltas | 16.533 ms/op | 13.377 ms/op | 1.24 |
| unsafeUint8ArrayToTree | 29.468 ms/op | 26.745 ms/op | 1.10 |
| bitLength(50) | 216.00 ns/op | 203.00 ns/op | 1.06 |
| bitLengthStr(50) | 209.00 ns/op | 193.00 ns/op | 1.08 |
| bitLength(8000) | 201.00 ns/op | 197.00 ns/op | 1.02 |
| bitLengthStr(8000) | 255.00 ns/op | 245.00 ns/op | 1.04 |
| bitLength(250000) | 223.00 ns/op | 208.00 ns/op | 1.07 |
| bitLengthStr(250000) | 314.00 ns/op | 297.00 ns/op | 1.06 |
| floor - Math.floor (53) | 1.2371 ns/op | 1.2564 ns/op | 0.98 |
| floor - << 0 (53) | 1.2366 ns/op | 1.2374 ns/op | 1.00 |
| floor - Math.floor (512) | 1.2370 ns/op | 1.2365 ns/op | 1.00 |
| floor - << 0 (512) | 1.2553 ns/op | 1.2364 ns/op | 1.02 |
| fnIf(0) | 1.5527 ns/op | 1.5548 ns/op | 1.00 |
| fnSwitch(0) | 2.1715 ns/op | 2.1661 ns/op | 1.00 |
| fnObj(0) | 1.5467 ns/op | 1.5695 ns/op | 0.99 |
| fnArr(0) | 1.5472 ns/op | 1.5471 ns/op | 1.00 |
| fnIf(4) | 2.1654 ns/op | 2.1932 ns/op | 0.99 |
| fnSwitch(4) | 2.1660 ns/op | 2.1642 ns/op | 1.00 |
| fnObj(4) | 1.5546 ns/op | 1.5485 ns/op | 1.00 |
| fnArr(4) | 1.5475 ns/op | 1.5481 ns/op | 1.00 |
| fnIf(9) | 3.1564 ns/op | 3.0949 ns/op | 1.02 |
| fnSwitch(9) | 2.1665 ns/op | 2.1954 ns/op | 0.99 |
| fnObj(9) | 1.5461 ns/op | 1.5493 ns/op | 1.00 |
| fnArr(9) | 1.5531 ns/op | 1.5497 ns/op | 1.00 |
| Container {a,b,vec} - as struct x100000 | 124.07 us/op | 123.91 us/op | 1.00 |
| Container {a,b,vec} - as tree x100000 | 340.37 us/op | 340.30 us/op | 1.00 |
| Container {a,vec,b} - as struct x100000 | 157.79 us/op | 154.77 us/op | 1.02 |
| Container {a,vec,b} - as tree x100000 | 371.42 us/op | 372.12 us/op | 1.00 |
| get 2 props x1000000 - rawObject | 309.44 us/op | 310.81 us/op | 1.00 |
| get 2 props x1000000 - proxy | 73.948 ms/op | 72.741 ms/op | 1.02 |
| get 2 props x1000000 - customObj | 309.77 us/op | 309.33 us/op | 1.00 |
| Simple object binary -> struct | 861.00 ns/op | 795.00 ns/op | 1.08 |
| Simple object binary -> tree_backed | 1.6640 us/op | 1.5580 us/op | 1.07 |
| Simple object struct -> tree_backed | 2.3310 us/op | 2.1900 us/op | 1.06 |
| Simple object tree_backed -> struct | 2.2450 us/op | 2.1540 us/op | 1.04 |
| Simple object struct -> binary | 1.0160 us/op | 1.0830 us/op | 0.94 |
| Simple object tree_backed -> binary | 1.5700 us/op | 1.5820 us/op | 0.99 |
| aggregationBits binary -> struct | 627.00 ns/op | 589.00 ns/op | 1.06 |
| aggregationBits binary -> tree_backed | 2.4090 us/op | 2.3670 us/op | 1.02 |
| aggregationBits struct -> tree_backed | 2.8380 us/op | 2.8010 us/op | 1.01 |
| aggregationBits tree_backed -> struct | 1.2140 us/op | 1.1880 us/op | 1.02 |
| aggregationBits struct -> binary | 797.00 ns/op | 774.00 ns/op | 1.03 |
| aggregationBits tree_backed -> binary | 1.0750 us/op | 1.0300 us/op | 1.04 |
| List(uint8) 100000 binary -> struct | 1.3397 ms/op | 1.4490 ms/op | 0.92 |
| List(uint8) 100000 binary -> tree_backed | 93.770 us/op | 88.515 us/op | 1.06 |
| List(uint8) 100000 struct -> tree_backed | 1.1678 ms/op | 1.1905 ms/op | 0.98 |
| List(uint8) 100000 tree_backed -> struct | 1.0327 ms/op | 1.0591 ms/op | 0.98 |
| List(uint8) 100000 struct -> binary | 988.12 us/op | 1.0094 ms/op | 0.98 |
| List(uint8) 100000 tree_backed -> binary | 88.551 us/op | 87.930 us/op | 1.01 |
| List(uint64Number) 100000 binary -> struct | 1.2350 ms/op | 1.2081 ms/op | 1.02 |
| List(uint64Number) 100000 binary -> tree_backed | 2.8315 ms/op | 3.2269 ms/op | 0.88 |
| List(uint64Number) 100000 struct -> tree_backed | 3.9792 ms/op | 4.8569 ms/op | 0.82 |
| List(uint64Number) 100000 tree_backed -> struct | 2.0545 ms/op | 2.3570 ms/op | 0.87 |
| List(uint64Number) 100000 struct -> binary | 1.3642 ms/op | 1.5680 ms/op | 0.87 |
| List(uint64Number) 100000 tree_backed -> binary | 810.64 us/op | 905.40 us/op | 0.90 |
| List(Uint64Bigint) 100000 binary -> struct | 3.5439 ms/op | 3.6912 ms/op | 0.96 |
| List(Uint64Bigint) 100000 binary -> tree_backed | 3.2928 ms/op | 3.3661 ms/op | 0.98 |
| List(Uint64Bigint) 100000 struct -> tree_backed | 5.2914 ms/op | 5.5335 ms/op | 0.96 |
| List(Uint64Bigint) 100000 tree_backed -> struct | 4.5456 ms/op | 4.6956 ms/op | 0.97 |
| List(Uint64Bigint) 100000 struct -> binary | 2.0308 ms/op | 2.0423 ms/op | 0.99 |
| List(Uint64Bigint) 100000 tree_backed -> binary | 982.22 us/op | 1.1645 ms/op | 0.84 |
| Vector(Root) 100000 binary -> struct | 28.981 ms/op | 31.484 ms/op | 0.92 |
| Vector(Root) 100000 binary -> tree_backed | 32.772 ms/op | 33.719 ms/op | 0.97 |
| Vector(Root) 100000 struct -> tree_backed | 37.789 ms/op | 37.528 ms/op | 1.01 |
| Vector(Root) 100000 tree_backed -> struct | 44.906 ms/op | 45.449 ms/op | 0.99 |
| Vector(Root) 100000 struct -> binary | 2.6262 ms/op | 2.5929 ms/op | 1.01 |
| Vector(Root) 100000 tree_backed -> binary | 9.5413 ms/op | 10.302 ms/op | 0.93 |
| List(Validator) 100000 binary -> struct | 105.60 ms/op | 108.18 ms/op | 0.98 |
| List(Validator) 100000 binary -> tree_backed | 288.03 ms/op | 290.31 ms/op | 0.99 |
| List(Validator) 100000 struct -> tree_backed | 295.83 ms/op | 302.03 ms/op | 0.98 |
| List(Validator) 100000 tree_backed -> struct | 190.95 ms/op | 192.89 ms/op | 0.99 |
| List(Validator) 100000 struct -> binary | 26.600 ms/op | 27.086 ms/op | 0.98 |
| List(Validator) 100000 tree_backed -> binary | 101.26 ms/op | 101.01 ms/op | 1.00 |
| List(Validator-NS) 100000 binary -> struct | 98.635 ms/op | 105.24 ms/op | 0.94 |
| List(Validator-NS) 100000 binary -> tree_backed | 146.63 ms/op | 144.50 ms/op | 1.01 |
| List(Validator-NS) 100000 struct -> tree_backed | 173.36 ms/op | 173.97 ms/op | 1.00 |
| List(Validator-NS) 100000 tree_backed -> struct | 144.68 ms/op | 146.22 ms/op | 0.99 |
| List(Validator-NS) 100000 struct -> binary | 26.798 ms/op | 27.026 ms/op | 0.99 |
| List(Validator-NS) 100000 tree_backed -> binary | 33.001 ms/op | 32.982 ms/op | 1.00 |
| get epochStatuses - MutableVector | 90.933 us/op | 104.84 us/op | 0.87 |
| get epochStatuses - ViewDU | 208.96 us/op | 208.53 us/op | 1.00 |
| set epochStatuses - ListTreeView | 1.4093 ms/op | 1.6046 ms/op | 0.88 |
| set epochStatuses - ListTreeView - set() | 440.21 us/op | 457.65 us/op | 0.96 |
| set epochStatuses - ListTreeView - commit() | 446.39 us/op | 438.80 us/op | 1.02 |
| bitstring | 641.44 ns/op | 645.17 ns/op | 0.99 |
| bit mask | 13.464 ns/op | 14.232 ns/op | 0.95 |
| struct - increase slot to 1000000 | 928.47 us/op | 927.45 us/op | 1.00 |
| UintNumberType - increase slot to 1000000 | 21.668 ms/op | 23.901 ms/op | 0.91 |
| UintBigintType - increase slot to 1000000 | 166.59 ms/op | 200.68 ms/op | 0.83 |
| UintBigint8 x 100000 tree_deserialize | 4.5355 ms/op | 5.2920 ms/op | 0.86 |
| UintBigint8 x 100000 tree_serialize | 1.0914 ms/op | 1.0923 ms/op | 1.00 |
| UintBigint16 x 100000 tree_deserialize | 4.5547 ms/op | 6.1811 ms/op | 0.74 |
| UintBigint16 x 100000 tree_serialize | 1.1746 ms/op | 1.5894 ms/op | 0.74 |
| UintBigint32 x 100000 tree_deserialize | 4.7314 ms/op | 5.8123 ms/op | 0.81 |
| UintBigint32 x 100000 tree_serialize | 1.1852 ms/op | 1.4116 ms/op | 0.84 |
| UintBigint64 x 100000 tree_deserialize | 4.9360 ms/op | 6.5494 ms/op | 0.75 |
| UintBigint64 x 100000 tree_serialize | 1.5536 ms/op | 1.9879 ms/op | 0.78 |
| UintBigint8 x 100000 value_deserialize | 432.91 us/op | 432.99 us/op | 1.00 |
| UintBigint8 x 100000 value_serialize | 623.87 us/op | 708.83 us/op | 0.88 |
| UintBigint16 x 100000 value_deserialize | 466.47 us/op | 464.54 us/op | 1.00 |
| UintBigint16 x 100000 value_serialize | 709.62 us/op | 788.61 us/op | 0.90 |
| UintBigint32 x 100000 value_deserialize | 433.18 us/op | 433.86 us/op | 1.00 |
| UintBigint32 x 100000 value_serialize | 660.54 us/op | 786.64 us/op | 0.84 |
| UintBigint64 x 100000 value_deserialize | 495.88 us/op | 510.50 us/op | 0.97 |
| UintBigint64 x 100000 value_serialize | 850.03 us/op | 1.0409 ms/op | 0.82 |
| UintBigint8 x 100000 deserialize | 2.8597 ms/op | 3.6057 ms/op | 0.79 |
| UintBigint8 x 100000 serialize | 1.4574 ms/op | 1.6029 ms/op | 0.91 |
| UintBigint16 x 100000 deserialize | 2.8137 ms/op | 3.1933 ms/op | 0.88 |
| UintBigint16 x 100000 serialize | 1.4876 ms/op | 1.5637 ms/op | 0.95 |
| UintBigint32 x 100000 deserialize | 2.7950 ms/op | 3.2083 ms/op | 0.87 |
| UintBigint32 x 100000 serialize | 2.7531 ms/op | 2.9506 ms/op | 0.93 |
| UintBigint64 x 100000 deserialize | 3.7903 ms/op | 3.8717 ms/op | 0.98 |
| UintBigint64 x 100000 serialize | 1.5308 ms/op | 1.5096 ms/op | 1.01 |
| UintBigint128 x 100000 deserialize | 5.4717 ms/op | 5.0612 ms/op | 1.08 |
| UintBigint128 x 100000 serialize | 14.511 ms/op | 14.205 ms/op | 1.02 |
| UintBigint256 x 100000 deserialize | 7.7624 ms/op | 8.0662 ms/op | 0.96 |
| UintBigint256 x 100000 serialize | 42.970 ms/op | 42.049 ms/op | 1.02 |
| Slice from Uint8Array x25000 | 1.1213 ms/op | 1.1554 ms/op | 0.97 |
| Slice from ArrayBuffer x25000 | 16.798 ms/op | 16.639 ms/op | 1.01 |
| Slice from ArrayBuffer x25000 + new Uint8Array | 18.801 ms/op | 18.124 ms/op | 1.04 |
| Copy Uint8Array 100000 iterate | 1.6477 ms/op | 1.6601 ms/op | 0.99 |
| Copy Uint8Array 100000 slice | 104.80 us/op | 130.82 us/op | 0.80 |
| Copy Uint8Array 100000 Uint8Array.prototype.slice.call | 110.86 us/op | 137.70 us/op | 0.81 |
| Copy Buffer 100000 Uint8Array.prototype.slice.call | 110.70 us/op | 130.41 us/op | 0.85 |
| Copy Uint8Array 100000 slice + set | 176.37 us/op | 238.49 us/op | 0.74 |
| Copy Uint8Array 100000 subarray + set | 112.81 us/op | 127.50 us/op | 0.88 |
| Copy Uint8Array 100000 slice arrayBuffer | 116.61 us/op | 130.35 us/op | 0.89 |
| Uint64 deserialize 100000 - iterate Uint8Array | 1.7804 ms/op | 1.8916 ms/op | 0.94 |
| Uint64 deserialize 100000 - by Uint32A | 1.8257 ms/op | 1.9184 ms/op | 0.95 |
| Uint64 deserialize 100000 - by DataView.getUint32 x2 | 1.8503 ms/op | 1.9187 ms/op | 0.96 |
| Uint64 deserialize 100000 - by DataView.getBigUint64 | 5.0285 ms/op | 5.0542 ms/op | 0.99 |
| Uint64 deserialize 100000 - by byte | 40.106 ms/op | 40.585 ms/op | 0.99 |
by benchmarkbot/action
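
The Ratio column in the table is consistent with the current time divided by the previous time once both are converted to the same unit, so values below 1.00 mean the current commit is faster. A minimal sketch of that arithmetic follows; `parseTime` and `ratio` are hypothetical helpers written for illustration and are not the benchmark bot's code.

```typescript
// Hypothetical helpers: they reproduce the Ratio column from two table cells.
const UNIT_TO_NS: Record<string, number> = { ns: 1, us: 1e3, ms: 1e6, s: 1e9 };

// Parses a cell such as "1.0290 us/op" into nanoseconds per operation.
function parseTime(cell: string): number {
  const match = /^([\d.]+)\s*(ns|us|ms|s)\/op$/.exec(cell.trim());
  if (!match) throw new Error(`unrecognized cell: ${cell}`);
  return Number(match[1]) * UNIT_TO_NS[match[2]];
}

// Ratio > 1 means the current commit is slower than the previous one.
function ratio(current: string, previous: string): number {
  return parseTime(current) / parseTime(previous);
}
```

For example, `ratio("1.0290 us/op", "972.00 ns/op").toFixed(2)` gives `"1.06"`, matching the gindexIterator row, where the two cells use different units.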
The performance of the SIMD implementation depends heavily on the CPU. Below is SIMD vs digest64:
- In CI (Ubuntu), SIMD is just a little bit faster
- In my environment (Mac M1), SIMD is ~20% faster:
digest64 vs hash4Input64s vs hash8HashObjects
✓ digest64 200092 times 6.206878 ops/s 161.1116 ms/op - 60 runs 10.3 s
✓ hash 200092 times using hash4Input64s 7.460423 ops/s 134.0406 ms/op - 72 runs 10.2 s
✓ hash 200092 times using hash8HashObjects 7.834839 ops/s 127.6350 ms/op - 76 runs 10.2 s
- On another Ubuntu server (used to run a Lodestar beacon node), SIMD is almost 2x faster:
digest64 vs hash4Input64s vs hash8HashObjects
✓ digest64 200092 times 4.908615 ops/s 203.7235 ms/op - 47 runs 10.2 s
✓ hash 200092 times using hash4Input64s 9.644699 ops/s 103.6839 ms/op - 94 runs 10.3 s
✓ hash 200092 times using hash8HashObjects 9.390349 ops/s 106.4923 ms/op - 90 runs 10.1 s
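
The "~20% faster" and "almost 2x faster" figures can be checked from the ms/op columns of the two runs above. A small sketch, assuming speedup is defined as the digest64 time divided by the SIMD time; the `runs` object and `speedup` function are illustrative names only.

```typescript
// ms/op figures copied from the two benchmark runs above.
const runs = {
  "Mac M1": { digest64: 161.1116, hash4Input64s: 134.0406, hash8HashObjects: 127.635 },
  "Ubuntu server": { digest64: 203.7235, hash4Input64s: 103.6839, hash8HashObjects: 106.4923 },
};

// Speedup relative to digest64: baseline time divided by SIMD time.
function speedup(baselineMs: number, simdMs: number): number {
  return baselineMs / simdMs;
}

for (const [machine, r] of Object.entries(runs)) {
  console.log(
    machine,
    `hash4Input64s x${speedup(r.digest64, r.hash4Input64s).toFixed(2)}`,
    `hash8HashObjects x${speedup(r.digest64, r.hash8HashObjects).toFixed(2)}`
  );
}
```

On these numbers `hash4Input64s` comes out at roughly 1.20x on the Mac M1 and 1.96x on the Ubuntu server.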