Test https://github.com/dfinity/motoko/pull/3955

May 03 '23 00:05 chenyan-dfinity

Note Diffing the performance result against the published result from main branch. Unchanged benchmarks are omitted.

Map

	binary_size	generate 50k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	196_227	2_387_017_574	9_102_052	1_293_015	689_295_883	1_224_704
triemap	201_786	2_286_509_386	9_715_900	891_925	2_111_535	1_187_671
rbtree	198_958	2_024_735_614	8_902_160	787_768	1_839_147	991_630
splay	197_868	2_305_505_782	8_702_096	1_258_328	2_175_053	1_259_845
btree	235_291 ($\textcolor{green}{-0.02\%}$)	2_116_952_097 ($\textcolor{green}{-0.25\%}$)	7_556_172	936_630	1_948_829 ($\textcolor{green}{-0.31\%}$)	991_671
zhenya_hashmap	190_491	1_855_281_618	9_301_800	745_902	1_651_310	752_198
btreemap_rs	514_775	115_994_744	1_638_400	59_433	137_855	61_795
hashmap_rs	502_772	53_333_947	1_835_008	21_070	63_601	22_484

Priority queue

Note Same as main branch, skipping.

MoVM

Note Same as main branch, skipping.

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	291_432 ($\textcolor{green}{-0.02\%}$)	44_505 ($\textcolor{red}{0.13\%}$)	20_039 ($\textcolor{red}{0.58\%}$)	14_077 ($\textcolor{green}{-0.01\%}$)	16_739 ($\textcolor{red}{0.35\%}$)
Rust	940_461	541_441	102_463	125_485	137_030

DIP721 NFT

Note Same as main branch, skipping.

Heartbeat

	binary_size	heartbeat
Motoko	156_821	5_324 ($\textcolor{green}{-40.70\%}$)
Rust	35_608	1_127 ($\textcolor{red}{91.99\%}$)

Timer

Note Same as main branch, skipping.

May 03 '23 00:05 github-actions[bot]

Note The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. The library names with _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with fixed seed to ensure that all collections contain the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:

generate 50k. Insert 50k Nat32 integers into the collection. For Motoko collections, it usually triggers the GC; the rest of the column are not likely to trigger GC.
max mem. For Motoko, it reports rts_max_live_size after generate call; For Rust, it reports the Wasm's memory page * 32Kb.
batch_get 50. Find 50 elements from the collection.
batch_put 50. Insert 50 elements to the collection.
batch_remove 50. Remove 50 elements from the collection.

💎 Takeaways

The platform only charges for instruction count. Data structures which make use of caching and locality have no impact on the cost.
We have a limit on the maximal cycles per round. This means asymptotic behavior doesn't matter much. We care more about the performance up to a fixed N. In the extreme cases, you may see an O(10000 nlogn) algorithm hitting the limit, while an O(n^2) algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.

Note

The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.

Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance on larger memory footprint.

hashmap uses amortized data structure. When the initial capacity is reached, it has to copy the whole array, thus the cost of batch_put 50 is much higher than other data structures.

hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible result.

btree comes from Byron Becker's stable BTreeMap library.

zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.

The MoVM table measures the performance of an experimental implementation of Motoko interpreter. External developers can ignore this table for now.

Map

	binary_size	generate 50k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	196_227	2_387_017_574	9_102_052	1_293_015	689_295_883	1_224_704
triemap	201_786	2_286_509_386	9_715_900	891_925	2_111_535	1_187_671
rbtree	198_958	2_024_735_614	8_902_160	787_768	1_839_147	991_630
splay	197_868	2_305_505_782	8_702_096	1_258_328	2_175_053	1_259_845
btree	235_291	2_116_952_097	7_556_172	936_630	1_948_829	991_671
zhenya_hashmap	190_491	1_855_281_618	9_301_800	745_902	1_651_310	752_198
btreemap_rs	514_775	115_994_744	1_638_400	59_433	137_855	61_795
hashmap_rs	502_772	53_333_947	1_835_008	21_070	63_601	22_484

Priority queue

	binary_size	heapify 50k	mem	pop_min 50	put 50
heap	181_726	793_253_862	1_400_024	385_321	822_756
heap_rs	473_458	5_041_433	819_200	53_243	22_092

MoVM

	binary_size	generate 10k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	196_227	477_464_161	1_820_844	1_291_042	138_897_096	1_222_118
hashmap_rs	502_772	10_984_247	950_272	20_385	62_907	21_374
imrc_hashmap_rs	513_980	19_919_391	1_572_864	31_519	120_207	37_618
movm_rs	2_092_441	1_017_324_475	2_654_208	2_494_635	6_477_172	5_106_080
movm_dynamic_rs	2_295_206	496_274_407	2_129_920	1_951_981	2_709_572	1_950_006

Sample Dapps

Measure the performance of some typical dapps:

Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
DIP721 NFT

Note

The cost difference is mainly due to the Candid serialization cost.

Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.

We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.

For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	291_432	44_505	20_039	14_077	16_739
Rust	940_461	541_441	102_463	125_485	137_030

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	244_622	13_379	24_678	5_358
Rust	1_005_637	144_162	375_896	94_757

Heartbeat / Timer

Measure the cost of empty heartbeat and timer job.

setTimer measures both the setTimer(0) method and the execution of empty job.
It is not easy to reliably capture the above events in one flamegraph, as the implementation detail of the replica can affect how we measure this. Typically, a correct flamegraph contains both setTimer and canister_global_timer function. If it's not there, we may need to adjust the script.

Heartbeat

	binary_size	heartbeat
Motoko	156_821	5_324
Rust	35_608	1_127

Timer

	binary_size	setTimer	cancelTimer
Motoko	164_347	19_476	1_907
Rust	524_361	55_152	10_417

May 03 '23 00:05 github-actions[bot]

@chenyan-dfinity

against the published result from main branch

it is unclear if this refers to this repo's main branch or dfinity/motoko's master branch. If it is the former, I would have expected the second run (baseline 0.8.8) to show less improvement.

May 03 '23 11:05 ggreif

I just rerun the CI, it looks good now. It's against this repo's main branch. The main branch needs some time to update the report after the PR is merged. You probably trigger the CI too soon.

May 03 '23 17:05 chenyan-dfinity

[DO NOT MERGE] perf from moc artifact

Map

Priority queue

MoVM

Basic DAO

DIP721 NFT

Heartbeat

Timer

Collection libraries

💎 Takeaways

Map

Priority queue

MoVM

Sample Dapps

Basic DAO

DIP721 NFT

Heartbeat / Timer

Heartbeat

Timer