ImTools
ImTools copied to clipboard
Try new .Net Core 2.1 PopCount() intrinsic for HAMT implementation
Currently it uses HammingWeight algorithm for calculating number of set bits in a int value: https://bitbucket.org/dadhi/dryioc/src/e788ffce78b727b35fda5355a2297d40f0e7731b/Net45/Playground/HashArrayMappedTrieTests.cs#lines-190
Here is the thing Aye, that's there System.Runtime.Intrinsics.X86.PopCount(uint/ulong)
First step will to BDN benchmark against Swar or HammingWeight with PopCount.
Have you done that? May I do it?
No, not yet. Yes, you can try!
@dzmitry-lahoda I have added two basic HAMT implementations into the Playground project with some tests.
There are two (or more) implementations: IntHashTrie<V>
and HashTrie<K, V>
(for arbitrary keys).
The method(s) to consider for replace with PopCount
are named uint GetSetBitsCount(uint n)
: https://github.com/dadhi/ImTools/blob/75b3b3b61db93653829ab9430ff9920745ea4b48/src/Playground/IntHashArrayMappedTrieTests.cs#L211
The hand-written benchmarks (without BenchmarkDotNet) are in https://github.com/dadhi/ImTools/blob/master/src/Playground/TreeBenchmarks.cs
Method | ItemCount | Mean | Error | StdDev |
--------------------- |---------- |------------:|----------:|-------------:|
SWAR_ImTools | 1000 | 2,362.5 ns | 47.07 ns | 73.284 ns |
Popcnt | 1000 | 504.7 ns | 10.05 ns | 8.908 ns |
SWAR_ImTools_Inlined | 1000 | 1,658.4 ns | 27.01 ns | 25.261 ns |
SWAR_ForkOfCoreFx | 1000 | 2,378.0 ns | 45.01 ns | 42.106 ns |
SWAR_ImTools | 10000 | 23,281.8 ns | 362.54 ns | 339.117 ns |
Popcnt | 10000 | 4,912.9 ns | 73.60 ns | 68.845 ns |
SWAR_ImTools_Inlined | 10000 | 16,221.2 ns | 265.45 ns | 248.305 ns |
SWAR_ForkOfCoreFx | 10000 | 24,810.7 ns | 492.58 ns | 1,049.733 ns |
I did in my fork as it has .NET Core only build (i have hard time work with legacy projects:()
https://github.com/dzmitry-lahoda/ImTools/blob/unnoficial/src/Playground/BitCountsBenchmarks.cs
For synthetic benchmark.
- Seem better to mark for inlining to be 100% sure you are (I guess it will be better for exact that method, so will test https://gist.github.com/mrange/d6e7415113ebfa52ccb660f4ce534dd4#gistcomment-2026775)
- You SWAR is slightly better to some than other SWAR:) on my Intel 7gen
- Popcnt is 3 to 4 times faster.
I guess need to make TreeBenchmarks to be BDN. And add copy of Trie code. Will send results.
I guess I may merge these classes into separate branch and send pull. But better if whole solution will be .NET Core for contrib.
Adding 10 items (ms):
Trie PopCnt- 0
Trie - 0
Getting one out of 10 items 1,000,000 times (ms):
Trie PopCnt - 21
Trie - 20
====================
Adding 100 items (ms):
Trie PopCnt- 2
Trie - 0
Getting one out of 100 items 1,000,000 times (ms):
Trie PopCnt - 39
Trie - 38
====================
Adding 1000 items (ms):
Trie PopCnt- 0
Trie - 0
Getting one out of 1000 items 1,000,000 times (ms):
Trie PopCnt - 40
Trie - 38
====================
Adding 10000 items (ms):
Trie PopCnt- 4
Trie - 3
Getting one out of 10000 items 1,000,000 times (ms):
Trie PopCnt - 49
Trie - 22
====================
Adding 100000 items (ms):
Trie PopCnt- 66
Trie - 80
Getting one out of 100000 items 1,000,000 times (ms):
Trie PopCnt - 64
Trie - 23
====================
Adding 1000000 items (ms):
Trie PopCnt- 1489
Trie - 1533
Getting one out of 1000000 items 1,000,000 times (ms):
Trie PopCnt - 18
Trie - 24
Strange result.
- May be transform to BDN to avoid any influences (but do not seems that issues relevant). I have intel 7700hq and process is 64 bits.
- Tried inline GetSetBitsCount. No changes.
So seem no need to popcnt for now for specific case.
I see Trie is faster than Tree always. And only slower on 1M items to add. Interesting replacement of IoC. But may be be better try out BDN to measure memory and reduce any fluctuations. E.g. right now GC.Collect() seems(not sure) starts GC, but not waits to finish.
popcnt is with netcoreapp3.0 preview. not with 2.1 or 2.2:) so I suggest to close story as it seems irrelevant for both performance and usage in near future.
Thanks for the test.
- BDN is needed - I fully agree
- For some reason, I was thinking that
PopCount
is available in the .Net Core 2.1. Hmmm...
I will keep it open until I played with it myself, will decide how to proceed later.
I was trying to get the code of intrinsic for some reason. Found that for .NET Core 2.1 there should be kind of System.Runtime.Intrinsics.Experimental.dll
. But I could not find such.