domainslib
domainslib copied to clipboard
[WIP] Add IntArray
This PR adds an IntArray
module, which supports parallel initialisation for Integer Arrays mentioned in #6. It uses @stedolan's prototype which uses strings as the base for IntArray.
Benchmarks
I have rewritten the parallel benchmarks quicksort_multicore.ml
and mergesort_multicore.ml
from sandmark to use Domainslib.IntArray
proposed in this PR along with adding parallel initialisation to them. 4.10.0+multicore
compiler variant was used to build and run the benchmarks.
Quicksort
n = 10_000_000
#Cores | IntArray | Array |
---|---|---|
1 | 6.326 | 8.352 |
2 | 5.076 | 7.136 |
4 | 3.345 | 5.576 |
8 | 2.984 | 3.879 |
12 | 5.517 | 3.044 |
16 | 2.716 | 3.046 |
20 | 4.690 | 2.419 |
24 | 9.327 | 2.327 |
Observations: The single core version of IntArray
is faster compared to Array
version and there is speedup till 8 cores, after which the IntArray
version slows down.
When the number of cores is greater than or equal to 8, there's unusually high GC activity which I suspect explains the slowdown. More specifically, the overheads seem to be at caml_stw_empty_minor_heap
and stw_handler
. @stedolan mentioned that strings are not scanned by the GC, I'm not sure if that's someway related to the increased GC activity.
This is a part of the eventlog for execution on 24 cores. Most other processes look similar to the ones seen here.
Mergesort
n = 1_000_000
#Cores | IntArray | Array |
---|---|---|
1 | 1.83 | 0.491 |
2 | 0.972 | 0.296 |
4 | 0.530 | 0.201 |
8 | 0.298 | 0.141 |
12 | 0.230 | 0.122 |
16 | 0.191 | 0.135 |
20 | 0.177 | 0.120 |
24 | 0.160 | 0.120 |
Observations: Contrary to the quicksort benchmark, there is no surge in the GC activity in mergesort, which makes me all the more uncertain about the cause of it in the quicksort benchmark. The speedup of the IntArray
version independently is quite close to what would be expected expected. But there is a huge slowdown on 1 core compared to the Array
version. The overheads lie at set
and get
of IntArray (others are present in the Array
version as well). This is a part of perf report
on 1 core.
32.25% mergesort_multi mergesort_multicore.exe [.] camlDomainslib__Intarray__get_232
26.00% mergesort_multi mergesort_multicore.exe [.] camlMergesort_multicore__sort_364
13.52% mergesort_multi mergesort_multicore.exe [.] caml_apply2
11.10% mergesort_multi mergesort_multicore.exe [.] camlDomainslib__Intarray__set_279
9.41% mergesort_multi mergesort_multicore.exe [.] camlMergesort_multicore__loop_375
4.57% mergesort_multi mergesort_multicore.exe [.] caml_apply3
1.25% mergesort_multi mergesort_multicore.exe [.] camlStdlib__random__intaux_278
Would appreciate any insights on these benchmarks.
To-Do
The module needs addition of more Array functions such as fill
, make
, map
, iter
etc. and possibly some performance tuning. I shall keep updating them to this PR.