multicoretests icon indicating copy to clipboard operation
multicoretests copied to clipboard

Reuse domains rather than spawning them for each parallel test

Open OlivierNicole opened this issue 3 months ago • 9 comments

Given that domains are costly to spawn and join, and parallel tests in multicoretests do it many thousands of times, I wanted to look at it more closely. A quick perf profile on src/array/lin_tests.ml shows that more than 50 % of the time is spent in Domain.spawn. I wanted to try spawning domains only at the start of the test sequence and reusing them. Then I realised that this is exactly what Domainslib provides with domain pools!

This proof of concept PR simply replaces Domain.spawn with Domainslib.Task.async and Domain.join with Domainslib.Task.await. The approach implies to setup a domain pool and pass it to the tests. I have edited src/array/{lin,stm}_tests.ml to reflect this.

The results are rather encouraging: src/array/lin_tests.ml is cut down from about 4.5 s to 0.3 s! (With extremely far off statistical outliers due to bad luck in interleaving search.) However, this is a favourable case as the test’s commands are very cheap to run. The same comparison on my extensive Dynarray STM parallel test gives 24 s vs 12 s for the version with domain reuse, so merely a 2x speedup.

If depending on Domainslib is an issue, the mechanism can be reimplemented using atomics, mutexes and condition variables easily enough. (A reimplementation would likely squeeze some more performance out by not requiring a work-stealing queue.)

It looks like about 20 % of the time is now taken by Sys.cpu_relax, so further improvement may be possible still.

OlivierNicole avatar Apr 26 '24 09:04 OlivierNicole