multicoretests
Measure statistical significance of Domain setup
There's one remaining usage of cpu_relax in spinning the first domain while waiting for the second domain to start-up:
https://github.com/jmid/multicoretests/blob/8a9a2327e06036f06ca5ef4b1321129ccff557d6/lib/lin.ml#L122-L124
Now that we have statistics in place, it would be natural to give this Domain setup a run-down to see which aspects actually influence the bug-finding ability, similar to what I did for Thread recently: https://github.com/jmid/multicoretests/blob/8a9a2327e06036f06ca5ef4b1321129ccff557d6/src/statistics/README.md?plain=1#L129-L143
For Thread a wait loop had a significant effect. For Domain it would be nice to confirm this, and also to investigate whether there could be better ways to accomplish it. In the tests for the work-stealing deque that has now been pulled out of domainslib, the spinning did not work at all to trigger issues on MacOSX, so I ended up going with a binary semaphore: https://github.com/jmid/multicoretests/blob/8a9a2327e06036f06ca5ef4b1321129ccff557d6/src/domainslib/ws_deque_test.ml#L131-L133
The simpler, the better. A combination of a Mutex and a Condition variable may also be sufficient.
Originally posted by @jmid in https://github.com/jmid/multicoretests/issues/43#issuecomment-1099991569
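To make the three start-up options mentioned above concrete, here is a minimal sketch, assuming OCaml 5 where `Domain`, `Atomic`, `Semaphore`, `Mutex` and `Condition` are all in the standard library. The `sync` record, the three constructors and `run_pair` are hypothetical names for illustration only; they are not the code in `lib/lin.ml` or `ws_deque_test.ml`.

```ocaml
(* Hypothetical abstraction: one domain calls [wait], the other [signal]. *)
type sync = { signal : unit -> unit; wait : unit -> unit }

(* 1. Spin on an atomic flag, relaxing the CPU between reads. *)
let spin () =
  let flag = Atomic.make false in
  { signal = (fun () -> Atomic.set flag true);
    wait = (fun () ->
      while not (Atomic.get flag) do Domain.cpu_relax () done) }

(* 2. Block on a binary semaphore that starts out unavailable. *)
let semaphore () =
  let sem = Semaphore.Binary.make false in
  { signal = (fun () -> Semaphore.Binary.release sem);
    wait = (fun () -> Semaphore.Binary.acquire sem) }

(* 3. Block on a condition variable guarding a boolean. *)
let mutex_condition () =
  let m = Mutex.create () and c = Condition.create () in
  let started = ref false in
  { signal = (fun () ->
      Mutex.lock m; started := true; Condition.signal c; Mutex.unlock m);
    wait = (fun () ->
      Mutex.lock m;
      while not !started do Condition.wait c m done;
      Mutex.unlock m) }

(* Run [f] and [g] in two domains; [f] only starts once the domain
   running [g] has signalled that it is up. *)
let run_pair mk f g =
  let s = mk () in
  let d1 = Domain.spawn (fun () -> s.wait (); f ()) in
  let d2 = Domain.spawn (fun () -> s.signal (); g ()) in
  (Domain.join d1, Domain.join d2)
```

Swapping `spin`, `semaphore` or `mutex_condition` into `run_pair` keeps the rest of a test unchanged, which is one way the variants could be compared under the same statistics.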
I've been trying to get some numbers comparing bug-triggering with cpu_relax and with a semaphore. I have some strange results that I don't understand yet (no buggy programs found over 10000, while CI is happy with 1000...), but it seems that synchronizing with a semaphore is a bit faster than with cpu_relax:
$ dune exec -- src/neg_tests/conclist_stm_tests.exe
random seed: 138767447
generated error  fail  pass / total     time test name
[✓] 10000     0     0 10000 / 10000   103.1s STM int64 CList with cpu_relax
[✓] 10000     0     0 10000 / 10000    78.5s STM int64 CList with semaphore
================================================================================
success (ran 2 tests)
relax : 0 / 10000
semap : 0 / 10000
Code is here: https://github.com/n-osborne/multicoretests/blob/domain-stats/src/neg_tests/conclist_stm_tests.ml#L53 and here: https://github.com/n-osborne/multicoretests/blob/domain-stats/lib/STM.ml#L391
It's indeed interesting that the Semaphore is faster than the "Atomic waiting loop" :+1: :thinking:
I had a quick look:
- When an exception is raised `mk_prop` does not increase the counter (I think it should)
- I also noticed that the stats tests are not using `repeat`. To be comparable to the CI's 1000 iterations I would try to use it here too.
* When an exception is raised `mk_prop` does not increase the counter (I think it should)
Yes, that works better that way.
* I also noticed that the stats tests are not using `repeat`. To be comparable to the CI's 1000 iterations I would try to use it here too.
That was just to have something a bit more accurate for speed.
So semaphores are indeed faster, but they spot far fewer buggy programs:
This is with repeat 25 prop.
$ dune exec -- src/neg_tests/conclist_stm_tests.exe
random seed: 300478220
generated error  fail  pass / total     time test name
[✓] 10000     0     0 10000 / 10000  3302.4s STM int64 CList with cpu_relax
[✓] 10000     0     0 10000 / 10000  1970.2s STM int64 CList with semaphore
================================================================================
success (ran 2 tests)
relax : 36868 / 10000
semap : 8 / 10000
Ah, that is indeed quite a difference! :open_mouth:
I'm surprised by the number 36868 though!
Because of the way Util.repeat is implemented it should stop early on the first failed property.
I would thus expect it to increment the counter at most once for each of the 25 repetitions and hence reach at most 10000. :thinking:
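The expectation above can be sketched as follows. This is only an illustration under assumptions: `repeat` is written here as a short-circuiting loop in the spirit of what is described for `Util.repeat`, and `counting` is a hypothetical wrapper standing in for the counter logic in `mk_prop`; neither is copied from the repository.

```ocaml
(* Assumed shape of a short-circuiting repeat: it stops at the first
   failing run thanks to the lazy [&&]. *)
let rec repeat n prop = fun input ->
  if n < 0 then invalid_arg "repeat: negative repetition count";
  n = 0 || (prop input && repeat (n - 1) prop input)

(* Hypothetical counting wrapper: bumps [count] whenever [prop]
   returns false or raises an exception. *)
let counting count prop = fun input ->
  match prop input with
  | true -> true
  | false -> incr count; false
  | exception _ -> incr count; false

let () =
  let count = ref 0 in
  let always_fails _ = false in
  (* 100 inputs, each repeated up to 25 times. *)
  List.iter
    (fun i -> ignore (repeat 25 (counting count always_fails) i))
    (List.init 100 Fun.id);
  (* One increment per input, never 25. *)
  assert (!count = 100)
```

Under these assumptions the counter cannot exceed the number of inputs the property is evaluated on, so a value above 10000 would mean the property runs on more inputs than the 10000 generated ones, or the counter is incremented somewhere else.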