dscheck icon indicating copy to clipboard operation
dscheck copied to clipboard

Randomness

Open bartoszmodelski opened this issue 1 year ago • 1 comments

Many lock-free implementations use some randomness as an easy way to load balance without coordination. This may be a source of nondeterminism and DSCheck does not like it. For example, say we have a test that begins with two threads wanting to put two elements into random slots in an array.

Run:

  1. Thread A draws slot 3 and puts item there
  2. Thread B draws slot 3 and fails
  3. Thread B retries; draws slot 5 and puts item there .. .. more steps

If DSCheck finds some races after step 3, it will keep rerunning the sequence 1-3 to explore events of interest that occur further down the sequence. But on the consecutive runs, thread B may succeed on the first try, yielding a different state. In the positive case, B will become disabled before its schedule and DSCheck crashes explicitly. In the more scary one, we may give a positive result without actually having explored all the interesting interleavings.

It's an easy mistake to make and a pretty difficult one to find. Perhaps, DSCheck should choose some random seed, and then reinitialize OCaml's prng to this value at the beginning of every run?

cc @lyrm, who's observed this issue in pratice, with dscheck crashing uncontrollably on https://github.com/ocaml-multicore/lockfree/pull/65

bartoszmodelski avatar Apr 14 '23 10:04 bartoszmodelski