
More tests for `picos.sync`

Open polytypic opened this issue 10 months ago • 2 comments

I may have observed the `picos.sync` tests deadlocking or livelocking on CI, at least on (32-bit) OCaml 4.14. This could indicate a bug in the `picos.sync` library, a bug in the tests, a bug in (32-bit) OCaml 4.14 (I don't recall seeing the tests fail to complete on other OCaml versions, but I might simply have missed it), or something completely unrelated, such as the test machine being slow for some other reason. Either way, this needs further investigation, and the correctness of the `picos.sync` implementation needs to be ensured.

Observations:

  • debian-12-4.14_arm32_opam-2.1 (had not completed after 24+ minutes; completed very quickly after cancel + rebuild)
  • If thread-local-storage is (for some reason) not installed, it was possible, before #110, to build a non-working set of libraries with which the mutex cancelation test and the benchmarks did not terminate. That should not be what happened in the observed non-completions, though.
  • Tried running the picos_sync test repeatedly in parallel (a dozen or so at a time) with OCaml 4.14.2 on macOS on an M1. Did not see any lockups within a few hours.
  • debian-12-4.14_arm64_opam-2.1 (not completed in an hour)
  • debian-12-4.14_opam-2.2 (seemed to be stuck in the cancelation test)
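
For anyone trying to reproduce the hang locally, the "run the test repeatedly in parallel" observation above could be scripted roughly as follows. This is only a sketch, not something from the picos repository: the `stress` and `run_once` helpers, the timeout, and the path to the test executable are all hypothetical, and a run is counted as "hung" simply because it exceeded the timeout.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd, timeout_s):
    """Run cmd once and classify the outcome as 'ok', 'failed', or 'hung'."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising on timeout,
        # so a stuck test does not linger in the background.
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


def stress(cmd, runs, parallel, timeout_s):
    """Run cmd `runs` times, at most `parallel` at a time, and tally outcomes."""
    tally = {"ok": 0, "failed": 0, "hung": 0}
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        outcomes = pool.map(lambda _: run_once(cmd, timeout_s), range(runs))
        for outcome in outcomes:
            tally[outcome] += 1
    return tally


# Hypothetical usage; the executable path depends on the actual build output:
#   print(stress(["./test_sync.exe"], runs=100, parallel=12, timeout_s=600))
```

A generous timeout matters here, since a slow CI machine and a real dead/livelock look the same to this script; it only narrows down which configurations are worth attaching a debugger to.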

polytypic, Apr 08 '24 21:04

The issue might be related to the cancelation test spawning fibers, which translate to systhreads on OCaml 4. PR #230 changes the tests to not spawn fibers. Time will tell whether this eliminates the hangs on OCaml 4.

Addition: There has since been a test run where the cancelation test did not seem to complete on 4.14 arm64. Not spawning lots of systhreads seems to have made the failures less common, but not eliminated them.

polytypic, Aug 19 '24 13:08

@edwintorok mentioned the pthread_cond_wait bug, which might be the cause of these issues.

polytypic, Aug 20 '24 08:08