multicoretests icon indicating copy to clipboard operation
multicoretests copied to clipboard

Support multiple ts in Lin and Lin_api signatures

Open jmid opened this issue 3 years ago • 23 comments

This PR takes a stab at supporting multiple ts in the signature descriptions of Lin and Lin_api, thus fixing the Lin part of #62.

Overall, it supports multiple ts by using an underlying t array:

  • Initially the init result goes to index 0
  • New ts are always saved to fresh subsequent indices
  • The two spawned Domains can refer to ts from the sequential prefix and their own produced ts. This is achieved through an environment-based generation and thus should avoid race-conditions.
  • Note: the array is initialized to have all entries point to the same initial init result

The different ts are distinguished as variables, encapsulated with relevant functions in a Var module. The environment handling is similarly encapsulated in an Env module. To make it clear what variable t0 refers to, we include an initial let t0 = init () in the printed cmd triples.

The PR is probably best read in steps, mirroring its development:

  1. getting it to work for the raw Lin interface - this required changing the API
    • here gen_cmd is extended to take a variable generator as argument and
    • gen_cmd should return a generator of pairs, extended with a first component Var.t option to signal whether to save a t and if so, the variable to hold it. A side-effecting, gen-sym-like Var.next () is available to generate fresh variables
    • the Var.t option is also used to indicate which ts need cleanup (this should avoid double cleanup)
  2. getting it to work for the combinator-based Lin_api interface
    • here FnState is extended to carry a Var.t variable for identifying the desired state.
    • since we don't want to compare newly created ts for equality (only store them), they should use the [returning_] and [returning_or_exc_] combinators (note the final underscore _)
  3. updating all the lin_test.ml examples broken by the API change
    • with the gen_cmd signature change, this means we can no longer benefit from ppx_deriving_qcheck as it produces an unparameterized generator.
  4. in addition some lin_test_dsl.ml examples are extended to cover a larger subset of the tested API. Locally the extended Bytes test now triggers a segfault...

jmid avatar Aug 16 '22 15:08 jmid

Rebased on latest main. Latest changes comment out the unsafe Bytes.escaped and include increasing the chance of producing counterexamples to sequential consistency.

The PR is still missing a way to shrink left-hand-sides of t-producing cmds: let t5 = Bytes.make 27 'z'.

jmid avatar Aug 19 '22 08:08 jmid

Just rebased on latest main

jmid avatar Sep 01 '22 16:09 jmid

5/8 CI runs fail to trigger the Bytes failure within the 1000 iterations, which makes the CI turn red.

jmid avatar Sep 02 '22 12:09 jmid

Rebased on latest main

jmid avatar Sep 22 '22 07:09 jmid

Rebased on latest main

jmid avatar Sep 22 '22 16:09 jmid

Rebased on latest main to try it out on 5.0.0~beta1 and with a bytecode CI target

jmid avatar Oct 13 '22 17:10 jmid

CI is all red, so I've spent some time trying to understand why:

  • On Linux it generally fails because the Lin_api Bytes issue is not triggered
  • On MacOSX it generally fails because the Lin_api Bigarray issue not triggered

I think we need to better understand why this PR no longer triggers these issues and address that before merging.

A few other things:

In one case on MacOSX there was a Lin_api Array test with Domain with crazy many reduction attempts (114580!): https://github.com/jmid/multicoretests/actions/runs/3245274353/jobs/5322586745 Here we probably need to use a better t_var shrinking strategy (bisection?) to limit the number of attempted var_fixes?

Finally, I realized that returning_ described as [returning comb] indicates the return type [comb] which is ignored. actually doesn't ignore a t, but saves it. This is clearly a brain-fart on my part that needs to be addressed somehow :shrug:

jmid avatar Oct 14 '22 11:10 jmid

I wonder whether some recent modifications of the compiler might not be making some expected-to-fail tests require more runs to appear. To try and start testing that hypothesis, I just opened PR #157 (based on main).

shym avatar Oct 14 '22 13:10 shym

I also think “ignored” is confusing. I suppose you meant something like “will not be used to check consistency between linearized and non-linearized runs”, didn’t you? Part of the confusion is the fact that results of type “t” always fall into that category, which should be explicitly stated.

shym avatar Oct 14 '22 13:10 shym

I wonder whether some recent modifications of the compiler might not be making some expected-to-fail tests require more runs to appear. To try and start testing that hypothesis, I just opened PR #157 (based on main).

I think part of the reason may be

  • we are going from testing 6 to 14 type signatures in src/bytes/lin_tests_dsl.ml for example. This should decrease the chance of generating a random test that combines "the right ones".
  • of these 14 signatures 5 return a new t. This further increases the chance of triggering issues that arise from simultaneous mutation of the same t.

I've tried to counter these by playing with weights. This does not solve the issue entirely, so other things may play a role too, e.g., fetching and storing from a global array of ts may take more time than when we only has a single global t, thus affecting our timing of things on top of each other :thinking:

P.s. I also spotted a duplicate Bytes.make in src/bytes/lin_tests_dsl.ml...

jmid avatar Oct 14 '22 14:10 jmid

I realized that reducing the init size in src/bytes/lin_tests_dsl.ml from 42 to 8 gave a nice output, but unexpectedly reduced our ability to trigger the Bytes issue. I have therefore restored it to 42...

jmid avatar Oct 14 '22 15:10 jmid

This is improving as it no longer all red! :smiley:

There are still a few outstanding issues:

  • three MacOSX runs are not triggering the Lin_api BigArray1 issue
  • a MacOSX run is timing out due to excessive shrinking on Lin_api Bytes https://github.com/jmid/multicoretests/actions/runs/3259689990/jobs/5352706913
  • a bytecode run is timing out due to excessive shrinking on Lin_api Ephemeron

Less serious:

  • two bytecode runs are failing to trigger int64 ref Thread (like main)

jmid avatar Oct 16 '22 21:10 jmid

Regarding Bigarray failures on macos, I opened #160 and #161, the combination of which I’m testing on my integration branch. They give reproducibility in my first rounds of testing. #161 should also address, somehow, the less serious int64 ref Thread, sometimes trading failure to fail for timeouts, unfortunately.

shym avatar Oct 19 '22 15:10 shym

Rebased on latest main to pick up the combined effect of #160 (on multiple-ts) and #161 (on main)

jmid avatar Oct 20 '22 07:10 jmid

As main and multiple-ts contain naturally conflicting changes, I find it makes more sense (at least to me) to merge instead of rebasing. That’s why I created my integration branch, especially that commit. I’m not sure github UI can show how conflicts were resolved, like a git show --cc aae766b though.

shym avatar Oct 20 '22 13:10 shym

As main and multiple-ts contain naturally conflicting changes, I find it makes more sense (at least to me) to merge instead of rebasing.

Not sure I follow you here. There will be conflicts to manually resolve when either merging or rebasing, no? In the above case, I messed up when manually resolving the rebasing conflicts - which I could just as well have done while resolving a merge conflict. :thinking:

jmid avatar Oct 20 '22 20:10 jmid

That’s probably a matter of preference, but by merging instead of rebasing, while the branch is still WIP, you record a trace of how conflicts were resolved. When the branch is ready I also tend to rebase it (if only to clean it up).

shym avatar Oct 21 '22 17:10 shym

In 5e460ad I've now tried to rephrase the intended semantics of the _returning combinators.

Overall this feature is starting to shape up nicely - thanks for helping out @shym!

For now, I want to let the CI complete a run. I seem to recall having seen a CI log spending needlessly long time shrinking - which I suspect may be due to the added t_var shrink renaming. After the CI run completes, I therefore intend to go over the CI logs with a comb, to see whether this is an issue.

jmid avatar Oct 28 '22 08:10 jmid

I've now taken a first stab at merging main into this one. It will require some more polishing and revision, but I wanted to throw it after the CI targets to test it out. Locally, it seemed to not trigger the expected parallel Dynlink failure.

jmid avatar Feb 02 '23 12:02 jmid

To summarize the latest CI run:

  • 3 threadomain failures - 1 Windows bytecode crash + 2 Windows timeouts (trunk + 5.0.0 bytecode)
  • 2 Windows Dynlink crashes (5.0.0 + trunk)
  • 1 Bigarray test failure not triggering an expected issue on macOS trunk
  • 1 Lin Weak Hashset test timing out due to excessive shrinking (17766.5 secs!) on Windows 5.0.0

jmid avatar Feb 03 '23 11:02 jmid

a1dc261 offers a simplification of the Lin* interfaces by defining a cmd_triple record type.

jmid avatar Feb 27 '23 15:02 jmid

Resolved conflicts from merging main into this one

jmid avatar Mar 07 '23 09:03 jmid

The test suite is failing in a number of different ways with the current PR. I tested on a MacMini, where increasing the frequency of Bigarray.Array.get makes the expected failure trigger more consistently. A number of other failures however still needs to be addressed.

jmid avatar Mar 10 '23 09:03 jmid