multicoretests
Support multiple ts in Lin and Lin_api signatures
This PR takes a stab at supporting multiple ts in the signature descriptions of Lin and Lin_api, thus fixing the Lin part of #62.
Overall, it supports multiple ts by using an underlying t array:
- Initially the init result goes to index 0
- New ts are always saved to fresh subsequent indices
- The two spawned Domains can refer to ts from the sequential prefix and their own produced ts. This is achieved through an environment-based generation and thus should avoid race conditions.
- Note: the array is initialized to have all entries point to the same initial init result
The different ts are distinguished as variables, encapsulated with relevant functions in a Var module.
The environment handling is similarly encapsulated in an Env module.
To make it clear what variable t0 refers to, we include an initial let t0 = init () in the printed cmd triples.
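A minimal sketch of the array-backed scheme described above, using only the standard library. The helper names (`Var.reset`, `Var.show`, `make_store`), the contents of `Env`, and the array size are assumptions for illustration and need not match the PR's actual code.

```ocaml
(* Hypothetical sketch: variables are indices into an array of [t]s. *)
module Var = struct
  type t = int
  let counter = ref 0
  (* Side-effecting, gen-sym-like generator of fresh variables. *)
  let next () = let v = !counter in incr counter; v
  let reset () = counter := 0
  let show v = Printf.sprintf "t%i" v
end

module Env = struct
  (* The variables that a command generator may refer to. *)
  type t = Var.t list
end

(* The system under test: here simply [Bytes.t]. *)
let init () = Bytes.make 42 '0'

(* All entries initially point to the same [init] result. *)
let make_store size = Array.make size (init ())

let () =
  Var.reset ();
  let store = make_store 8 in
  let t0 = Var.next () in            (* index 0 holds the init result *)
  let t1 = Var.next () in            (* fresh index for a newly produced t *)
  store.(t1) <- Bytes.make 27 'z';   (* corresponds to: let t1 = Bytes.make 27 'z' *)
  let env : Env.t = [t0; t1] in
  List.iter
    (fun v -> Printf.printf "%s has length %i\n" (Var.show v) (Bytes.length store.(v)))
    env
```

Because every slot starts out aliasing the same init result, a command that reads an index before anything was stored there still sees a valid t.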
The PR is probably best read in steps, mirroring its development:
- getting it to work for the raw Lin interface - this required changing the API
  - here gen_cmd is extended to take a variable generator as argument and gen_cmd should return a generator of pairs, extended with a first component Var.t option to signal whether to save a t and if so, the variable to hold it. A side-effecting, gen-sym-like Var.next () is available to generate fresh variables
  - the Var.t option is also used to indicate which ts need cleanup (this should avoid double cleanup)
- getting it to work for the combinator-based Lin_api interface
  - here FnState is extended to carry a Var.t variable for identifying the desired state.
  - since we don't want to compare newly created ts for equality (only store them), they should use the [returning_] and [returning_or_exc_] combinators (note the final underscore _)
- updating all the lin_test.ml examples broken by the API change
  - with the gen_cmd signature change, this means we can no longer benefit from ppx_deriving_qcheck as it produces an unparameterized generator.
- in addition some lin_test_dsl.ml examples are extended to cover a larger subset of the tested API. Locally the extended Bytes test now triggers a segfault...
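To illustrate the gen_cmd change described in the first step, here is a rough, dependency-free sketch. The `'a gen` type is a local stand-in for a QCheck generator, and the `cmd` constructors and `gen_var_of_env` are hypothetical; only the overall shape (a variable generator in, a generator of `Var.t option * cmd` pairs out) follows the description above.

```ocaml
(* Stand-in for a QCheck generator, to keep the sketch dependency-free. *)
type 'a gen = Random.State.t -> 'a

module Var = struct
  type t = int
  let counter = ref 0
  let next () = let v = !counter in incr counter; v
end

(* Hypothetical commands over Bytes.t; each refers to existing variables. *)
type cmd =
  | Make of int * char          (* produces a new t *)
  | Length of Var.t
  | Set of Var.t * int * char

(* [gen_cmd gen_var] yields pairs: [Some v] means "save the resulting t in v",
   [None] means the command produces no t to save. *)
let gen_cmd (gen_var : Var.t gen) : (Var.t option * cmd) gen = fun st ->
  match Random.State.int st 3 with
  | 0 -> (Some (Var.next ()), Make (Random.State.int st 32, 'z'))
  | 1 -> (None, Length (gen_var st))
  | _ -> (None, Set (gen_var st, Random.State.int st 32, 'a'))

(* A variable generator drawing from a non-empty environment of known variables. *)
let gen_var_of_env (env : Var.t list) : Var.t gen = fun st ->
  List.nth env (Random.State.int st (List.length env))
```

A generator derived by ppx_deriving_qcheck has no `gen_var` parameter, which is why the derived generators can no longer be used directly.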
Rebased on latest main.
Latest changes comment out the unsafe Bytes.escaped and include increasing the chance of producing counterexamples to sequential consistency.
The PR is still missing a way to shrink left-hand-sides of t-producing cmds: let t5 = Bytes.make 27 'z'.
Just rebased on latest main
5/8 CI runs fail to trigger the Bytes failure within the 1000 iterations, which makes the CI turn red.
Rebased on latest main
Rebased on latest main
Rebased on latest main to try it out on 5.0.0~beta1 and with a bytecode CI target
CI is all red, so I've spent some time trying to understand why:
- On Linux it generally fails because the Lin_api Bytes issue is not triggered
- On MacOSX it generally fails because the Lin_api Bigarray issue is not triggered
I think we need to better understand why this PR no longer triggers these issues and address that before merging.
A few other things:
In one case on MacOSX there was a Lin_api Array test with Domain with crazy many reduction attempts (114580!):
https://github.com/jmid/multicoretests/actions/runs/3245274353/jobs/5322586745
Here we probably need to use a better t_var shrinking strategy (bisection?) to limit the number of attempted var_fixes?
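One way such a bisection could look, purely as a sketch: instead of trying every smaller index, offer candidates that halve the distance to 0 at each step, so a variable at index v gets roughly log2(v) + 1 candidates. The function name `shrink_var` is hypothetical, and the sketch assumes any smaller index is a valid replacement, which the real shrinker would have to check against the environment.

```ocaml
(* Hypothetical sketch: candidate replacements for variable index [v],
   halving the distance to 0 at each step. *)
let shrink_var (v : int) : int list =
  let rec go dist acc =
    if dist = 0 then List.rev acc
    else go (dist / 2) ((v - dist) :: acc)
  in
  go v []

let () =
  (* prints "0 4 6 7" *)
  print_endline (String.concat " " (List.map string_of_int (shrink_var 8)))
```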
Finally, I realized that returning_, described as "[returning comb] indicates the return type [comb] which is ignored", actually doesn't ignore a t, but saves it. This is clearly a brain-fart on my part that needs to be addressed somehow :shrug:
I wonder whether some recent modifications of the compiler might not be making some expected-to-fail tests require more runs to appear. To try and start testing that hypothesis, I just opened PR #157 (based on main).
I also think “ignored” is confusing. I suppose you meant something like “will not be used to check consistency between linearized and non-linearized runs”, didn’t you? Part of the confusion is the fact that results of type “t” always fall into that category, which should be explicitly stated.
> I wonder whether some recent modifications of the compiler might not be making some expected-to-fail tests require more runs to appear. To try and start testing that hypothesis, I just opened PR #157 (based on main).
I think part of the reason may be
- we are going from testing 6 to 14 type signatures in src/bytes/lin_tests_dsl.ml for example. This should decrease the chance of generating a random test that combines "the right ones".
- of these 14 signatures 5 return a new t. This further decreases the chance of triggering issues that arise from simultaneous mutation of the same t.
I've tried to counter these by playing with weights. This does not solve the issue entirely, so other things may play a role too, e.g., fetching and storing from a global array of ts may take more time than when we only have a single global t, thus affecting our timing of things on top of each other :thinking:
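For readers unfamiliar with weighted generation, the idea can be sketched as follows. This is a standalone stand-in written against a local `'a gen` type, not the library's own combinator, and the weights and command names are made up for illustration.

```ocaml
type 'a gen = Random.State.t -> 'a

(* Pick a generator with probability proportional to its weight.
   Weights are assumed to be positive. *)
let frequency (choices : (int * 'a gen) list) : 'a gen = fun st ->
  let total = List.fold_left (fun acc (w, _) -> acc + w) 0 choices in
  let rec pick n = function
    | [] -> invalid_arg "frequency: empty list"
    | [ (_, g) ] -> g st
    | (w, g) :: rest -> if n < w then g st else pick (n - w) rest
  in
  pick (Random.State.int st total) choices

(* Hypothetical weights: keep t-producing commands rare so that the two
   domains are more likely to operate on the same t. *)
let gen_cmd_name : string gen =
  frequency
    [ (1, fun _ -> "make");
      (4, fun _ -> "set");
      (4, fun _ -> "get") ]
```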
P.s. I also spotted a duplicate Bytes.make in src/bytes/lin_tests_dsl.ml...
I realized that reducing the init size in src/bytes/lin_tests_dsl.ml from 42 to 8 gave a nice output, but unexpectedly reduced our ability to trigger the Bytes issue. I have therefore restored it to 42...
This is improving as it is no longer all red! :smiley:
There are still a few outstanding issues:
- three MacOSX runs are not triggering the Lin_api BigArray1 issue
- a MacOSX run is timing out due to excessive shrinking on Lin_api Bytes https://github.com/jmid/multicoretests/actions/runs/3259689990/jobs/5352706913
- a bytecode run is timing out due to excessive shrinking on Lin_api Ephemeron
Less serious:
- two bytecode runs are failing to trigger int64 ref Thread (like main)
Regarding Bigarray failures on macos, I opened #160 and #161, the combination of which I’m testing on my integration branch. They give reproducibility in my first rounds of testing.
#161 should also address, somehow, the less serious int64 ref Thread, sometimes trading failure to fail for timeouts, unfortunately.
Rebased on latest main to pick up the combined effect of #160 (on multiple-ts) and #161 (on main)
As main and multiple-ts contain naturally conflicting changes, I find it makes more sense (at least to me) to merge instead of rebasing. That’s why I created my integration branch, especially that commit. I’m not sure github UI can show how conflicts were resolved, like a git show --cc aae766b though.
> As main and multiple-ts contain naturally conflicting changes, I find it makes more sense (at least to me) to merge instead of rebasing.
Not sure I follow you here. There will be conflicts to manually resolve when either merging or rebasing, no? In the above case, I messed up when manually resolving the rebasing conflicts - which I could just as well have done while resolving a merge conflict. :thinking:
That’s probably a matter of preference, but by merging instead of rebasing, while the branch is still WIP, you record a trace of how conflicts were resolved. When the branch is ready I also tend to rebase it (if only to clean it up).
In 5e460ad I've now tried to rephrase the intended semantics of the returning_ combinators.
Overall this feature is starting to shape up nicely - thanks for helping out @shym!
For now, I want to let the CI complete a run.
I seem to recall having seen a CI log spending a needlessly long time shrinking, which I suspect may be due to the added t_var shrink renaming. After the CI run completes, I therefore intend to go over the CI logs with a comb, to see whether this is an issue.
I've now taken a first stab at merging main into this one.
It will require some more polishing and revision, but I wanted to throw it after the CI targets to test it out.
Locally, it seemed to not trigger the expected parallel Dynlink failure.
To summarize the latest CI run:
- 3 threadomain failures: 1 Windows bytecode crash + 2 Windows timeouts (trunk + 5.0.0 bytecode)
- 2 Windows Dynlink crashes (5.0.0 + trunk)
- 1 Bigarray test failure not triggering an expected issue on macOS trunk
- 1 Lin Weak Hashset test timing out due to excessive shrinking (17766.5 secs!) on Windows 5.0.0
a1dc261 offers a simplification of the Lin* interfaces by defining a cmd_triple record type.
Resolved conflicts from merging main into this one
The test suite is failing in a number of different ways with the current PR.
I tested on a MacMini, where increasing the frequency of Bigarray.Array.get makes the expected failure trigger more consistently. A number of other failures, however, still need to be addressed.