quickcheck-dynamic
First draft of parallel state machines
Implements `runParallelActions`, which will spawn multiple threads to test parallel execution in a style similar to `eqc_statem`.
Generation of parallel actions
The way I'm generating parallel actions is somewhat naive. I generate a list of normal Actions and then I split the list into a sequential prefix and some (currently 2) parallel-safe suffixes.
I check that every possible interleaving of actions in those parallel suffixes preserves every precondition.
However, `Actions_` has two things I don't see how to fit here:
- The rejected actions: what should I put there in the parallel suffixes?
- The number for the `Smart` constructor: is 0 okay?
Also if generation fails, I just try again.
I also currently don't have a shrinker :(
Combination of environments
When I'm generating all possible interleavings of executions, I can for example face this parallel sequence with the noted results after execution:
```
          /- [D,E,F] --- returns ([vA,vB,vC,vD,vE,vF], [rD,rE,rF])
[A,B,C] -|
          \- [G,H,I] --- returns ([vA,vB,vC,vG,vH,vI], [rG,rH,rI])
```
For now I'm concat'ing all the environments because I assume the LookUp can deal with duplicates? They will still be uniquely identified.
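To make the assumption concrete, here is a minimal sketch of why concatenating environments with duplicates is harmless when variables are uniquely numbered. The types and names (`lookUpIn`, `combinedEnv`) are illustrative, not the library's actual representation:

```haskell
-- Illustrative types only: variables are uniquely numbered, so lookup is
-- keyed on the variable, and duplicated entries coming from the shared
-- sequential prefix simply resolve to the same value.
type Var = Int

lookUpIn :: [(Var, v)] -> Var -> Maybe v
lookUpIn env v = lookup v env

-- Environments returned by the two branches share the prefix bindings.
combinedEnv :: [(Var, String)]
combinedEnv =
  [(1, "vA"), (2, "vB"), (3, "vC"), (4, "vD")] -- from the [D,E,F] branch
    ++ [(1, "vA"), (2, "vB"), (3, "vC"), (7, "vG")] -- from the [G,H,I] branch
```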
perform can't have access to the state
In parallel execution, `perform` will not be able to access any state because we don't advance the state at the same time we run the actions, so I had to add `perform'` which doesn't have that first parameter.
Checking all interleavings kills the machine
If the suffixes are not small, there are simply too many possible interleavings; the number of interleavings grows very fast.
```
ghci> map (\x -> let a = [1 :: Int ..x] in length $ interleaves [a, a]) [1..10]
[2,6,20,70,252,924,3432,12870,48620,184756]
```
Perhaps we should limit the number of parallel actions to a reasonable small number?
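For reference, the counts above are the central binomial coefficients. The actual definition of `interleaves` is not shown in this thread, but a sketch of such a function makes the explosion easy to see:

```haskell
-- A possible implementation of an `interleaves` function (a sketch, assumed
-- rather than taken from the PR): all ways to merge the given sequences while
-- preserving the relative order within each sequence.
interleaves :: [[a]] -> [[a]]
interleaves xss
  | all null xss = [[]]
  | otherwise =
      concat
        [ map (x :) (interleaves (before ++ xs : after))
        | (before, (x : xs) : after) <- splits xss -- skips exhausted sequences
        ]
  where
    -- every way to focus on one of the sequences
    splits ys = [(take i ys, drop i ys) | i <- [0 .. length ys - 1]]
```

For two sequences of length n this produces C(2n, n) interleavings, matching the ghci output above.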
2 or more threads?
eqc_statem generates a list of lists of actions, where each list of actions will be in a separate process. Should we support this too or is 2 threads enough? It will also worsen the problem above.
```
ghci> map (\x -> let a = [1 :: Int ..x] in length $ interleaves [a, a, a]) [1..10]
[6,90,1680,34650,756756,17153136,killed
```
@jasagredo I have started having a look and playing with your PR, which led me to investigate how this feature was implemented in q-s-m.
- As written, the parallel test does not terminate because of the combinatorial explosion of interleavings. Empirically I have found the limit to be around 12-13, and managed to make the test pass by using a `sized` generator and reducing the length of the generated actions sequence to the square root of the size parameter
- In q-s-m, this interleaving computation is limited to sequences of at most 5 actions, but applied on successive segments of the initial sequence, which makes it possible to test longer sequences
Perhaps a good first approach would be to try to port the q-s-m code?
A comment on the number of parallel threads: some of the most interesting bugs I've found have required three threads--see https://dl.acm.org/doi/abs/10.1145/2034654.2034667 for example. So I think it would be a real shame to restrict parallel tests to two threads. I can imagine adding a Fork action (with a list of actions as a parameter) to express an arbitrary number of threads. If Fork returns a thread ID, one could even imagine Join ThreadId actions that wait for a forked thread to terminate. The nice thing about adding Fork and Join actions is that they could be used in DL tests too: imagine a DL test of the form anyActions; Fork anyActions; <some DL sequence>, for example, that would test that the DL sequence worked in any reachable state, with anything going on in parallel. There is a potential problem with a combinatorial explosion of interleavings, but that can be addressed during generation just by not generating test cases with too many interleavings--the number of interleavings is easily (and cheaply) predictable from a test case, so one can just stop generating before the number gets too big (e.g. 100 or 1000). e.g. Fork [a,b]; Fork [c,d]; e; f has (6 choose 2) * (4 choose 2) interleavings, i.e. 15*6 or 90. I don't think Fork poses any real problems here, although I haven't figured out how to handle Join in detail.
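The count in the example above can indeed be computed cheaply before running anything: the number of interleavings of threads of given lengths is a multinomial coefficient. A sketch (`interleavingCount` is an illustrative helper, not an existing API):

```haskell
-- Number of interleavings of threads with the given lengths: the multinomial
-- coefficient (sum ls)! / product (map (!) ls). Cheap to compute, so a
-- generator could reject candidate test cases whose count exceeds the chosen
-- bound (e.g. 100 or 1000) instead of enumerating interleavings.
interleavingCount :: [Integer] -> Integer
interleavingCount lens =
  factorial (sum lens) `div` product (map factorial lens)
  where
    factorial n = product [1 .. n]
```

For `Fork [a,b]; Fork [c,d]; e; f` the thread lengths are 2, 2 and 2, giving 6!/(2!·2!·2!) = 90, matching the figure above.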
Shrinking is really important, of course, and there are useful shrinking steps for parallel tests (in addition to shrinking or just dropping actions) that reduce parallelism, making failed tests easier to debug. One: move the first action out of a Fork. Two: move a Fork later (past the next non-Fork action, to ensure we reduce the set of possible interleavings, which is necessary to avoid shrinking loops).
There's a problem as is with the type of postconditions (`PostconditionM`). In this version, postconditions can interact with the system under test, and indeed, this is used in several places in IOG's code to read from the system state in the postcondition. But in parallel testing, postconditions cannot be evaluated while the test is run; instead one must explore many possible interleavings after the test has finished, which means each postcondition may run many times, and in the final state of the test, not in the state after the action it applies to. If model authors expect the postcondition to read from the state just after that action, then this will lead tests to fail--in ways which developers find very difficult to understand. Parallel testing really needs pure boolean postconditions--any interaction with the system under test is likely to be broken, and even monitoring information will differ for each interleaving, making counterExample in particular almost useless (which failed interleaving should you be given counterexample information about, when they all failed?).
To make this work I think one needs a separate RunModel' class whose postconditions have a simpler type, which could be used for both sequential and parallel testing, while the existing RunModel class can be used only for sequential testing. (If it's used for parallel testing, then a trap is laid for developers who talk to the SUT in postconditions, and it would be nice to exclude that possibility using types). Working with such a class does require modelling more of the SUT... in order to write the postconditions, one needs to model the information in the SUT that was previously read from it, so that it is available in the postcondition when it is needed. So there's a risk users might continue to use the current RunModel class, because it simplifies modelling, but that would make parallel testing impossible. It's a bit Catch 22--what makes parallel testing so nice in e.g. eqc_statem is that it is so easy to turn it on, using the same model used for sequential testing; if one has to enrich one's model and rewrite one's postconditions, then people are much less likely to use it.
@rjmh Do I understand correctly that in Erlang one defines postconditions in the "model" (StateModel in our case) and therefore it's pretty straightforward to run in both sequential and parallel settings?
Interestingly, in the work done by @MaximilianAlgehed and @UlfNorell on Peras, they introduced a `step : State -> Action -> Maybe (State, Result)` function on the Agda side that's mapped to the `nextState`, `precondition`, and `postcondition` functions very straightforwardly, assuming the postcondition to be simply equality between expected and actual results.
It seems to me it would not be too hard to define a `PostCondition a` language, much like the typical `assertEquals`, `assertContains` from example-based testing, that would make it possible to move the postcondition back to the `StateModel`.
The issue with pure postconditions is that you either end up:
- Making the result type of actions way too complex,
- Introducing ancillary actions with complex preconditions that force a bunch of annoying book-keeping in the model state, or
- Giving up on checking things that you should have checked.
I don't remember the specific case that forced us to make postconditions monadic, but it was the least bad solution to avoiding the issues above at the time.
In the Java world I used to work in, there was a cottage industry of "assertion libraries" (e.g. hamcrest) which, when they came out, improved the legibility of our unit tests. Some Haskell test frameworks (e.g. hspec) have specific functions for testing expectations.
I may be wrong but it seems something like the following would work:
```haskell
class StateModel m where
  data Action m a = ...
  postcondition :: m -> Action m a -> PostCondition a
```
where (roughly)
```haskell
data Postcondition a =
  Eq a => Equal a | Contains [a] | Ord a => Compare a | ...
```
would define a pure language of assertions.
Then the framework would be responsible for checking the postconditions from actual results.
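As a sketch of how that checking could look, using GADT syntax so that the per-constructor constraints are well-formed (constructor names and their semantics here are assumptions for illustration, not an existing API):

```haskell
{-# LANGUAGE GADTs #-}

-- A pure assertion language the framework can evaluate against actual
-- results, without touching the system under test.
data Postcondition a where
  Equal     :: Eq a => a -> Postcondition a    -- actual equals expected
  Contains  :: Eq a => [a] -> Postcondition a  -- actual is one of these
  Satisfies :: (a -> Bool) -> Postcondition a  -- escape hatch for arbitrary checks

-- The test driver checks a postcondition against the actual result,
-- and can also render a counterexample from the constructor itself.
check :: Postcondition a -> a -> Bool
check (Equal expected)   actual = actual == expected
check (Contains allowed) actual = actual `elem` allowed
check (Satisfies p)      actual = p actual
```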
@abailly-iohk It seems you are almost talking about https://hackage.haskell.org/package/quickcheck-state-machine-0.9.0/docs/Test-StateMachine-Logic.html, right?
I wasn't aware of this module, seems like there's more good things to port from q-s-m than just the parallel testing stuff :)
Max wrote:
> The issue with pure postconditions is that you either end up:
> - Making the result type of actions way too complex,
> - Introducing ancillary actions with complex preconditions that force a bunch of annoying book-keeping in the model state, or
> - Giving up on checking things that you should have checked.
I think the issue is that when you want to write postconditions that relate the result of an action to the system state, you can either make your model detailed enough to predict that state (that's Max's annoying book-keeping), or read it directly from the system state--which you can either do in a monadic postcondition, or via a special action (Max's ancillary actions), or via reading them in the same call of perform, which means returning extra information in the action result type. And the general problem with reading from the system state, however you do it, is that those reads are hard to do atomically with the action they apply to. In parallel testing, the postconditions are checked after the entire test runs, so they simply read the wrong system state, while if the reads are done in a custom action or in perform, they will not be done atomically with the underlying action, and thus risk reading the wrong system state. One might perhaps wrap a lock around the underlying action and the associated reads inside perform to make everything atomic, but that defeats the purpose of parallel testing, which is to test that the action behaves atomically without additional synchronization.
So basically, parallel testing is incompatible with reading extra information from the system state to write postconditions, however that is done. The only way to make parallel testing work is to enrich the model to predict (enough of) that information. Sometimes this will mean complicating the model considerably. And for sequential testing, it's not necessary.
There is a lot to be said for making the shift from sequential to parallel testing easy, though--partly, it makes it much more likely that people will do it, than if they have to rewrite all their postconditions with a different type. Secondly, it allows the model to be debugged using sequential tests, which is far simpler and quicker than debugging using parallel tests.
Here's an idea: one might allow monadic postconditions, but interpret any postcondition that actually invokes an underlying monadic operation as 'True', in a parallel test. That is, one would automatically remove those checks during parallel testing. One would want a compositional way of doing this, so that a postcondition might mix purely-functional checks (which can still fail) with monadic checks (which are just assumed to pass). It would be desirable to generate some kind of warning when this happens (e.g. X% of postconditions could not be checked). But it would allow a sequential model to be used immediately for parallel testing, which might already reveal many bugs even with the weaker postconditions, while supporting a gradual enrichment of the model (to reduce the "postcondition could not be checked" percentage).
What doesn't work well is to allow monadic postconditions and just check them at the wrong time... that leads to very hard to debug false positives, and there's no better way to put people off parallel testing.
@rjmh Thanks for the illuminating comment. Reading this
> one might allow monadic postconditions, but interpret any postcondition that actually invokes an underlying monadic operation as 'True', in a parallel test.
seems to me advocating for a dedicated language to express conditions, where one of the constructors would allow monadic actions to occur, e.g. much like the `Rose` type in QC which can hold pure values `a` or some monadic way of returning a `Rose a`? From my previous example:
```haskell
data Postcondition m a = AssertEq a | ... | PostMonadic (m (Postcondition m a))
```
> seems to me advocating for a dedicated language to express conditions, where one of the constructors would allow monadic actions to occur
>
> `data Postcondition m a = AssertEq a | ... | PostMonadic (m (Postcondition m a))`
Not exactly. I think there's a lot of value in allowing arbitrary boolean expressions in postconditions... otherwise there will be a constant pressure to make the postcondition language richer. But I can imagine
```haskell
data Postcondition m a
  = PostBool Bool
  | PostMonitor (Property -> Property) (Postcondition m a)
  | PostMonadic (m (Postcondition m a))
```
where the Applicative instance would delay `PostMonadic` until after all the boolean checks and monitors. One could invoke the `PostMonadic` constructor via some off-putting name, such as 'inSequentialTests', thus making it more obvious to the developer that there is a cost in using such things.
It's worth thinking about how to do the monitoring too. In failing parallel tests, after shrinking, the parallel parts are usually quite small and so there may be rather few interleavings. It may also be that the interleavings that fail last are the only interesting ones--ones that fail earlier can be considered as failing "because the interleaver guessed wrong". So potentially one might try and report each maximal failing interleaving separately, with counterexample information for each one.
Collecting statistics is a bit more difficult, since there may be many successful interleavings and we have no reason to prefer statistics from one over those from another. It feels wrong to collect statistics from all successful interleavings, because that would give some tests exponentially more weight than others in the overall statistics. Also it would be expensive... when many interleavings succeed, then there may be a big saving in only checking the first one.
I do agree there's value in allowing arbitrary boolean expressions in postconditions, but I also think there's value in providing some kind of DSL for the common cases, as this alleviates the need for the user to write both the assertion/predicate and the counterexample explaining the predicate for the usual cases of (in)equality testing of comparable values: both can be handled directly by the test driver.
Right, like (===) in the Property type. But that doesn't require a special constructor in the Postcondition type, just a library of functions for expressing checks-with-error-messages. For example, there is no special representation for (===) properties, but you get the behaviour you want anyway.
I agree this could be done as functions. With a dedicated language, you get some more possibilities like more compact representations for combinations. That's perhaps overkill 🤷
To summarise the above, there seem to be a couple of issues with the current definition of `RunModel`:

- The type of `perform` is:

  ```haskell
  perform :: Typeable a => state -> Action state a -> LookUp -> m (PerformResult (Error state) a)
  ```

  This prevents using this function in parallel testing, as the state is not evolved during the execution. This function should not have access to the state:

  ```haskell
  perform :: Typeable a => Action state a -> LookUp -> m (PerformResult (Error state) a)
  ```

- Furthermore, I'm dubious about the `LookUp` argument. It is unclear how results from previous commands would be of importance in the currently performed action and not already be tracked by the SUT. In `eqc_statem`, `COMMAND_args` generates symbolic arguments during test generation, but those symbolic arguments are converted to dynamic ones on test execution, therefore calling `COMMAND/N` with real arguments. I would actually argue that the type of `perform` should be:

  ```haskell
  perform :: Typeable a => Action state a -> m (PerformResult (Error state) a)
  ```

  But that would require replacing any `Var a` in `Action state a` with the actual value. Perhaps `Action state a` should get another type parameter along the lines of:

  ```haskell
  data Nature = Symbolic | Dynamic -- Nature is probably not the right term

  data ActionVar n a where
    SymbolicVar :: Var a -> ActionVar Symbolic a
    DynamicVar :: a -> ActionVar Dynamic a
  ```

  Should `perform` then be:

  ```haskell
  perform :: Typeable a => Action state Dynamic a -> m (PerformResult (Error state) a)
  ```

- The type of `postcondition` is:

  ```haskell
  postcondition :: (state, state) -> Action state a -> LookUp -> a -> PostconditionM m Bool
  ```

  which is a monad transformer on top of `m`, therefore allowing the postcondition to run monadic actions. This postcondition should be pure, as in parallel testing postconditions are not checked during execution but rather at the end of it:

  ```haskell
  postcondition :: (state, state) -> Action state a -> LookUp -> a -> Bool
  ```

  which would be in line with `COMMAND_post(S::dynamic_state(), Args::[term()], R::term()) :: bool()`. Or, as suggested, it could carry two parts, a monadic and a pure one. Sequential testing would run both, parallel testing would run only the pure one:

  ```haskell
  data Postcondition m
    = PostBool Bool
    | PostMonitor (Property -> Property) (Postcondition m)
    | PostMonadic (m (Postcondition m))

  postcondition :: (state, state) -> Action state a -> LookUp -> a -> Postcondition m

  purePostcondition :: Postcondition m -> PropertyM m' Bool
  purePostcondition (PostMonadic _) = pure True
  purePostcondition (PostBool a) = pure a
  purePostcondition (PostMonitor m p) = monitor m >> purePostcondition p

  fullPostcondition :: Postcondition m -> PropertyM m Bool
  fullPostcondition (PostMonadic m) = run m >>= fullPostcondition
  fullPostcondition (PostBool a) = pure a
  fullPostcondition (PostMonitor m p) = monitor m >> fullPostcondition p
  ```

- Again, `postcondition` only deals with dynamic states, thus the utility of `LookUp` also looks dubious to me. I would argue that the type should be:

  ```haskell
  postcondition :: state -> Action state Dynamic a -> a -> Postcondition m

  purePostcondition :: Postcondition m -> PropertyM m' Bool
  fullPostcondition :: Postcondition m -> PropertyM m Bool
  ```

  where one can use `nextState` to compute the now missing second component of the first argument.

Having the types above would make it much easier to define the infrastructure for parallel testing, plus unify the interfaces.
Did I get the resulting implications of the discussion above right?
I would also be interested in seeing cases where:
- The state is essential in `perform`
- `LookUp` is essential in `perform`
- Monadic operations are essential for `postcondition`
- `LookUp` is essential in `postcondition`
In the Peras codebase we wrote some postcondition code which fulfills at least the last ~~2~~ 1 item of your list: We ~~need a monadic action to retrieve some underlying state from the SUT, and we~~ need `LookUp` to relate the result of the monadic action to a list of results previously returned which need to be resolved.
This could possibly be expressed differently, for example by adding some "observation" action, perhaps.
- https://github.com/input-output-hk/peras-design/blob/main/peras-quickcheck/src/Peras/NetworkModel.hs#L155

  ```haskell
  postcondition (_, Network{slot}) (ChainsHaveCommonPrefix chainVars) env () = do
    let chains = fmap env chainVars
        prefix = commonPrefix chains
        chainLength = length prefix
        chainDensity :: Integer = if slot == 0 then 0 else floor @Double $ fromIntegral chainLength * 1000 / fromIntegral slot
    counterexamplePost $ "Chains: " <> show chains
    counterexamplePost $ "Common prefix: " <> show prefix
    monitorPost $ tabulate "Prefix length" ["<= " <> show ((chainLength `div` 100 + 1) * 100)]
    monitorPost $ tabulate "Prefix density" ["<= " <> show (chainDensity `div` 10 + 1) <> "%"]
    -- FIXME: this 50 is arbitrary, should be related to some network parameter
    pure $ slot < 50 || not (null prefix)
  postcondition _ _ _ _ = pure True
  ```

- https://github.com/input-output-hk/peras-design/blob/main/peras-quickcheck/src/Peras/NodeModel.hs#L136

  ```haskell
  postcondition (_before, NodeModel{slot}) (ForgedBlocksRespectSchedule blockVars) env stakeRatio
    | slot > 0 = do
        let blocks = foldMap env blockVars
            numberOfBlocks = length blocks
        counterexamplePost $ "Chain: " <> LT.unpack (decodeUtf8 (A.encode blocks))
        produceExpectedNumberOfBlocks stakeRatio numberOfBlocks slot
  postcondition _ _ _ _ = pure True

  produceExpectedNumberOfBlocks :: Monad m => Rational -> Int -> Slot -> PostconditionM m Bool
  produceExpectedNumberOfBlocks stakeRatio blocks slot = do
    let expectedBP :: Double = fromRational $ stakeRatio * toRational (fromIntegral slot * defaultActiveSlotCoefficient)
        actualBP = fromIntegral blocks
    pure $
      equalsBinomialWithinTails
        (fromIntegral slot) -- The sample size.
        (1 - (1 - defaultActiveSlotCoefficient) ** fromRational stakeRatio) -- Praos probability.
        3
        actualBP
  ```
In Hydra we don't use `LookUp` in the postconditions (see https://github.com/input-output-hk/hydra/blob/master/hydra-node/test/Hydra/Model.hs#L490) but we actually check postconditions in `perform`, with access to the monadic state of the SUT (https://github.com/input-output-hk/hydra/blob/master/hydra-node/test/Hydra/Chain/Direct/TxTraceSpec.hs#L202), which morally is a post-condition and therefore is equivalent to the 3rd requirement in your list.
There's been a discussion about this particular check and how it could have been written using a postcondition function, and this was problematic because the returned value from the action is not what's needed in the postcondition and it was considered cumbersome to change it.
I think for the first one, the "previously seen chains" should be part of the model. Morally I just think that the LookUp is just a fancier way of having a model that records every previous response. Precisely as eqc_statem says:
> A correctly written COMMAND_next function does not inspect symbolic inputs--it just includes them as symbolic parts of the result.
And we are doing this all the time just by having an Env and a LookUp.
@jasagredo exactly how do you propose maintaining the phase separation between the generation time and the runtime if you can't have symbolic variables that are resolved at runtime?
If you don't propose getting rid of the phase distinction then there is fundamentally no difference between having perform be monadic and forcing the poor user to maintain the lookup function themselves, and simply providing the lookup function for them.
If you do propose getting rid of the phase distinction that's a non-starter.
The only difference between eqc and qcd is that in erlang you can easily generically traverse the actions and translate the variables under the hood. In Haskell implementing that is almost impossible to do nicely so you have to make the user do it themselves - hence the lookup.
I am just making explicit that having a LookUp and an Env is probably fine. I'm saying that "morally" that could be part of the model, in the sense that having an Env that automatically records responses and changing model to be (Model, Env) and make that recording explicit are both equivalent approaches.
I've always liked that quickcheck-dynamic gives you a LookUp, whereas with quickcheck-state-machine you would be writing the same boilerplate code handling symbolic/concrete references every time you set up a new model-based test-suite. Having only symbolic variables also makes the framework easier to understand IMO: variables are always symbolic, but the user can resolve variables to real results if they have a LookUp in scope.