Create an API for storing user data in the Concuerror state

Open k32 opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe.

For performance reasons, it would be great to have an API for maintaining a user-provided global state with two properties:

  1. Updating this state from the instrumented processes should not be considered a race condition, and it should not trigger exploration of additional interleavings
  2. At the same time, this state should be maintained as part of the instrumented system

Describe the solution you'd like

  1. Expand the #trace_state record with a new user_state field (I may be mistaken about whether this is the right place, though)
  2. Create a magic function that works like this:
%% Somewhere in the testcase (proposed API; do_stuff/1 is a placeholder for user code)...
ReturnValue = concuerror:update_user_state(
                fun(OldState) -> {Result, NewState} = do_stuff(OldState), {Result, NewState} end)
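
To make the intended semantics concrete, here is a minimal sketch of the read, apply, store, return cycle that such a function would perform. It is only a model outside of Concuerror: the module name user_state_sketch, the init/1 function and the ETS-backed storage are all assumptions made for illustration, and nothing here is part of Concuerror's actual API. The update is also not atomic, which the proposed built-in would have to guarantee.

```erlang
-module(user_state_sketch).
-export([init/1, update_user_state/1]).

%% Hypothetical stand-in for the proposed concuerror:update_user_state/1.
%% The state is kept in a named public ETS table so any process can reach it.
-define(TAB, user_state_sketch_tab).

%% Create the table and store the initial user state.
init(InitialState) ->
    ?TAB = ets:new(?TAB, [named_table, public, set]),
    true = ets:insert(?TAB, {state, InitialState}),
    ok.

%% Apply Fun to the current state. Fun must return {ReturnValue, NewState};
%% NewState is stored and ReturnValue is handed back to the caller.
%% Note: lookup and insert are two separate steps, so this is not atomic.
update_user_state(Fun) when is_function(Fun, 1) ->
    [{state, OldState}] = ets:lookup(?TAB, state),
    {ReturnValue, NewState} = Fun(OldState),
    true = ets:insert(?TAB, {state, NewState}),
    ReturnValue.
```

For example, after user_state_sketch:init([]), a call such as user_state_sketch:update_user_state(fun(Events) -> New = [ping | Events], {length(New), New} end) appends an event and returns the new event count.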

Describe alternatives you've considered

Hide the state in a module that is not instrumented by Concuerror. This doesn't seem to work correctly.

Additional context

We've been experimenting with a special style of testcases that rely heavily on inspecting the system's execution trace (https://github.com/kafka4beam/snabbkaffe). Our library is extremely naive: it intercepts structured log messages from the system while it runs and forwards them to a collector process, which later dumps the event trace so that it can be checked for any desired properties (e.g. https://github.com/emqx/ekka/blob/master/test/ekka_rlog_props.erl#L41). This approach proved to be quite elegant in some cases where we're dealing with eventually consistent systems that can restart and fail over. Unfortunately, when the snabbkaffe library runs under Concuerror, the collector process creates a lot of unnecessary interleavings, so much so that it renders the whole snabbkaffe+concuerror combination impractical. I wonder if it is possible to move snabbkaffe's internal state from a separate process into Concuerror's internal state.
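
The collector pattern described above can be sketched roughly as follows. This is not snabbkaffe's code: the module trace_collector_sketch, its function names, and the `kind` key in the event maps are all hypothetical, chosen only to show why every traced event becomes an extra message to a shared process, which is what Concuerror then treats as additional racing operations.

```erlang
-module(trace_collector_sketch).
-export([start/0, event/2, dump/1, pong_after_ping/1]).

%% Start a collector process that accumulates events in arrival order.
start() ->
    spawn(fun() -> loop([]) end).

%% Forward one structured event (a map) to the collector asynchronously.
event(Collector, Event) when is_map(Event) ->
    Collector ! {event, Event},
    ok.

%% Ask the collector for the trace gathered so far, oldest event first.
dump(Collector) ->
    Ref = make_ref(),
    Collector ! {dump, self(), Ref},
    receive
        {Ref, Trace} -> Trace
    after 5000 ->
        error(collector_timeout)
    end.

loop(Acc) ->
    receive
        {event, Event} ->
            loop([Event | Acc]);
        {dump, From, Ref} ->
            From ! {Ref, lists:reverse(Acc)},
            loop(Acc)
    end.

%% Example trace property: no pong event appears before the first ping.
pong_after_ping(Trace) ->
    check([Kind || #{kind := Kind} <- Trace], false).

check([], _SeenPing) -> true;
check([ping | Rest], _SeenPing) -> check(Rest, true);
check([pong | _Rest], false) -> false;
check([_Other | Rest], SeenPing) -> check(Rest, SeenPing).
```

A test would call event/2 from the processes under test, then run dump/1 once at the end and pass the resulting list to a property such as pong_after_ping/1.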

k32 avatar Jul 17 '21 22:07 k32

Hi @k32 !

This is a reasonable proposal, and one of my own "headaches" too: making it easier to "hide" "benign" racy operations from Concuerror, and still use them to control a test's scheduling. I am also curious about how the exclude_module option is failing in such a scenario.

I want to explore this more, so I think a good way to start is to have a small example of a snabbkaffe use that highlights the problem. Something like a snabbkaffe test case, together with a way to invoke Concuerror on it, should be good enough.

Is this something that you can send me?

aronisstav avatar Jul 22 '21 06:07 aronisstav

> I want to explore this more, so I think a good way to start is to have a small example of a snabbkaffe use that highlights the problem. Something like a snabbkaffe test case, together with a way to invoke Concuerror on it, should be good enough.

Thanks for the answer! We have a small testsuite where snabbkaffe runs under concuerror. Consider the following test for example:

https://github.com/kafka4beam/snabbkaffe/blob/master/test/concuerror_tests.erl#L14

It spawns three processes: the first one waits for a ping message. The other two compete to send the message to the first one. Once the first process receives a message, it produces a pong trace event. The main process of the testcase waits for the pong event. There is a lot of preprocessor trickery going on in the ?block_until macro, but this is essentially what it does: it constructs a predicate fun matching the event, then sends the fun to the snabbkaffe gen_server, which uses it to match both past and incoming events. Once it finds the event, it replies.
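
Stripped of snabbkaffe, the scenario looks roughly like the sketch below. It is a simplified stand-in, not the linked test: the module ping_race_sketch and the local block_until/2 function are hypothetical, the pong event is sent straight to the main process rather than through a gen_server, and the predicate is evaluated in the waiting process instead of being shipped to a collector.

```erlang
-module(ping_race_sketch).
-export([run/0]).

%% One receiver waits for a ping; two senders race to deliver it.
%% The main process blocks until it sees a pong trace event.
run() ->
    Main = self(),
    Receiver = spawn(fun() ->
        receive
            {ping, From} ->
                %% Emit the "pong" trace event, recording which sender won.
                Main ! {trace, #{kind => pong, winner => From}}
        end
    end),
    _ = [spawn(fun() -> Receiver ! {ping, N} end) || N <- [1, 2]],
    block_until(fun(#{kind := pong}) -> true; (_) -> false end, 5000).

%% Wait for the first trace event accepted by Pred, or give up after
%% Timeout milliseconds without any new trace event.
block_until(Pred, Timeout) ->
    receive
        {trace, Event} ->
            case Pred(Event) of
                true -> {ok, Event};
                false -> block_until(Pred, Timeout)
            end
    after Timeout ->
        timeout
    end.
```

Which sender wins depends on scheduling, so the `winner` field may be 1 or 2; exploring both outcomes is exactly what Concuerror is meant to do here.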

Currently all the snabbkaffe processes are instrumented by Concuerror, and the testcase works as I expect: there is always a ping/pong pair of events in the trace. However, it breaks when I exclude the snabbkaffe module here: https://github.com/kafka4beam/snabbkaffe/blob/master/Makefile#L5

* Error: A process (<0.106.0>) took more than 5000ms to report a built-in event. You can try to increase the '--timeout' limit and/or ensure that there are no infinite loops in your test.

This could be a minor issue in the snabbkaffe code, but I suspect that the problem may be more fundamental: events from different runs of the instrumented code may all get mixed up in snabbkaffe's trace. However, I can only speculate that this is what happens.

k32 avatar Jul 23 '21 21:07 k32