Concuerror
Concuerror copied to clipboard
Introduce an option similar to --after-timeout for timers
Is your feature request related to a problem? Please describe.
Concuerror already supports treating timeouts in receive/after
statements as infinity
above some threshold with the --after-timeout
option. However, this option does not cover the case of timers started with erlang:start_timer
. My immediate problem is with gen_statem
, that unlike gen_server
uses timers which I cannot prevent from firing in my Concuerror tests.
Describe the solution you'd like
I'd like either the --after-timeout
option to apply to timers started with erlang:start_timer
(that is, timers above the specified threshold shall never fire while running my tests), or I'd like a new option, such as --timer-timeout
to be introduced specifically for timers. Whichever is more convenient to implement.
Describe alternatives you've considered
My only alternative at the moment is to change the code I'm testing to not use finite timeouts with gen_statem
when used from a Concuerror test case. However, this would make the code I test and the code I use in production differ for no good reason.
Additional context N/A
The --after-timeout
applying to timers as well makes perfect sense. Thanks for pointing this out! Will be fixed (soon)!
This could be sufficiently solved by adding a suitable call to
concuerror_inspect:inspect(receive, ..., ignored)
above here https://github.com/parapluu/Concuerror/blob/master/src/concuerror_callback.erl#L899.
I tried to implement this feature according to your advise in b76b9ab, but it doesn't work.
I have a test case that tries all three constructs that should be governed by --after-timeout
: receive
, send_after
and start_timer
. But only receive
works as expected. For the other two cases Concuerror finds a race and then crashes like this:
################################################################################
Interleaving #1
--------------------------------------------------------------------------------
Errors found:
* Blocked at a 'receive' ("deadlocked"; other processes have exited):
<P> in after_timeout.erl line 34
Mailbox contents: []
--------------------------------------------------------------------------------
Event trace:
1: <P>: #Ref<0.0.1.291> = erlang:start_timer(1000, <P>, foo)
in after_timeout.erl line 32
2: <P>: 1 = erlang:cancel_timer(#Ref<0.0.1.291>)
in after_timeout.erl line 33
3: <Timer #Ref<0.0.1.291>>: is removed
--------------------------------------------------------------------------------
New races found:
* 2: <P>: 1 = erlang:cancel_timer(#Ref<0.0.1.291>)
3: <Timer #Ref<0.0.1.291>>: is removed
################################################################################
Interleaving #2
--------------------------------------------------------------------------------
Errors found:
* Concuerror crashed
--------------------------------------------------------------------------------
Event trace:
1: <P>: #Ref<0.0.1.291> = erlang:start_timer(1000, <P>, foo)
in after_timeout.erl line 32
################################################################################
Exploration completed!
################################################################################
Errors:
--------------------------------------------------------------------------------
* On step 2, replaying a built-in returned a different result than expected:
original:
<0.84.0>: is removed
new:
blocked
Please notify the developers, as this is a bug of Concuerror.
(I tweaked the test case a bit, so that <P>
blocks and therefore I get an event trace for the first interleaving too.)
I would appreciate some help on which part of the codebase to poke around with.
Ah, very sorry for the bad lead.
Another acceptable way to do this is to add an extra case for timer-related primitives, that checks the after_timeout setting and just returns a reference (via the relevant callback), but also emitting a warning similar to the one for 'ignoring after timeouts' for receives.
This case should be placed before this: https://github.com/parapluu/Concuerror/blob/master/src/concuerror_callback.erl#L855
But that would mean cancel_timer
wouldn't find a running timer with that reference and return false
instead of an integer here: https://github.com/parapluu/Concuerror/blob/master/src/concuerror_callback.erl#L822-L823
I think my approach should work too, but I miss some rules about what events race in a timer-specific scenario. The test case with the receive after
construct does exactly the same thing internally: it spawns a new process that gets blocked in a receive
statement. Only that this time the new process is a special timer process, and therefore Concuerror doesn't recognise the causality between stop_timer
and the process terminating.
Eek. Didn't think about that case.
Indeed the assumption has been that one can always reverse cancel_timer bifs and fake timer processes dying. With a quick look I think the relevant line is:
https://github.com/parapluu/Concuerror/blob/master/src/concuerror_dependencies.erl#L332
and the "time order" is irrelevant due to:
https://github.com/parapluu/Concuerror/blob/master/src/concuerror_dependencies.erl#L75
Unfortunately I can't think of an easy solution.
Faking the rest of the timer API so that e.g. 'cancel_timer' would be an exit signal, and 'read_timer' a 'is_process_alive' requires an extra abstraction layer above built-in ops. This would get you correct dependencies "for free".
Another, 'ugly traditional' way to fix such things in the past has been to add more info to events in the extra
field and using it for dependency detection. So you could add the after_timeout to some/all exit events and then fix the offending lines above to only have a dependency with the ones that can really be reversed. But the code in _dependencies has reached the point where this is really painful.