The interpreter occasionally crashes with an assertion failure
This may be more than one bug, or it may be a single race; at the moment I
cannot tell. The crashes generally take the form of assertions failing because
tokens are in the wrong state, often while halting or killing.
I am creating this issue to collect information on the subject in hopes that we
can eventually find a way to reproduce this bug well enough to fix it. The
comments will contain traces of the crashes. If you have something that may be
related, feel free to add it.
Version and Platform:
Version 2.1.0 and development versions thereafter. This is hard enough to
reproduce that we have not found a pattern. Similar crashes have occurred on
both Linux and OS X.
Original issue reported on code.google.com by arthur.peters
on 28 Sep 2013 at 7:04
==== Starting test_data/performance/threadring.orc ====
498
Sep 28, 2013 12:47:44 PM orc.run.core.Execution$$anonfun$2 apply
SEVERE: Unexpected Java Error thrown; killing Orc Execution
java.lang.AssertionError: Only live tokens may be blocked: state=Killed
at orc.run.core.Token.blockOn(Token.scala:191)
at orc.run.core.Join.join(Join.scala:68)
at orc.run.core.Token.makeCall(Token.scala:440)
at orc.run.core.Token$$anonfun$eval$2.apply(Token.scala:518)
at orc.run.core.Token$$anonfun$eval$2.apply(Token.scala:518)
at orc.run.core.Resolver$$anonfun$resolve$1.apply(Resolver.scala:31)
at orc.run.core.Resolver$$anonfun$resolve$1.apply(Resolver.scala:30)
at orc.run.core.Resolver$class.resolveOptional(Resolver.scala:47)
at orc.run.core.Token.resolveOptional(Token.scala:54)
at orc.run.core.Resolver$class.resolve(Resolver.scala:30)
at orc.run.core.Token.resolve(Token.scala:54)
at orc.run.core.Token.eval(Token.scala:518)
at orc.run.core.Token.run(Token.scala:486)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Error: java.lang.AssertionError: Only live tokens may be blocked: state=Killed
Original comment by arthur.peters
on 28 Sep 2013 at 7:05
I may have had an insight, but I don't know how to test my theory. Many
references to state have the form: state match { ... }. This appears to read
state only once; however, pseudocode for what is actually generated is as
follows:
val s = state
if (s matches ...) {
  ...
} else if (s matches ...) {
  ...
}
This seems fine, but if the JIT compiler optimizes it, it may be changed into
the following, because the read is neither synchronized nor volatile (I
couldn't tell for sure whether this is allowed, but it's a common way to
handle register spills):
if (state matches ...) {
  ...
} else if (state matches ...) {
  ...
}
This has a race in it because different checks could see different values. If
the checks were reordered this could be a particular problem: for instance, if
Killed is checked first and then Live, the token might be Live when Killed is
checked, then Killed when Live is checked, and finally fail with an exception
(see the code in blockOn). This reordering does not appear in the JVM bytecode
(I checked), but the JIT compiler could still be doing it.
This may all be crazy, but I thought I would document it before I move on to
other work.
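For the record, here is a minimal Scala sketch of the hazard and the usual
mitigation. Token, TokenState, Live, and Killed below are simplified
stand-ins, not the real orc.run.core classes:

// Simplified stand-ins for the real classes; an illustration of the hazard,
// not the actual orc.run.core.Token code.
sealed trait TokenState
case object Live   extends TokenState
case object Killed extends TokenState

class Token {
  // A @volatile field forbids the JIT from splitting one source-level read
  // into several: each volatile read is a synchronization action, so all the
  // branches below must observe the same value.
  @volatile private var state: TokenState = Live

  def blockOn(): Unit = {
    // Read the field exactly once into a local. With a plain field the JIT
    // may re-read `state` at each use of `s` (the register-spill strategy
    // speculated about above), so different checks could see different
    // states.
    val s = state
    s match {
      case Killed => ()                       // already killed; nothing to do
      case Live   => /* block the token */ ()
    }
  }
}

If the re-read theory is right, either the @volatile annotation or the
explicit local snapshot would close the window.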
Original comment by arthur.peters
on 28 Sep 2013 at 7:51
==== Starting ../OrcExamples/beginning_Orc/auction.orc ====
Current bid: 1
Current bid: 2
Current bid: 3
Current bid: 4
Current bid: 5
Current bid: 6
Sep 29, 2013 11:48:03 PM orc.run.core.Execution$$anonfun$2 apply
SEVERE: Unexpected Java Error thrown; killing Orc Execution
java.lang.AssertionError: Spurious check of call handle.
[email protected]=CallReturnedValue(true)
at orc.run.core.CallHandle.check(CallHandle.scala:92)
at orc.run.core.Token.run(Token.scala:469)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Error: java.lang.AssertionError: Spurious check of call handle.
[email protected]=CallReturnedValue(true)
Note that the state shown should not have caused this exception. There is a
race in the message generation, but I think this might be a hint.
branches/PetersOrcMachine/OrcScala/src/orc/run/core/CallHandle.scala @ r3313
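As a side note on that message race, a hedged sketch (invented names and
condition, not the real CallHandle.check) of why the printed state can
disagree with the state that actually failed the check:

sealed trait CallState
case object CallInProgress extends CallState
case object CallWasKilled  extends CallState

class Handle {
  @volatile var state: CallState = CallInProgress

  // Racy pattern: the assertion message re-reads `state`, so another thread
  // can change it between the failed test and the message construction; the
  // reported state need not be the one that was tested.
  def checkRacy(): Unit =
    assert(state == CallInProgress, s"Spurious check of call handle. state=$state")

  // Snapshot pattern: test and report the same value.
  def checkSnapshot(): Unit = {
    val s = state
    assert(s == CallInProgress, s"Spurious check of call handle. state=$s")
  }
}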
Original comment by arthur.peters
on 30 Sep 2013 at 7:54
It just happened again:
Jan 25, 2014 4:25:58 AM orc.run.core.Execution$$anonfun$2 apply
SEVERE: Unexpected Java Error thrown; killing Orc Execution
java.lang.AssertionError: Spurious check of call handle.
[email protected]=CallWasKilled
at orc.run.core.CallHandle.check(CallHandle.scala:92)
at orc.run.core.Token.run(Token.scala:469)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Error: java.lang.AssertionError: Spurious check of call handle.
[email protected]=CallWasKilled
Exception in thread "main" java.lang.AssertionError: Spurious check of call
handle. [email protected]=CallWasKilled
at orc.run.core.CallHandle.check(CallHandle.scala:92)
at orc.run.core.Token.run(Token.scala:469)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Occurred while running Sudoku.orc under the Token interpreter with -O3 at
r3338 in the Porc branch.
I still don't know how to solve this. But I thought I would store this for
future debugging.
I do not see any way this could happen unless the token is scheduled to run
before it should be, so that there is a race between the token starting to
execute and the killed handle state being set.
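A minimal Scala sketch of that suspected interleaving (invented names, not
the real scheduler): if the kill path schedules the token before publishing
the killed state, a pool thread can start running the token and check the
handle while it still holds the old state.

import java.util.concurrent.Executors

object ScheduleRaceSketch {
  sealed trait CallState
  case object CallInProgress extends CallState
  case object CallWasKilled  extends CallState

  class Handle { @volatile var state: CallState = CallInProgress }

  def main(args: Array[String]): Unit = {
    val pool   = Executors.newSingleThreadExecutor()
    val handle = new Handle

    // Suspected buggy order: schedule first, publish the killed state second.
    // The task can run between the two statements, observe CallInProgress,
    // and trip a "spurious check"-style assertion.
    pool.execute(() =>
      assert(handle.state == CallWasKilled,
             s"Spurious check of call handle. state=${handle.state}"))
    handle.state = CallWasKilled

    // Under this theory the fix is to write the state before scheduling: the
    // volatile write then happens-before the pool thread's read.
    pool.shutdown()
  }
}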
Original comment by arthur.peters
on 25 Jan 2014 at 6:21
I think this was fixed incidentally in the meantime. OK to close?
It's possible, but I have seen this error within, I would say, the last 6
months (I should have captured the trace and added it here, but I didn't). So
I'm not sure it's fixed; it's just really unlikely.