orc icon indicating copy to clipboard operation
orc copied to clipboard

The interpreter crashes with an assert failure just occasionally

Open GoogleCodeExporter opened this issue 9 years ago • 6 comments

This may be more than one bug, or it may be a single race; at the moment I 
cannot tell. The crashes seem to be in the general form of asserts failing due 
to tokens being in the wrong state. Often involving halting or killing.

I am creating this issue to collect information on the subject in hopes that we 
can eventually find a way to reproduce this bug well enough to fix it. The 
comments will contain traces of crashes. If you have something that may be 
related feel free to add it.

Version and Platform:
Version 2.1.0 and development there after. This is hard enough to reproduce 
that we have not found a pattern. Similar crashes have occurred on both Linux 
and OS X.

Original issue reported on code.google.com by arthur.peters on 28 Sep 2013 at 7:04

GoogleCodeExporter avatar May 07 '15 04:05 GoogleCodeExporter

==== Starting test_data/performance/threadring.orc ====
498
Sep 28, 2013 12:47:44 PM orc.run.core.Execution$$anonfun$2 apply
SEVERE: Unexpected Java Error thrown; killing Orc Execution
java.lang.AssertionError: Only live tokens may be blocked: state=Killed
    at orc.run.core.Token.blockOn(Token.scala:191)
    at orc.run.core.Join.join(Join.scala:68)
    at orc.run.core.Token.makeCall(Token.scala:440)
    at orc.run.core.Token$$anonfun$eval$2.apply(Token.scala:518)
    at orc.run.core.Token$$anonfun$eval$2.apply(Token.scala:518)
    at orc.run.core.Resolver$$anonfun$resolve$1.apply(Resolver.scala:31)
    at orc.run.core.Resolver$$anonfun$resolve$1.apply(Resolver.scala:30)
    at orc.run.core.Resolver$class.resolveOptional(Resolver.scala:47)
    at orc.run.core.Token.resolveOptional(Token.scala:54)
    at orc.run.core.Resolver$class.resolve(Resolver.scala:30)
    at orc.run.core.Token.resolve(Token.scala:54)
    at orc.run.core.Token.eval(Token.scala:518)
    at orc.run.core.Token.run(Token.scala:486)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

Error: java.lang.AssertionError: Only live tokens may be blocked: state=Killed

Original comment by arthur.peters on 28 Sep 2013 at 7:05

GoogleCodeExporter avatar May 07 '15 04:05 GoogleCodeExporter

I may have had an insight, but I don't know how to test my theory. Many 
references to state have the form: state match { ... }. This appears to read 
state only once however the pseudo code of what is generated for this is as 
follows:

val s = state
if(s matches ...) {
  ...
} else if(s matches ...) {v
  ...
}

This seems fine, but if the JIT compiler optimizes this it may change it to the 
following because the read is neither synchronized nor volatile (I couldn't 
tell for sure if this is allowed, but it's a common way to handle register 
spills):

if(state matches ...) {
  ...
} else if(state matches ...) {v
  ...
}

This has a race in it because different checks could see different values. If 
the checks where reordered this could be a particular issue, for instance if 
Killed is checked first then Live. The token might be live when Killed is 
checked and then Killed when live is checked and finally fail with an exception 
(see code in blockOn). This reordering does not appear the JVM bytecode (I 
checked) but the JIT compiler could still be doing it.

This may all be crazy, but I thought I would document it before I move on to 
other work.

Original comment by arthur.peters on 28 Sep 2013 at 7:51

GoogleCodeExporter avatar May 07 '15 04:05 GoogleCodeExporter

==== Starting ../OrcExamples/beginning_Orc/auction.orc ====
Current bid: 1
Current bid: 2
Current bid: 3
Current bid: 4
Current bid: 5
Current bid: 6
Sep 29, 2013 11:48:03 PM orc.run.core.Execution$$anonfun$2 apply
SEVERE: Unexpected Java Error thrown; killing Orc Execution
java.lang.AssertionError: Spurious check of call handle. 
[email protected]=CallReturnedValue(true)
    at orc.run.core.CallHandle.check(CallHandle.scala:92)
    at orc.run.core.Token.run(Token.scala:469)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

Error: java.lang.AssertionError: Spurious check of call handle. 
[email protected]=CallReturnedValue(true)

Note that the state shown should not have caused this exception. There is a 
race in the message generation, but I think this might be a hint.

branches/PetersOrcMachine/OrcScala/src/orc/run/core/CallHandle.scala @ r3313

Original comment by arthur.peters on 30 Sep 2013 at 7:54

GoogleCodeExporter avatar May 07 '15 04:05 GoogleCodeExporter

It just happened again:

Jan 25, 2014 4:25:58 AM orc.run.core.Execution$$anonfun$2 apply
SEVERE: Unexpected Java Error thrown; killing Orc Execution
java.lang.AssertionError: Spurious check of call handle. 
[email protected]=CallWasKilled
    at orc.run.core.CallHandle.check(CallHandle.scala:92)
    at orc.run.core.Token.run(Token.scala:469)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

Error: java.lang.AssertionError: Spurious check of call handle. 
[email protected]=CallWasKilled
Exception in thread "main" java.lang.AssertionError: Spurious check of call 
handle. [email protected]=CallWasKilled
    at orc.run.core.CallHandle.check(CallHandle.scala:92)
    at orc.run.core.Token.run(Token.scala:469)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

Occurred will running Sudoku.orc running under the Token interpreter with -O3 
at r3338 in the Porc branch.

I still don't know how to solve this. But I thought I would store this for 
future debugging.

I do not see any way this could happen except by having the token scheduled to 
run before it should in such a way that there is a race between the token 
starting to execute and the killed handle state being set.

Original comment by arthur.peters on 25 Jan 2014 at 6:21

GoogleCodeExporter avatar May 07 '15 04:05 GoogleCodeExporter

I think this was fixed incidentally in the meantime. OK to close?

jthywiss avatar Aug 01 '17 13:08 jthywiss

It's possible, but I have seen this error in I would say the last 6 months (I should have captured the trace and added it here, but I didn't). So I'm not sure it's fixed. Just really unlikely.

arthurp avatar Aug 01 '17 17:08 arthurp