
Race on object transition

Open daumayr opened this issue 7 years ago • 11 comments

To reproduce:

for i in 1 2 3 4 5 6; do ./som -G -at -TF core-lib/Benchmarks/AsyncHarness.som Savina.CigaretteSmokers 50 0  1000:200; done

Originally, it resulted in an endless recursion on transitioning the object in InitializerFieldWrite.updateObject:

Run-time Error: EventualMessage failed with Exception.
java.lang.AssertionError: After transitioning the object to a new shape, we expect the layout to be valid. But, this is racy...
	at som.interpreter.objectstorage.InitializerFieldWrite.updateObject(InitializerFieldWrite.java:248)
	at som.interpreter.objectstorage.InitializerFieldWriteNodeGen$UpdateObjectNode_.execute_(InitializerFieldWriteNodeGen.java:1317)
	at som.interpreter.objectstorage.InitializerFieldWriteNodeGen$BaseNode_.acceptAndExecute(InitializerFieldWriteNodeGen.java:123)
	at com.oracle.truffle.api.dsl.internal.SpecializationNode.uninitialized(SpecializationNode.java:407)
	at som.interpreter.objectstorage.InitializerFieldWriteNodeGen$UninitializedNode_.execute_(InitializerFieldWriteNodeGen.java:379)
	at som.interpreter.objectstorage.InitializerFieldWriteNodeGen$BaseNode_.acceptAndExecute(InitializerFieldWriteNodeGen.java:123)
	at com.oracle.truffle.api.dsl.internal.SpecializationNode.removeThis(SpecializationNode.java:247)
	at som.interpreter.objectstorage.InitializerFieldWriteNodeGen$UnwrittenOrGeneralizingValueNode_.execute_(InitializerFieldWriteNodeGen.java:1280)
	at som.interpreter.objectstorage.InitializerFieldWriteNodeGen$BaseNode_.execute0(InitializerFieldWriteNodeGen.java:135)

There are, however, other seemingly related assertion failures:

java.lang.AssertionError
	at som.interpreter.objectstorage.StorageLocation$ObjectDirectStorageLocation.isSet(StorageLocation.java:163)
	at som.interpreter.objectstorage.FieldReadNode.createRead(FieldReadNode.java:27)
	at som.compiler.MixinDefinition$SlotDefinition.createNode(MixinDefinition.java:540)
	at som.compiler.MixinDefinition$SlotDefinition.getDispatchNode(MixinDefinition.java:517)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.createSomDispatchNode(UninitializedDispatchNode.java:92)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.insertSpecialization(UninitializedDispatchNode.java:65)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.specialize(UninitializedDispatchNode.java:51)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.specialize(UninitializedDispatchNode.java:151)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.executeDispatch(UninitializedDispatchNode.java:112)
	at som.interpreter.nodes.dispatch.CachedSlotAccessNode$CachedSlotRead.executeDispatch(CachedSlotAccessNode.java:57)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.executeDispatch(UninitializedDispatchNode.java:113)
	at som.interpreter.nodes.dispatch.CachedSlotAccessNode$CachedSlotRead.executeDispatch(CachedSlotAccessNode.java:57)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.executeDispatch(UninitializedDispatchNode.java:113)
	at som.interpreter.nodes.dispatch.CachedSlotAccessNode$CachedSlotRead.executeDispatch(CachedSlotAccessNode.java:57)
	at som.interpreter.nodes.MessageSendNode$GenericMessageSendNode.doPreEvaluated(MessageSendNode.java:319)
	at som.interpreter.nodes.MessageSendNode$AbstractMessageSendNode.executeGeneric(MessageSendNode.java:95)
	at som.interpreter.nodes.nary.EagerBinaryPrimitiveNode.executeGeneric(EagerBinaryPrimitiveNode.java:82)
	at som.interpreter.nodes.InternalObjectArrayNode.executeObjectArray(InternalObjectArrayNode.java:26)

daumayr avatar Jan 11 '17 11:01 daumayr

Can't reproduce with latest master. Might that be something that got broken in a branch?

smarr avatar Jan 11 '17 17:01 smarr

My branches are a few commits behind, I'll see if rebasing changes anything.

daumayr avatar Jan 11 '17 18:01 daumayr

Testcase: testInParallel [SuperSends.testSuperClassClause2A [66]] (som.tests.BasicInterpreterTests): FAILED

[junit] 	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$UninitializedLexicallyBound.doLookup(UninitializedDispatchNode.java:245)
[junit] 	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.createSomDispatchNode(UninitializedDispatchNode.java:82)
[junit] 	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.insertSpecialization(UninitializedDispatchNode.java:67)
[junit] 	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.specialize(UninitializedDispatchNode.java:53)
[junit] 	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.specialize(UninitializedDispatchNode.java:166)
[junit] 	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.executeDispatch(UninitializedDispatchNode.java:127)
[junit] 	at som.interpreter.nodes.dispatch.CachedDispatchNode.executeDispatch(CachedDispatchNode.java:46)
[junit] 	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.executeDispatch(UninitializedDispatchNode.java:128)
[junit] 	at som.interpreter.nodes.dispatch.CachedDispatchNode.executeDispatch(CachedDispatchNode.java:46)
[junit] 	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.executeDispatch(UninitializedDispatchNode.java:128)
[junit] 	at som.interpreter.nodes.dispatch.CachedDispatchNode.executeDispatch(CachedDispatchNode.java:46)

daumayr avatar Jan 31 '17 15:01 daumayr

While https://github.com/smarr/SOMns/pull/164 changes some things around this issue, I don't think it actually fixes it. But, I still have no clue how it can happen.

smarr avatar Jun 26 '17 08:06 smarr

Another related test failure, I think:

java.lang.StackOverflowError
	at som.interpreter.nodes.dispatch.CachedSlotRead$UnwrittenSlotRead.<init>(CachedSlotRead.java:83)
	at som.interpreter.objectstorage.StorageLocation$UnwrittenStorageLocation.getReadNode(StorageLocation.java:112)
	at som.compiler.MixinDefinition$SlotDefinition.createNode(MixinDefinition.java:568)
	at som.compiler.MixinDefinition$ClassSlotDefinition.createNode(MixinDefinition.java:642)
	at som.compiler.MixinDefinition$SlotDefinition.getDispatchNode(MixinDefinition.java:535)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.createSomDispatchNode(UninitializedDispatchNode.java:101)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.insertSpecialization(UninitializedDispatchNode.java:73)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.specialize(UninitializedDispatchNode.java:59)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.specialize(UninitializedDispatchNode.java:174)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.executeDispatch(UninitializedDispatchNode.java:134)
	at som.interpreter.nodes.dispatch.CachedSlotRead.executeDispatch(CachedSlotRead.java:53)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.executeDispatch(UninitializedDispatchNode.java:134)
	at som.interpreter.nodes.dispatch.CachedSlotRead.executeDispatch(CachedSlotRead.java:53)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.executeDispatch(UninitializedDispatchNode.java:134)
	at som.interpreter.nodes.dispatch.CachedSlotRead.executeDispatch(CachedSlotRead.java:53)

https://travis-ci.org/smarr/SOMns/jobs/289695719

smarr avatar Oct 18 '17 21:10 smarr

Ok, so I might have a clue what this is about.

Currently, we invalidate an object layout, and then construct a new one. At least in our parallel BasicInterpreterTests, this can trigger class loading, which can take long enough for a thread to run out of stack while it keeps retrying with the same invalid object layout.

For the moment, I'll add a busy loop to try to fix this.

The relevant change: https://github.com/MetaConc/SOMns/commit/250362298e1b7bc42c4c40a37570badfe3814502
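
To illustrate the idea only, here is a minimal sketch of waiting for a valid layout in a loop rather than by recursion. It is not the code from the linked commit: Layout, LayoutHolder, transition, and awaitValidLayout are hypothetical names made up for this example, and Thread.yield() is just one possible way to spin.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for an object layout; not the SOMns class. */
final class Layout {
  private volatile boolean valid = true;

  boolean isValid() { return valid; }

  void invalidate() { valid = false; }
}

/** Hypothetical holder of the current layout of a class. */
final class LayoutHolder {
  private final AtomicReference<Layout> current = new AtomicReference<>(new Layout());

  /** Invalidates the old layout first and publishes the replacement afterwards. */
  void transition(Runnable slowWork) {
    current.get().invalidate();
    slowWork.run();               // stands in for slow work such as class loading
    current.set(new Layout());
  }

  /** Spins until a valid layout is visible; the stack depth stays constant. */
  Layout awaitValidLayout() {
    Layout layout = current.get();
    while (!layout.isValid()) {
      Thread.yield();             // busy-wait instead of re-entering the specialization
      layout = current.get();
    }
    return layout;
  }
}
```

A reader that finds an invalid layout calls awaitValidLayout() and only continues once the replacement has been published, so the window between invalidation and publication costs CPU time instead of stack frames.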

smarr avatar Mar 24 '18 12:03 smarr

PR #244 also solves an issue with not handling invalid layouts in the ClassSlotAccessNode. This could cause data to be written to the wrong memory address, I think. It is hard to test, but at least I observed an A being read where an R should be read (both are inner classes in the Parser.ns BasicInterpreterTest).

By checking whether the read/write operations are valid, we can avoid the issue. In case they are invalid, we will fall back to the standard accessors on SObject, which do the right thing.
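
A minimal sketch of such a validity check, under stated assumptions: SlotLayout, SimpleObject, and CachedSlotReadSketch are hypothetical classes invented for this example and do not mirror the real ClassSlotAccessNode or SObject code. The cached read uses its remembered index only while the layout it was created for is still valid and still the layout of the object; otherwise it falls back to the generic accessor.

```java
/** Hypothetical layout: maps one slot to an index in the field array. */
final class SlotLayout {
  final int slotIndex;
  private volatile boolean valid = true;

  SlotLayout(int slotIndex) { this.slotIndex = slotIndex; }

  boolean isValid() { return valid; }

  void invalidate() { valid = false; }
}

/** Hypothetical object whose generic accessors always consult the current layout. */
final class SimpleObject {
  private SlotLayout layout;
  private Object[] fields;

  SimpleObject(SlotLayout layout, int size) {
    this.layout = layout;
    this.fields = new Object[size];
  }

  synchronized SlotLayout getLayout() { return layout; }

  synchronized Object fieldAt(int index) { return fields[index]; }

  synchronized Object readSlot() { return fields[layout.slotIndex]; }

  synchronized void writeSlot(Object value) { fields[layout.slotIndex] = value; }

  /** Moves the slot value to the index of the new layout and invalidates the old one. */
  synchronized void transitionTo(SlotLayout newLayout) {
    Object value = fields[layout.slotIndex];
    layout.invalidate();
    Object[] newFields = new Object[Math.max(fields.length, newLayout.slotIndex + 1)];
    newFields[newLayout.slotIndex] = value;
    fields = newFields;
    layout = newLayout;
  }
}

/** Hypothetical cached read that guards its fast path with a validity check. */
final class CachedSlotReadSketch {
  private final SlotLayout expected;

  CachedSlotReadSketch(SlotLayout expected) { this.expected = expected; }

  Object read(SimpleObject obj) {
    if (expected.isValid() && obj.getLayout() == expected) {
      return obj.fieldAt(expected.slotIndex);  // fast path: cached index is still correct
    }
    return obj.readSlot();                     // fallback: generic accessor on the object
  }
}
```

Without the guard, the cached node would keep reading index 0 after the transition and return whatever happens to be stored there, which is the kind of wrong read described above.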

smarr avatar Mar 24 '18 23:03 smarr

There is another issue lurking:

https://travis-ci.org/smarr/SOMns/jobs/379663247#L4988

BenchmarkHarnessTests
Total Number of Tests:      2
Number of Successful Tests: 2
ActorTests
instance of FarReference
Uncaught exception on ActorProcessingThread-0
Processing failed for: Actor
java.lang.AssertionError: null is not a valid value for an object slot, it needs to be initialized with nil.
	at som.interpreter.objectstorage.StorageLocation$ObjectStorageLocation.isSet(StorageLocation.java:210)
	at som.compiler.MixinDefinition$SlotDefinition.getDispatchNode(MixinDefinition.java:544)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.createSomDispatchNode(UninitializedDispatchNode.java:101)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.insertSpecialization(UninitializedDispatchNode.java:73)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.specialize(UninitializedDispatchNode.java:59)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.specialize(UninitializedDispatchNode.java:174)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.executeDispatch(UninitializedDispatchNode.java:134)
	at som.interpreter.nodes.MessageSendNode$GenericMessageSendNode.doPreEvaluated(MessageSendNode.java:313)
	at som.interpreter.nodes.MessageSendNode$AbstractUninitializedMessageSendNode.doPreEvaluated(MessageSendNode.java:177)
	at som.interpreter.nodes.ResolvingImplicitReceiverSend.doPreEvaluated(ResolvingImplicitReceiverSend.java:84)
	at som.interpreter.nodes.MessageSendNode$AbstractMessageSendNode.executeGeneric(MessageSendNode.java:128)
	at som.interpreter.nodes.ResolvingImplicitReceiverSend.executeGeneric(ResolvingImplicitReceiverSend.java:67)
	at som.interpreter.nodes.MessageSendNode$AbstractMessageSendNode.evaluateArguments(MessageSendNode.java:135)
	at som.interpreter.nodes.MessageSendNode$AbstractMessageSendNode.executeGeneric(MessageSendNode.java:127)
	at som.interpreter.nodes.MessageSendNode$AbstractUninitializedMessageSendNode.executeGeneric(MessageSendNode.java:169)
	at som.interpreter.nodes.SequenceNode.executeAllButLast(SequenceNode.java:50)
	at som.interpreter.nodes.SequenceNode.executeGeneric(SequenceNode.java:43)
	at som.interpreter.Invokable.execute(Invokable.java:51)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callProxy(OptimizedCallTarget.java:245)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callRoot(OptimizedCallTarget.java:234)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:224)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.doInvoke(OptimizedCallTarget.java:209)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callDirect(OptimizedCallTarget.java:192)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedDirectCallNode.callProxy(OptimizedDirectCallNode.java:81)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedDirectCallNode.call(OptimizedDirectCallNode.java:65)
	at som.interpreter.nodes.dispatch.CachedDispatchNode.executeDispatch(CachedDispatchNode.java:38)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.executeDispatch(UninitializedDispatchNode.java:134)
	at som.interpreter.nodes.MessageSendNode$GenericMessageSendNode.doPreEvaluated(MessageSendNode.java:313)
	at som.interpreter.actors.ReceivedMessage.execute(ReceivedMessage.java:43)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callProxy(OptimizedCallTarget.java:245)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callRoot(OptimizedCallTarget.java:234)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:224)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.doInvoke(OptimizedCallTarget.java:209)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.call(OptimizedCallTarget.java:185)
	at som.interpreter.actors.EventualMessage.executeMessage(EventualMessage.java:324)
	at som.interpreter.actors.EventualMessage.execute(EventualMessage.java:309)
	at som.interpreter.actors.Actor$ExecAllMessages.execute(Actor.java:306)
	at som.interpreter.actors.Actor$ExecAllMessages.processCurrentMessages(Actor.java:284)
	at som.interpreter.actors.Actor$ExecAllMessages.doRun(Actor.java:266)
	at som.interpreter.actors.Actor$ExecutorRootNode.execute(Actor.java:219)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callProxy(OptimizedCallTarget.java:245)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callRoot(OptimizedCallTarget.java:234)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:224)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.doInvoke(OptimizedCallTarget.java:209)
	at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.call(OptimizedCallTarget.java:185)
	at som.interpreter.actors.Actor$ExecAllMessages.run(Actor.java:245)
	at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1603)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)
Caused by: com.oracle.truffle.api.TruffleStackTrace$LazyStackTrace
Caused by: com.oracle.truffle.api.TruffleStackTrace
	at com.oracle.truffle.truffle_api/com.oracle.truffle.api.TruffleStackTrace.fillIn(TruffleStackTrace.java:170)
	at com.oracle.truffle.truffle_api/com.oracle.truffle.api.TruffleStackTrace.addStackFrameInfo(TruffleStackTrace.java:233)
	at com.oracle.truffle.truffle_api/com.oracle.truffle.api.TruffleLanguage$LanguageImpl.onThrowable(TruffleLanguage.java:1866)
	at com.oracle.truffle.truffle_api/com.oracle.truffle.api.impl.TVMCI.onThrowable(TVMCI.java:178)

smarr avatar May 16 '18 11:05 smarr

And another one: https://travis-ci.org/smarr/SOMns/jobs/423684181#L1239

java.lang.AssertionError: null is not a valid value for an object slot, it needs to be initialized with nil.
	at som.interpreter.objectstorage.StorageLocation$ObjectStorageLocation.isSet(StorageLocation.java:210)
	at som.compiler.MixinDefinition$SlotDefinition.getDispatchNode(MixinDefinition.java:571)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.createSomDispatchNode(UninitializedDispatchNode.java:98)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.insertSpecialization(UninitializedDispatchNode.java:70)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.specialize(UninitializedDispatchNode.java:58)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.specialize(UninitializedDispatchNode.java:171)
	at som.interpreter.nodes.dispatch.UninitializedDispatchNode$AbstractUninitialized.executeDispatch(UninitializedDispatchNode.java:131)
	at som.interpreter.nodes.dispatch.CachedSlotRead.executeDispatch(CachedSlotRead.java:67)
	at som.interpreter.nodes.MessageSendNode$GenericMessageSendNode.doPreEvaluated(MessageSendNode.java:341)
	at som.interpreter.nodes.MessageSendNode$AbstractMessageSendNode.executeGeneric(MessageSendNode.java:141)
	at som.interpreter.nodes.nary.EagerBinaryPrimitiveNode.executeGeneric(EagerBinaryPrimitiveNode.java:85)
	at som.interpreter.nodes.InternalObjectArrayNode.executeObjectArray(InternalObjectArrayNode.java:24)
	at som.interpreter.actors.EventualSendNode.executeGeneric(EventualSendNode.java:72)
	at som.interpreter.Invokable.execute(Invokable.java:51)
	at com.oracle.truffle.api.impl.DefaultCallTarget.callDirectOrIndirect(DefaultCallTarget.java:64)
	at com.oracle.truffle.api.impl.DefaultDirectCallNode.call(DefaultDirectCallNode.java:43)
	at som.interpreter.nodes.dispatch.BlockDispatchNode.activateCachedBlock(BlockDispatchNode.java:45)
	at som.interpreter.nodes.dispatch.BlockDispatchNodeGen.executeDispatch(BlockDispatchNodeGen.java:36)
	at som.primitives.arrays.ArraySetAllStrategy.evalBlockForRemaining(ArraySetAllStrategy.java:24)
	at som.primitives.arrays.ArraySetAllStrategy.evaluateFirstDetermineStorageAndEvaluateRest(ArraySetAllStrategy.java:203)
	at som.primitives.arrays.PutAllNode.doPutEvalBlock(PutAllNode.java:61)
	at som.primitives.arrays.PutAllNodeFactory$PutAllNodeGen.executeAndSpecialize(PutAllNodeFactory.java:286)
	at som.primitives.arrays.PutAllNodeFactory$PutAllNodeGen.executeEvaluated(PutAllNodeFactory.java:125)
	at som.interpreter.nodes.nary.EagerBinaryPrimitiveNode.executeEvaluated(EagerBinaryPrimitiveNode.java:94)
	at som.interpreter.nodes.nary.EagerBinaryPrimitiveNode.executeGeneric(EagerBinaryPrimitiveNode.java:88)
	at som.interpreter.Invokable.execute(Invokable.java:51)

smarr avatar Sep 02 '18 19:09 smarr

The issue is related to the Phaser used in ObjectTransitionSafepoint. The current implementation uses a default Phaser, whose onAdvance(...) implementation returns true once the number of registered parties (threads) drops to 0. A Phaser terminates when onAdvance(...) returns true, and after that the synchronisation methods have no effect.

From the Javadoc: "A phaser may enter a termination state, that may be checked using method isTerminated(). Upon termination, all synchronization methods immediately return without waiting for advance, as indicated by a negative return value. Similarly, attempts to register upon termination have no effect. Termination is triggered when an invocation of onAdvance returns true."
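
The quoted behaviour can be reproduced in isolation with plain java.util.concurrent.Phaser. The following is a minimal sketch and not SOMns code; the class and method names are made up for this example.

```java
import java.util.concurrent.Phaser;

final class DefaultPhaserTermination {
  /** Returns a default Phaser after its only registered party has left. */
  static Phaser phaserAfterLastPartyLeft() {
    Phaser phaser = new Phaser();   // no parties registered yet, not terminated
    phaser.register();              // one party, e.g. a single interpreter thread
    phaser.arriveAndDeregister();   // party count drops to 0, default onAdvance returns true
    return phaser;
  }

  public static void main(String[] args) {
    Phaser phaser = phaserAfterLastPartyLeft();
    System.out.println("terminated: " + phaser.isTerminated());
    // Both calls return a negative value immediately and no longer synchronise anything.
    System.out.println("register(): " + phaser.register());
    System.out.println("arriveAndAwaitAdvance(): " + phaser.arriveAndAwaitAdvance());
  }
}
```

Once the party count has reached 0 a single time, every later register() and arriveAndAwaitAdvance() on that Phaser is effectively a no-op.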

As a solution, we can override the onAdvance method to always return false. This prevents the Phaser from terminating and preserves the synchronisation. Because the synchronisation was missing until now, it is likely that deadlocks and other synchronisation issues went unnoticed.
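
A minimal sketch of that override, assuming a subclass is an acceptable way to do it; NonTerminatingPhaser is a name chosen for this example, and the actual change in ObjectTransitionSafepoint may look different.

```java
import java.util.concurrent.Phaser;

/** A Phaser that stays usable even when the number of registered parties drops to 0. */
final class NonTerminatingPhaser extends Phaser {
  @Override
  protected boolean onAdvance(int phase, int registeredParties) {
    return false;   // never request termination
  }

  public static void main(String[] args) {
    NonTerminatingPhaser phaser = new NonTerminatingPhaser();
    phaser.register();
    phaser.arriveAndDeregister();   // party count drops to 0, but the phaser stays alive
    System.out.println("terminated: " + phaser.isTerminated());
    System.out.println("register() is non-negative: " + (phaser.register() >= 0));
  }
}
```

With this override, a thread that registers after all others have left gets a regular phase number back, and later calls block and synchronise as intended.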

The BigContention benchmark deadlocks when the Phaser works correctly. This happens when two threads attempt to access an uninitialised class slot in the same object. The first thread obtains a lock on the object and then enters the safepoint to update the object layout. The second thread attempts to do the same, but the object is already locked, so the thread blocks until the object is available. As a result, the first thread waits for the second thread to enter the safepoint, while the second thread waits for the first to release the lock on the object. Synchronization on the object is necessary to avoid the risk of exposing different identities for the class represented by the slot.

daumayr avatar Jan 04 '19 15:01 daumayr

Other issues when Phaser works correctly:

The Join primitive in the Vacation benchmark may cause a deadlock. The thread being joined is already waiting at the safepoint, but the safepoint can't start because the other thread is blocked in the join.

The ForkJoinPools we use try to compensate for blocked threads, i.e., when the last thread enters the safepoint, a new thread is created. This is problematic with tracing because the new threads reduce the number of available buffers. It is possible to reach a situation where there are more threads than buffers: the newest thread blocks when attempting to get a buffer, and all other threads wait at the safepoint for that newest thread.

Related to the previous one: the threads also hold a SnapshotBuffer (created on demand). When the pool creates enough threads while trying to compensate, the system runs out of memory.

daumayr avatar Jan 10 '19 16:01 daumayr