openj9 icon indicating copy to clipboard operation
openj9 copied to clipboard

cmdLineTester_GCRegressionTests Unload lots of classes to force finalization ASSERTION FAILED CopyForwardScheme.cpp:2251: ((false))

Open pshipton opened this issue 3 years ago • 19 comments

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_1/248 - rh7-390-4 cmdLineTester_GCRegressionTests_2 -XX:+UseCompressedOops -Xjit -Xgcpolicy:balanced

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_1/248/functional_test_output.tar.gz

Testing: Unload lots of classes using FVT stress argument to force finalization (JIT Disabled)
Test start time: 2022/06/15 21:41:04 Eastern Standard Time
Running command: "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_1/openjdkbinary/j2sdk-image/bin/java"  -XX:+UseCompressedOops -Xjit -Xgcpolicy:balanced  -Xint  -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders   -cp /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 1 milliseconds
***[TEST INFO 2022/06/15 21:56:04] ProcessKiller detected a timeout after 900000 milliseconds!***
INFO: Cannot find '/usr/bin/gdb' using 'gdb' from the path.
***[TEST INFO 2022/06/15 21:56:04] executing gdb -batch -x /tmp/debugger11129749966406046221.txt /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_1/openjdkbinary/j2sdk-image/bin/java 35150***
java.io.IOException: Cannot run program "gdb": error=2, No such file or directory
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
	at java.base/java.lang.Runtime.exec(Runtime.java:594)
	at java.base/java.lang.Runtime.exec(Runtime.java:453)
	at Test$ProcessKiller.captureCoreForProcess(Test.java:697)
	at Test$ProcessKiller.captureCoreForProcess(Test.java:646)
	at Test$ProcessKiller.run(Test.java:596)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:314)
	at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:244)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
	... 6 more
***[TEST INFO 2022/06/15 21:56:04] executing kill -ABRT 35150***
***[TEST INFO 2022/06/15 21:56:04] kill -ABRT signal sent***
***[TEST INFO 2022/06/15 22:01:04] ABRT timed out***
***[TEST INFO 2022/06/15 22:01:04] executing kill -9 35150***
***[TEST INFO 2022/06/15 22:01:04] kill -9 signal sent***
***[TEST INFO 2022/06/15 22:01:04] ProcessKiller destroy test process after timeout***
Output from test:
 [OUT] Calibrating live object array size...
 [OUT] Set live object array size to 2575
 [OUT] Command line to reproduce this run: com.ibm.tests.garbagecollector.TestClassLoaderMain 2575 50 -
 [OUT] Allocating array (iteration 1)
 [OUT] i=0
 [OUT] i=1000
 [OUT] i=2000
...
 [OUT] Allocating array (iteration 177)
 [OUT] i=0
 [ERR] 01:41:45.923 0x162700    j9mm.141    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK17_s390x_linux_Nightly/openj9/runtime/gc_vlhgc/CopyForwardScheme.cpp:2251: ((false))
 [ERR] JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/06/15 21:41:45 - please wait.
Time spent executing: 1200115 milliseconds
Test result: FAILED
Output from test:

@dmitripivkine

pshipton avatar Jun 16 '22 18:06 pshipton

This assertion means Card for srcObject in MM_CopyForwardScheme::updateMarkMapAndCardTableOnCopy() has unexpected state (garbage?). Unfortunately there is no system core stored in artifacts.

dmitripivkine avatar Jun 16 '22 19:06 dmitripivkine

Trying to reproduce in grinder https://openj9-jenkins.osuosl.org/job/Grinder/923/

dmitripivkine avatar Jun 16 '22 19:06 dmitripivkine

another grinder https://openj9-jenkins.osuosl.org/job/Grinder/924/

dmitripivkine avatar Jun 16 '22 20:06 dmitripivkine

I wonder if the absence of diagnostic files means there was some kind of network/disk issue.

pshipton avatar Jun 16 '22 20:06 pshipton

It's weird it didn't even try to write diagnostic files. I updated the description with the complete output at the end.

pshipton avatar Jun 16 '22 20:06 pshipton

Seems it is not easy to reproduce in grinder. So, I guess we are going to wait for next failure and hope system core is stored properly BTW I think there is very first time we see this assertion. I speculate it more looks like random memory corruption than logical error

dmitripivkine avatar Jun 17 '22 14:06 dmitripivkine

https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.functional_s390x_linux_Nightly_testList_0/123 - rh7-390-3

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk18_j9_sanity.functional_s390x_linux_Nightly_testList_0/123/functional_test_output.tar.gz

Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2022/08/03 01:23:11 Eastern Standard Time
Running command: "/home/jenkins/workspace/Test_openjdk18_j9_sanity.functional_s390x_linux_Nightly_testList_0/openjdkbinary/j2sdk-image/bin/java"  -XX:+UseCompressedOops -Xjit -Xgcpolicy:balanced   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc   -cp /home/jenkins/workspace/Test_openjdk18_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 2 milliseconds
Time spent executing: 32328 milliseconds
Test result: FAILED
Output from test:
 [OUT] Calibrating live object array size...
 [OUT] Set live object array size to 2055
 [OUT] Command line to reproduce this run: com.ibm.tests.garbagecollector.TestClassLoaderMain 2055 50 -
...
 [OUT] Allocating array (iteration 121)
 [OUT] i=0
 [OUT] i=1000
 [ERR] 05:23:32.866 0x198000    j9mm.141    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK18_s390x_linux_Nightly/openj9/runtime/gc_vlhgc/CopyForwardScheme.cpp:2251: ((false))

pshipton avatar Aug 03 '22 17:08 pshipton

The previous failure has the diagnostic files.

pshipton avatar Aug 03 '22 17:08 pshipton

yes, there is a stored core. Looking. Thank you!

dmitripivkine avatar Aug 03 '22 17:08 dmitripivkine

There are two Card States are not recognized in this switch statement:

#define CARD_REMEMBERED 0x04
#define CARD_REMEMBERED_AND_GMP_SCAN 0x05

Both of them can not be seen in Copy Forward except Copy Abort earlier. And looks like there is the case. MM_CopyForwardScheme::cleanCardTableForPartialCollect() doing fixup by replacing CARD_REMEMBERED to CARD_CLEAN and CARD_REMEMBERED_AND_GMP_SCAN to CARD_GMP_MUST_SCAN. My theory is there was CARD_REMEMBERED card state cleaned by another thread. It caused assertion but it is CARD_CLEAN in the Card Table in the core. I have added enhancement https://github.com/eclipse-openj9/openj9/pull/15660 to print discovered Card State.

dmitripivkine avatar Aug 04 '22 20:08 dmitripivkine

Run grinder https://openj9-jenkins.osuosl.org/job/Grinder/1218/ Unfortunately enhancement https://github.com/eclipse-openj9/openj9/pull/15660 does not help: it does not cache card state before switch, so when it printed it is CARD_CLEAN (0). I will need to improve further and cache card state before switch

 [ERR] Unexpected Card state 0 for Card 000003FF8E334CAD, srcObject 000000000D015A00, dstObject 000000000CF04EA8
 [ERR] 13:22:12.136 0x19f400    j9mm.141    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK18_s390x_linux_Nightly/openj9/runtime/gc_vlhgc/CopyForwardScheme.cpp:2251: ((false))

dmitripivkine avatar Aug 05 '22 18:08 dmitripivkine

If the problem can be reproduced in a grinder, it seems the diagnostic code can be perfected before another PR is merged.

pshipton avatar Aug 05 '22 19:08 pshipton

I prepared https://github.com/eclipse-openj9/openj9/pull/15677 to improve, The change is harmless and trivial

dmitripivkine avatar Aug 05 '22 19:08 dmitripivkine

Sure, but you might as well build a JVM with that change, reproduce the problem in a grinder, and see if it's sufficient or more changes are needed.

pshipton avatar Aug 05 '22 19:08 pshipton

yes, sure, trying https://openj9-jenkins.osuosl.org/job/Grinder/1222/

dmitripivkine avatar Aug 05 '22 19:08 dmitripivkine

Is that grinder correct? It's using a Nightly build. I think you want to first build a new JVM to use that includes https://github.com/eclipse-openj9/openj9/pull/15677, using https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-Personal/

pshipton avatar Aug 05 '22 19:08 pshipton

yes, there is possibility to replace OpenJ9 repository. So, I thought it is smart enough to recognize it and rebuild JVM from my repository... Probably not

dmitripivkine avatar Aug 05 '22 19:08 dmitripivkine

A grinder can only run tests, not re-built the JVM. The OpenJ9 repo can be replaced in a grinder because it contains the functionality tests.

pshipton avatar Aug 05 '22 19:08 pshipton

In attempt to get unexpected card state I added volatile to the variable. And the problem has not been seen since. This is latest grinder 350 iterations https://openj9-jenkins.osuosl.org/job/Grinder/1231/ - no failures. And there were a few passed grinders earlier as well

dmitripivkine avatar Aug 08 '22 12:08 dmitripivkine

Code has been improved to get better debugging information https://github.com/eclipse-openj9/openj9/pull/15698

dmitripivkine avatar Aug 23 '22 13:08 dmitripivkine

Grinder with latest Nightly https://openj9-jenkins.osuosl.org/job/Grinder/1255/ - passed

dmitripivkine avatar Aug 23 '22 13:08 dmitripivkine

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/525 cmdLineTester_GCRegressionTests_0

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/525/functional_test_output.tar.gz

Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2023/07/21 05:53:51 Coordinated Universal Time
Running command: "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/openjdkbinary/j2sdk-image/bin/java"  -Xjit -Xgcpolicy:gencon -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc   -cp /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 5 milliseconds
Time spent executing: 46248 milliseconds
Test result: FAILED
Output from test:
 [OUT] Calibrating live object array size...
 [OUT] Set live object array size to 1641
 [ERR] JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/07/21 05:53:54 - please wait.
 [ERR] JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/output_16899179216340/cmdLineTester_GCRegressionTests_0/core.20230721.055354.406477.0001.dmp' in response to an event
 [ERR] JVMDUMP010I System dump written to /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/output_16899179216340/cmdLineTester_GCRegressionTests_0/core.20230721.055354.406477.0001.dmp
 [ERR] JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/output_16899179216340/cmdLineTester_GCRegressionTests_0/core.20230721.055354.406477.0002.dmp' in response to an event
 [ERR] JVMDUMP010I System dump written to /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/output_16899179216340/cmdLineTester_GCRegressionTests_0/core.20230721.055354.406477.0002.dmp
 [ERR] JVMDUMP032I JVM requested Heap dump using '/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/output_16899179216340/cmdLineTester_GCRegressionTests_0/heapdump.20230721.055354.406477.0003.phd' in response to an event
 [ERR] JVMDUMP010I Heap dump written to /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/output_16899179216340/cmdLineTester_GCRegressionTests_0/heapdump.20230721.055354.406477.0003.phd
 [ERR] JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/output_16899179216340/cmdLineTester_GCRegressionTests_0/javacore.20230721.055354.406477.0004.txt' in response to an event
 [ERR] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/output_16899179216340/cmdLineTester_GCRegressionTests_0/javacore.20230721.055354.406477.0004.txt
 [ERR] JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/output_16899179216340/cmdLineTester_GCRegressionTests_0/Snap.20230721.055354.406477.0005.trc' in response to an event
 [ERR] JVMDUMP010I Snap dump written to /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/output_16899179216340/cmdLineTester_GCRegressionTests_0/Snap.20230721.055354.406477.0005.trc
 [ERR] JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
 [ERR] Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
 [ERR] 	at java.base/java.lang.ClassLoader.defineClassImpl1(Native Method)
 [ERR] 	at java.base/java.lang.ClassLoader.defineClassInternal(ClassLoader.java:508)
 [ERR] 	at java.base/java.lang.Access.defineClass(Access.java:339)
 [ERR] 	at java.base/java.lang.invoke.MethodHandles$Lookup$ClassDefiner.defineClass(MethodHandles.java:2445)
 [ERR] 	at java.base/java.lang.invoke.InvokerBytecodeGenerator.loadMethod(InvokerBytecodeGenerator.java:320)
 [ERR] 	at java.base/java.lang.invoke.InvokerBytecodeGenerator.generateCustomizedCode(InvokerBytecodeGenerator.java:752)
 [ERR] 	at java.base/java.lang.invoke.LambdaForm.compileToBytecode(LambdaForm.java:854)
 [ERR] 	at java.base/java.lang.invoke.DirectMethodHandle.makePreparedLambdaForm(DirectMethodHandle.java:305)
 [ERR] 	at java.base/java.lang.invoke.DirectMethodHandle.preparedLambdaForm(DirectMethodHandle.java:233)
 [ERR] 	at java.base/java.lang.invoke.DirectMethodHandle.preparedLambdaForm(DirectMethodHandle.java:218)
 [ERR] 	at java.base/java.lang.invoke.DirectMethodHandle.preparedLambdaForm(DirectMethodHandle.java:227)
 [ERR] 	at java.base/java.lang.invoke.DirectMethodHandle.make(DirectMethodHandle.java:108)
 [ERR] 	at java.base/java.lang.invoke.DirectMethodHandle.make(DirectMethodHandle.java:129)
 [ERR] 	at java.base/java.lang.invoke.DirectMethodHandle.make(DirectMethodHandle.java:134)
 [ERR] 	at java.base/java.lang.invoke.Invokers.basicInvoker(Invokers.java:97)
 [ERR] 	at java.base/java.lang.invoke.LambdaForm$NamedFunction.<init>(LambdaForm.java:1105)
 [ERR] 	at java.base/java.lang.invoke.LambdaForm$Name.<init>(LambdaForm.java:1358)
 [ERR] 	at java.base/java.lang.invoke.LambdaFormEditor.makeArgumentCombinationForm(LambdaFormEditor.java:868)
 [ERR] 	at java.base/java.lang.invoke.LambdaFormEditor.filterArgumentsForm(LambdaFormEditor.java:1023)
 [ERR] 	at java.base/java.lang.invoke.MethodHandles.argumentsWithCombiner(MethodHandles.java:6149)
 [ERR] 	at java.base/java.lang.invoke.MethodHandles.filterArgumentsWithCombiner(MethodHandles.java:6114)
 [ERR] 	at java.base/java.lang.invoke.StringConcatFactory.generateMHInlineCopy(StringConcatFactory.java:541)
 [ERR] 	at java.base/java.lang.invoke.StringConcatFactory.makeConcatWithConstants(StringConcatFactory.java:358)
 [ERR] 	at java.base/java.lang.invoke.BootstrapMethodInvoker.invoke(BootstrapMethodInvoker.java:144)
 [ERR] 	at java.base/java.lang.invoke.CallSite.makeSite(CallSite.java:315)
 [ERR] 	at java.base/java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:281)
 [ERR] 	at java.base/java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:271)
 [ERR] 	at java.base/java.lang.invoke.MethodHandleResolver.resolveInvokeDynamic(MethodHandleResolver.java:201)
 [ERR] 	at com.ibm.tests.garbagecollector.TestClassLoaderMain.main(TestClassLoaderMain.java:126)
>> Success condition was not found: [Output match: Successful test run!]
>> Success condition was not found: [Output match: Cannot load library required by: -Xjit]

pshipton avatar Jul 21 '23 14:07 pshipton