openj9
openj9 copied to clipboard
Segmentation error vmState=0x0005ffff in internal special.system tests with JITServer
Appears in the tests
-
ClassLoadingTest_special_5m_18
-
ClassLoadingTest_special_5m_19
-
ClassLoadingTest_special_5m_19
-
ClassLoadingTest_special_5m_24
-
DaaLoadTest_all_special_5m_18
-
DaaLoadTest_all_special_5m_19
-
DaaLoadTest_all_special_5m_24
-
DaaLoadTest_daa1_special_5m_18
-
DaaLoadTest_daa1_special_5m_19
-
DaaLoadTest_daa1_special_5m_24
-
DaaLoadTest_daa2_special_5m_18
-
DaaLoadTest_daa2_special_5m_19
-
DaaLoadTest_daa2_special_5m_23
-
DaaLoadTest_daa2_special_5m_24
-
DaaLoadTest_daa3_special_5m_24
-
LambdaLoadTest_special_J9_5m_18
-
LambdaLoadTest_special_J9_5m_24
-
MathLoadTest_all_special_5m_18
-
MathLoadTest_all_special_5m_19
-
MathLoadTest_all_special_5m_24
-
MathLoadTest_autosimd_special_5m_18
-
MathLoadTest_autosimd_special_5m_19
-
MathLoadTest_bigdecimal_special_5m_18
-
MathLoadTest_bigdecimal_special_5m_19
-
MathLoadTest_bigdecimal_special_5m_24
-
ParallelStreamsLoadTest_special_J9_18
in at least the jobs
-
/job/Test_openjdk11_j9_special.system_x86-64_linux_jit_Personal/411
-
/job/Test_openjdk17_j9_special.system_ppc64le_linux_jit_Personal/318
-
/job/Test_openjdk11_j9_special.system_ppc64le_linux_jit_Personal/401
Console log for one of the ClassLoadingTest_special_5m_18
failures:
DLT stderr Type=Segmentation error vmState=0x0005ffff
DLT stderr J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
DLT stderr Handler1=00007FFFB36DEAD0 Handler2=00007FFFB3608440
DLT stderr R0=00007FFFB1824124 R1=00007FFF8E33A130 R2=00007FFFB25CF000 R3=0000000000000000
DLT stderr R4=00007FFF8E33ACF0 R5=00007FFFAC1C7500 R6=00007FFEC7E06006 R7=0000000000000000
DLT stderr R8=0000000000000000 R9=00007FFF8FA5BAB8 R10=0000000000080005 R11=000000000000003B
DLT stderr R12=00007FFFB417A650 R13=00007FFF8E3468E0 R14=00007FFFB41B0000 R15=00007FFEC7E015C8
DLT stderr R16=0000000000000001 R17=0000000000000001 R18=00007FFF8E33AA88 R19=00007FFEC7E00188
DLT stderr R20=00007FFEC7E012F0 R21=0000000000000000 R22=00007FFEC7E01470 R23=00007FFEC7E01450
DLT stderr R24=00007FFEC7E01430 R25=0000000000000000 R26=00007FFF8E33AAE8 R27=00007FFF8E33ACF0
DLT stderr R28=00007FFFB1CDB1D0 R29=00007FFFB25F4660 R30=00007FFF8CDF5260 R31=00007FFEC7E0C0E0
DLT stderr NIP=00007FFFB1CDB210 MSR=800000000280F033 ORIG_GPR3=00007FFFB1C7BB38 CTR=00007FFFB417A650
DLT stderr LINK=00007FFFB1CDB204 XER=0000000000000000 CCR=0000000024884434 SOFTE=0000000000000001
DLT stderr TRAP=0000000000000300 DAR=0000000000000008 dsisr=0000000040000000 RESULT=0000000000000000
DLT stderr FPR0 ffffffffffffffff (f: 4294967296.000000, d: -nan)
DLT stderr FPR1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR6 00007fff800377c0 (f: 2147710976.000000, d: 6.953250e-310)
DLT stderr FPR7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR9 0000000500000004 (f: 4.000000, d: 1.060998e-313)
DLT stderr FPR10 0000f85400000004 (f: 4.000000, d: 1.348995e-309)
DLT stderr FPR11 0000000b0000000a (f: 10.000000, d: 2.334195e-313)
DLT stderr FPR12 0000ffff00000023 (f: 35.000000, d: 1.390650e-309)
DLT stderr FPR13 0000001100000000 (f: 0.000000, d: 3.607393e-313)
DLT stderr FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
DLT stderr Module=/home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64le_linux_jit_Personal_testList_4/openjdkbinary/j2sdk-image/lib/default/libj9jit29.so
DLT stderr Module_base_address=00007FFFB1690000
DLT stderr
DLT stderr Method_being_compiled=org/junit/validator/AnnotationsValidator$FieldValidator.getAnnotatablesForTestClass(Lorg/junit/runners/model/TestClass;)Ljava/lang/Iterable;
DLT stderr Target=2_90_20220721_342 (Linux 4.18.0-372.13.1.el8_6.ppc64le)
DLT stderr CPU=ppc64le (32 logical CPUs) (0x1da850000 RAM)
DLT stderr ----------- Stack Backtrace -----------
DLT stderr _ZN19TR_PPCRecompilation21getExistingMethodInfoEP17TR_ResolvedMethod+0x40 (0x00007FFFB1CDB210 [libj9jit29.so+0x64b210])
DLT stderr _ZN19TR_PPCRecompilationC1EPN2TR11CompilationE+0x48 (0x00007FFFB1CDB9D8 [libj9jit29.so+0x64b9d8])
DLT stderr _ZN19TR_PPCRecompilation8allocateEPN2TR11CompilationE+0x4c (0x00007FFFB1CDBA4C [libj9jit29.so+0x64ba4c])
DLT stderr _ZN2J95Power13CodeGenerator25allocateRecompilationInfoEv+0x1c (0x00007FFFB1C887DC [libj9jit29.so+0x5f87dc])
DLT stderr _ZN3OMR11CompilationC1EiP12OMR_VMThreadP11TR_FrontEndP17TR_ResolvedMethodRN2TR12IlGenRequestERNS7_7OptionsERNS7_6RegionEP9TR_MemoryP19TR_OptimizationPlanPNS7_11EnvironmentE+0x974 (0x00007FFFB1D4AF74 [libj9jit29.so+0x6baf74])
DLT stderr _ZN2J911CompilationC2EiP10J9VMThreadP11TR_FrontEndP17TR_ResolvedMethodRN2TR12IlGenRequestERNS7_7OptionsERNS7_6RegionEP9TR_MemoryP19TR_OptimizationPlanP20TR_RelocationRuntimePNS7_11EnvironmentE+0x94 (0x00007FFFB17C5FD4 [libj9jit29.so+0x135fd4])
DLT stderr _ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x95c (0x00007FFFB180373C [libj9jit29.so+0x17373c])
DLT stderr omrsig_protect+0x3f4 (0x00007FFFB3609914 [libj9prt29.so+0x39914])
DLT stderr _ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x36c (0x00007FFFB180039C [libj9jit29.so+0x17039c])
DLT stderr _ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x170 (0x00007FFFB1800990 [libj9jit29.so+0x170990])
DLT stderr _ZN2TR24CompilationInfoPerThread14processEntriesEv+0x3fc (0x00007FFFB17FF17C [libj9jit29.so+0x16f17c])
DLT stderr _ZN2TR24CompilationInfoPerThread3runEv+0xa8 (0x00007FFFB17FF7A8 [libj9jit29.so+0x16f7a8])
DLT stderr _Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0xa0 (0x00007FFFB17FF860 [libj9jit29.so+0x16f860])
DLT stderr omrsig_protect+0x3f4 (0x00007FFFB3609914 [libj9prt29.so+0x39914])
DLT stderr _Z21compilationThreadProcPv+0x1a4 (0x00007FFFB17FFDC4 [libj9jit29.so+0x16fdc4])
DLT stderr thread_wrapper+0x190 (0x00007FFFB359CBC0 [libj9thr29.so+0xcbc0])
DLT stderr start_thread+0xf8 (0x00007FFFB4069678 [libpthread-2.28.so+0x9678])
DLT stderr clone+0x74 (0x00007FFFB3F57E58 [libc-2.28.so+0x137e58])
It reproduces with JITServer enabled (23/25 tests) and does not appear to reproduce with JITServer disabled (0/25 tests).
@mpirvu. I'll try to reproduce this locally. From the recent commits I noticed 1a92d685270a85d9ba369ae872dcbeebea0620d3 that touches the JITServer network communication, but I haven't looked at it in detail.
The crash seems to be in the in the Compilation constructor
OMR::Compilation::Compilation(int, OMR_VMThread*, TR_FrontEnd*, TR_ResolvedMethod*, TR::IlGenRequest&, TR::Options&, TR::Region&, TR_Memory*, TR_OptimizationPlan*, TR::Environment*)
which at some point calls J9::Power::CodeGenerator::allocateRecompilationInfo()
and a few frames after TR_PPCRecompilation::getExistingMethodInfo(TR_ResolvedMethod*)
I am able to reproduce this locally.
After bisection it appears that the bad commit is in fact d505324ff4b258f3ddfd49e23a67b3c59917d713.
It's likely a problem with -XX:+JITServerLocalSyncCompiles
Setting -XX:-JITServerLocalSyncCompiles
does seem to solve the issue.
Since you are able to reproduce this locally, could you please generate a verbose log at the client? We may need to further instrument the code if I cannot figure out the problem from the code.
I can generate a client verbose log by setting EXTRA_OPTIONS
, but the segfault does not occur when I do.
That's unfortunate. So it's also time dependent. is it possible that when adding EXTRA_OPTIONS, the JVM no longer runs in client mode?
It appears to run in client mode still:
#JITServer: JITServer version: 1.38.0
#JITServer: JITServer Client Mode. Server address: localhost port: 38400. Connection Timeout 30000ms
#JITServer: Identifier for current client JVM: 12627387422990972750
and compilation requests seem to be fulfilled.
If it's time dependent, would you like me to try to get a log on a perf machine or something with a bit more power?
Maybe targeted instrumentation will work better, but I have to look at the code first
Trying to decipher the code:
After the sync compilation has been performed locally, the application thread that was waiting for the compilation result is notified (in compileOnSeparateThread()
) and executes this piece of code which is supposed to add another compilation to the queue:
#if defined(J9VM_OPT_JITSERVER)
// If a remote sync compilation has been changed to a local sync compilation,
// queue a new remote async compilation entry for this method.
// - The second compilation is queued only when this is the last thread waiting
// for the compilation of the method.
// - If for any reason, the current compilation request is changed to a sync
// compilation from an async compilation, it probably means we can’t do an
// async compilation. Don't queue the second async remote compilation.
if (entry->hasChangedToLocalSyncComp() &&
(entry->_numThreadsWaiting <= 1) &&
!forcedSync &&
(startPC && startPC != oldStartPC))
{
void *currOldStartPC = startPCIfAlreadyCompiled(vmThread, details, startPC);
// If currOldStartPC is NULL, the method body defined by startPC has not been
// recompiled. We will queue the second remote aync compilation here. Otherwise,
// another thread is doing the recompilation and will take care of queueing
// the second compilation.
if (!currOldStartPC)
{
CompilationPriority compPriority = CP_ASYNC_NORMAL;
bool queuedForRemote = false;
bool isAsync = true;
TR_OptimizationPlan *plan = TR_OptimizationPlan::alloc(entry->_origOptLevel);
TR_YesNoMaybe mthInSharedCache = TR_no;
if (plan)
{
TR_MethodToBeCompiled *entryRemoteCompReq = addMethodToBeCompiled(details, currOldStartPC, compPriority, isAsync,
plan, &queuedForRemote, mthInSharedCache);
if (entryRemoteCompReq)
{
entryRemoteCompReq->_async = isAsync;
if (getMethodQueueSize() <= 1 ||
getNumCompThreadsJobless() > 0) // send notification if any thread is sleeping on comp monitor waiting for suitable work
{
getCompilationMonitor()->notifyAll();
}
if (TR::Options::getJITCmdLineOptions()->getVerboseOption(TR_VerboseCompileRequest))
TR_VerboseLog::writeLineLocked(TR_Vlog_CR,"%p Queued a remote async compilation: entry=%p, j9method=%p",
vmThread, entryRemoteCompReq, entryRemoteCompReq->getMethodDetails().getMethod());
}
if (!queuedForRemote)
TR_OptimizationPlan::freeOptimizationPlan(plan);
}
}
}
#endif /* defined(J9VM_OPT_JITSERVER) */
In void *currOldStartPC = startPCIfAlreadyCompiled(vmThread, details, startPC);
startPC
corresponds to the sync compilation that has just finished. We know it's not NULL because of the test before. Assuming this is not a thunk, startPCIfAlreadyCompiled()
should see this non-NULL value, then look at the linkage info to see whether a recompilation has been been attempted, and if not, we return NULL. Then, the main code proceeds to queue a recompilation with:
TR_MethodToBeCompiled *entryRemoteCompReq = addMethodToBeCompiled(details, currOldStartPC, compPriority, isAsync,
plan, &queuedForRemote, mthInSharedCache);
I think there is a bug here, because currOldStartPC
is NULL and we want to pass startPC
, that is the most recent startPC for the method in question.
This piece of code also deserves a second look:
// If this is a remote sync compilation, change it to a local sync compilation.
// After the local compilation is completed successfully, a remote async compilation
// will be scheduled in compileOnSeparateThread().
TR::IlGeneratorMethodDetails & details = entry->getMethodDetails();
if (!doLocalCompilation &&
!entry->_async &&
!entry->isJNINative() &&
entry->_optimizationPlan->getOptLevel() > cold &&
!details.isNewInstanceThunk() &&
!details.isMethodHandleThunk() &&
(TR::Options::getCmdLineOptions()->getOption(TR_EnableJITServerHeuristics) ||
_compInfo.getPersistentInfo()->isLocalSyncCompiles()) &&
!TR::Options::getCmdLineOptions()->getOption(TR_DisableUpgradingColdCompilations) &&
TR::Options::getCmdLineOptions()->allowRecompilation())
{
doLocalCompilation = true;
entry->_origOptLevel = entry->_optimizationPlan->getOptLevel();
entry->_optimizationPlan->setOptLevel(cold);
entry->_optimizationPlan->setOptLevelDowngraded(true);
if (TR::Options::getVerboseOption(TR_VerboseCompilationDispatch))
TR_VerboseLog::writeLineLocked(TR_Vlog_DISPATCH, "Changed the remote sync compilation to a local sync cold compilation: j9method=%p",
entry->getMethodDetails().getMethod());
}
If I read this correctly, synchronous compilations at cold at below do not benefit from this change. Instead they are going to be performed remotely. If this true, I think we need to change the code to perform the compilation locally, without a forced remote retry. Maybe changing preferLocalComp() to return true if isLocalSyncCompiles() && entry->_optimizationPlan->getOptLevel() <= cold && !entry->_async
I think that this may still be happening, but with different tests now. I see various failures on ppc64le only, though perhaps the x86_64 tests haven't finished yet. (EDIT: it's happening on s390x too).
-
cmdLineTester_jvmtitests_hcr_OSRG_nongold_SE80_3
-
cmdLineTester_jvmtitests_hcr_OSRG_nongold_SE80_4
-
jit_ra_0
-
cmdLineTester_jvmtitests_hcr_10
-
jit_recognizedMethod_3
-
jit_recognizedMethod_4
scattered across the jobs
-
/job/Test_openjdk8_j9_extended.functional_ppc64le_linux_jit_Personal/429
-
/job/Test_openjdk11_j9_extended.functional_ppc64le_linux_jit_Personal/430
-
/job/Test_openjdk8_j9_sanity.functional_ppc64le_linux_jit_Personal/994
-
/job/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal/1001
Example from cmdLineTester_jvmtitests_hcr_OSRG_nongold_SE80_4
Testing: rc018
Test start time: 2022/08/12 01:57:57 Pacific Standard Time
Running command: "/home/jenkins/workspace/Test_openjdk8_j9_extended.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/j2sdk-image/bin/java" -XX:+UseJITServer -Xcompressedrefs -Xgcpolicy:gencon -Xjit:enableOSR,enableOSROnGuardFailure,count=1,disableAsyncCompilation -Xdump -agentlib:jvmtitest=test:rc018 -cp "/home/jenkins/workspace/Test_openjdk8_j9_extended.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitest.jar:/home/jenkins/workspace/Test_openjdk8_j9_extended.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../TKG/lib/asm-all.jar" com.ibm.jvmti.tests.util.TestRunner
Time spent starting: 12 milliseconds
Time spent executing: 5244 milliseconds
Test result: FAILED
Output from test:
[OUT] *** Testing [1/1]: testReflectRedefineAtSameTime
[OUT] starting reflect worker threads
[OUT] starting to populate java heap
[ERR] Unhandled exception
[ERR] Type=Segmentation error vmState=0x00000000
[ERR] J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
[ERR] Handler1=00003FFF99403650 Handler2=00003FFF9932D080
[ERR] R0=00003FFF981E04CC R1=00003FFF999FB810 R2=00003FFF98F5F000 R3=0000000000000000
[ERR] R4=00003FFF999FBD48 R5=0000000000000200 R6=0000000000000007 R7=0000000000000000
[ERR] R8=0000000000000008 R9=0000000000000220 R10=0000000000000002 R11=0000000000000000
[ERR] R12=00003FFF983741A0 R13=00003FFF99A06900 R14=0000000000152E00 R15=0000000000095700
[ERR] R16=00003FFF7A640038 R17=00000000001D8F10 R18=0000000000000000 R19=0000000000000001
[ERR] R20=00003FFF999FB96F R21=0000000000000000 R22=0000000000095700 R23=00003FFF7A4CD6D0
[ERR] R24=0000000000000080 R25=00003FFF7BDC307C R26=00003FFF940BCA60 R27=00003FFF999FBD48
[ERR] R28=0000000000000040 R29=00003FFF940BB8B0 R30=0000000000000000 R31=00003FFF18004450
[ERR] NIP=00003FFF981E0504 MSR=800000010280F033 ORIG_GPR3=C0000000000027E0 CTR=00003FFF983741A0
[ERR] LINK=00003FFF981E04F8 XER=0000000000000000 CCR=0000000024004424 SOFTE=0000000000000001
[ERR] TRAP=0000000000000300 DAR=0000000000000008 dsisr=0000000040000000 RESULT=0000000000000000
[ERR] FPR0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR1 406a400000000000 (f: 0.000000, d: 2.100000e+02)
[ERR] FPR2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR4 3fe3333340000000 (f: 1073741824.000000, d: 6.000000e-01)
[ERR] FPR5 bfdb4b81c0000000 (f: 3221225472.000000, d: -4.264836e-01)
[ERR] FPR6 3fe5555560000000 (f: 1610612736.000000, d: 6.666667e-01)
[ERR] FPR7 3f2ede5680000000 (f: 2147483648.000000, d: 2.355080e-04)
[ERR] FPR8 5c002300095b0008 (f: 156958720.000000, d: 1.466099e+135)
[ERR] FPR9 ffffffffffffffff (f: 4294967296.000000, d: -nan)
[ERR] FPR10 60002300095f0007 (f: 157220864.000000, d: 2.704475e+154)
[ERR] FPR11 0009580023000957 (f: 587204928.000000, d: 1.299409e-308)
[ERR] FPR12 0000000000000001 (f: 1.000000, d: 4.940656e-324)
[ERR] FPR13 4008000000000000 (f: 0.000000, d: 3.000000e+00)
[ERR] FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[ERR] Module=/home/jenkins/workspace/Test_openjdk8_j9_extended.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/j2sdk-image/jre/lib/ppc64le/default/libj9jit29.so
[ERR] Module_base_address=00003FFF98070000
[ERR] Target=2_90_20220812_1090 (Linux 3.10.0-1160.71.1.el7.ppc64le)
[ERR] CPU=ppc64le (4 logical CPUs) (0xde750000 RAM)
[ERR] ----------- Stack Backtrace -----------
[ERR] protectedBacktrace+0x24 (0x00003FFF993270E4 [libj9prt29.so+0x370e4])
[ERR] omrsig_protect+0x358 (0x00003FFF9932E488 [libj9prt29.so+0x3e488])
[ERR] omrintrospect_backtrace_thread_raw+0xe4 (0x00003FFF99327714 [libj9prt29.so+0x37714])
[ERR] protectedIntrospectBacktraceThread+0x24 (0x00003FFF99326D04 [libj9prt29.so+0x36d04])
[ERR] omrsig_protect+0x358 (0x00003FFF9932E488 [libj9prt29.so+0x3e488])
[ERR] omrintrospect_backtrace_thread+0x98 (0x00003FFF99326E18 [libj9prt29.so+0x36e18])
[ERR] generateDiagnosticFiles+0x140 (0x00003FFF994034D0 [libj9vm29.so+0x434d0])
[ERR] omrsig_protect+0x358 (0x00003FFF9932E488 [libj9prt29.so+0x3e488])
[ERR] structuredSignalHandler+0x1b4 (0x00003FFF99403804 [libj9vm29.so+0x43804])
[ERR] mainSynchSignalHandler+0x260 (0x00003FFF9932D2E0 [libj9prt29.so+0x3d2e0])
[ERR] __kernel_sigtramp_rt64+0x0 (0x00003FFF99CF0478)
[ERR] _ZN2TR15CompilationInfo21addMethodToBeCompiledERNS_24IlGeneratorMethodDetailsEPv19CompilationPrioritybP19TR_OptimizationPlanPb13TR_YesNoMaybe+0x54c (0x00003FFF981E04CC [libj9jit29.so+0x1704cc])
[ERR] _ZN2TR15CompilationInfo23compileOnSeparateThreadEP10J9VMThreadRNS_24IlGeneratorMethodDetailsEPv13TR_YesNoMaybeP23TR_CompilationErrorCodePbP19TR_OptimizationPlan+0xe4c (0x00003FFF981E2DAC [libj9jit29.so+0x172dac])
[ERR] _ZN2TR15CompilationInfo13compileMethodEP10J9VMThreadRNS_24IlGeneratorMethodDetailsEPv13TR_YesNoMaybeP23TR_CompilationErrorCodePbP19TR_OptimizationPlan+0x314 (0x00003FFF981E3714 [libj9jit29.so+0x173714])
[ERR] DLTLogic+0x74c (0x00003FFF98203B0C [libj9jit29.so+0x193b0c])
[ERR] jitMethodSampleInterrupt+0x13c (0x00003FFF9820400C [libj9jit29.so+0x19400c])
[ERR] dispatchAsyncEvents+0x124 (0x00003FFF99405874 [libj9vm29.so+0x45874])
[ERR] javaCheckAsyncMessages+0xf8 (0x00003FFF993D22A8 [libj9vm29.so+0x122a8])
[ERR] bytecodeLoopCompressed+0x1c4 (0x00003FFF9946B694 [libj9vm29.so+0xab694])
[ERR] (0x00003FFF9952A930 [libj9vm29.so+0x16a930])
[ERR] sidecarInvokeReflectMethodImpl+0x3c0 (0x00003FFF993DDE10 [libj9vm29.so+0x1de10])
[ERR] sidecarInvokeReflectMethod+0x38 (0x00003FFF993DF2A8 [libj9vm29.so+0x1f2a8])
[ERR] JVM_InvokeMethod_Impl+0xb4 (0x00003FFF8D17C504 [libjclse29.so+0x7c504])
[ERR] JVM_InvokeMethod+0x40 (0x00003FFF9991F620 [libjvm.so+0x1f620])
[ERR] JVM_InvokeMethod+0x34 (0x00003FFF99A06564 [libjvm.so+0x6564])
[ERR] Java_sun_reflect_NativeMethodAccessorImpl_invoke0+0x24 (0x00003FFF7A26C064 [libjava.so+0xc064])
[ERR] (0x00003FFF7BDFD724 [<unknown>+0x0])
[ERR] runCallInMethod+0x270 (0x00003FFF993DD700 [libj9vm29.so+0x1d700])
[ERR] gpProtectedRunCallInMethod+0x50 (0x00003FFF99408980 [libj9vm29.so+0x48980])
[ERR] signalProtectAndRunGlue+0x28 (0x00003FFF9953A988 [libj9vm29.so+0x17a988])
[ERR] omrsig_protect+0x358 (0x00003FFF9932E488 [libj9prt29.so+0x3e488])
[ERR] gpProtectAndRun+0xac (0x00003FFF9953AA5C [libj9vm29.so+0x17aa5c])
[ERR] gpCheckCallin+0xc4 (0x00003FFF9940B234 [libj9vm29.so+0x4b234])
[ERR] callStaticVoidMethod+0x48 (0x00003FFF99407F98 [libj9vm29.so+0x47f98])
[ERR] JavaMain+0xff4 (0x00003FFF99C69C94 [libjli.so+0x9c94])
[ERR] start_thread+0x104 (0x00003FFF99CA8CD4 [libpthread.so.0+0x8cd4])
[ERR] clone+0xe4 (0x00003FFF99B67F14 [libc.so.6+0x127f14])
I'll grind these to get a sense of how frequent the segfault is now.
The goal is find a test that fails consistently so that we can debug faster
I can reproduce cmdLineTester_jvmtitests_hcr_OSRG_nongold_SE80_3
readily now. The program is still crashing at the same place as before:
4964 TR_PersistentMethodInfo* methodInfo = bodyInfo->getMethodInfo(); //TR::Recompilation::getMethodInfoFromPC(pc);
and bodyInfo
is NULL
.
I would like to know what kind of compilation was the one before this one.
TR::Recompilation::getJittedBodyInfoFromPC(pc);
has the following definition:
TR_PersistentJittedBodyInfo *J9::Recompilation::getJittedBodyInfoFromPC(void *startPC)
{
// The body info pointer is stored in the pre-prologue of the method. The
// location of the field depends upon the type of the method header. Use the
// header-type bits in the linkage info fields to determine what kind of header
// this is.
//
J9::PrivateLinkage::LinkageInfo *linkageInfo = J9::PrivateLinkage::LinkageInfo::get(startPC);
return linkageInfo->isRecompMethodBody() ?
*(TR_PersistentJittedBodyInfo **)((uint8_t*)startPC + START_PC_TO_METHOD_INFO_ADDRESS) :
NULL;
}
LinkageInfo
is a word at some offset before startPC
. We ask linkageInfo if this is a body that can be recompiled and if so, we return the jittedBodyInfo which sits at some offset before startPC
.
Example:
0x000007FFFDD8E730 ffffffe8 [0x000000006081AAE0] 66 66 90 66 90 nop (5 bytes) ; Alignment (boundary=8, margin=19)
0x000007FFFDD8E735 ffffffed [0x000000006081AB50] 48 3b dw 0xcccc ; DWImm2 Bytes that will be copied over jitEntryPoint if we fail recompilation
0x000007FFFDD8E737 ffffffef [0x000000006081AC30] e8 84 16 1e 00 call __samplingRecompileMethod; CALLImm4 (0x00000000618DE830); CALLImm4
0x000007FFFDD8E73C fffffff4 [0x000000006081ACB0] b0 ff 59 62 00 00 00 00 dq 0x000000006259ffb0 ; DQImm64 PersistentJittedBodyInfo
0x000007FFFDD8E744 fffffffc [0x000000006081AD20] 15 00 05 00 dd 0x00050015 ; DDImm4 LinkageInfo This 5 means the offset between interpreter entry point and jit entry point.
0x000007FFFDD8E748 00000000 [0x000000006095B2D0] 48 8b 44 24 08 mov rax, qword ptr [rsp+0x8] ; L8RegMem, SymRef #521[0000000000000000]+8 startPC
0x000007FFFDD8E74D 00000005 [0x000000006095B348] nop (0 bytes) ; Align patchable code @32 [0x0:2]
0x000007FFFDD8E74D 00000005 [0x000000006081ADC8] ProcEntry ; PROCENTRY jitEntryPoint
0x000007FFFDD8E74D 00000005 [0x000000006095B548] 48 3b 65 50 cmp rsp, qword ptr [rbp+0x50] ; CMP8RegMem, SymRef #522[0000000000000000]+80
0x000007FFFDD8E751 00000009 [0x000000006095B670] Label L0209: ; LABEL ; (Start of internal control flow)
0x000007FFFDD8E751 00000009 [0x000000006095B758] 0f 86 40 02 00 00 jbe Snippet Label L0208 ; JBE4 ; (Helper Call Snippet with scratch-reg argument)
0x000007FFFDD8E757 0000000f [0x000000006095B8D0] Label L0210: ; LABEL ; (End of internal control flow)
0x000007FFFDD8E757 0000000f [0x000000006095B988] 48 83 ec 08 sub rsp, 0x00000008 ; SUB8RegImms
For methods that cannot be recompiled, we do not have a persistentJittedBodyInfo, so we return 0. This is why I would like to know what type of compilation was the synchronous one that is now getting recompiled. I know that scorching compilations do not get recompiled.
bool
J9::Recompilation::couldBeCompiledAgain()
{
return
self()->shouldBeCompiledAgain() ||
_compilation->usesPreexistence() ||
_compilation->getOption(TR_EnableHCR);
}
bool
J9::Recompilation::shouldBeCompiledAgain()
{
return TR::Options::canJITCompile() && !_doNotCompileAgain;
}
_doNotCompileAgain
is set in the constructor of Recompilation
(when the compiler object is created)
_doNotCompileAgain(comp->getOption(TR_NoRecompile) || !comp->allowRecompilation())
So, we have to makes sure that TR_NoRecompile is not set and that allowRecompilation() is set.
We do test TR::Options::getCmdLineOptions()->allowRecompilation()
when enabling the localSyncCompiles feature.
The TR_NoRecompile
option is set only through command line options, which these tests don't do.
Recompilation object has the following constructor on x86
bool
J9::Compilation::isRecompilationEnabled()
{
if (!self()->cg()->getSupportsRecompilation())
{
return false;
}
if (self()->isDLT())
{
return false;
}
// Don't do recompilation on JNI virtual thunk methods
//
if (self()->getCurrentMethod()->isJNINative())
return false;
return self()->allowRecompilation();
}
So we need to avoid DLT compilations, JNI (which we do).
Sampling is disabled when TR_MimicInterpreterFrameShape
is enabled.
J9::Recompilation::Recompilation(TR::Compilation *comp) :
OMR::RecompilationConnector(comp),
_firstCompile(comp->getCurrentMethod()->isInterpreted()),
_useSampling(TR::Options::getSamplingFrequency() != 0 && !comp->getOption(TR_MimicInterpreterFrameShape)),
_doNotCompileAgain(comp->getOption(TR_NoRecompile) || !comp->allowRecompilation()),
_methodInfo(0),
_bodyInfo(0),
_nextLevel(warm),
_nextCounter(0)
{
_timer.initialize(0, comp->trMemory());
}
Since TR_MimicInterpreterFrameShape
is set at runtime, we need a runtime check that TR_MimicInterpreterFrameShape
is not set when localSyncCompiles is used.