openj9
Segfault running test_VirtualthreadYieldResume on Power
Failing test is test_VirtualthreadYieldResume under Jep425Tests_testVirtualThread.
Failure link
Test was run on a local machine. I might be able to get a shareable link to the failure in the future if necessary.
Failure output (captured from console output)
The crash output looks like this:
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=000071EFC2B73890 Handler2=000071EFC28FCC60
R0=0000000080AD5EF1 R1=000071EF7339A7D0 R2=000071EFC2D86E00 R3=00000004056AF788
R4=000000000044E1F8 R5=0000000000000008 R6=000000000000000E R7=000071EF7339B7B0
R8=000000000044E228 R9=000000000000000E R10=000071EFC256ED50 R11=000000000044DE30
R12=0000000044224484 R13=000071EF733A68E0 R14=000000000044E1F0 R15=000000000044E700
R16=000071EF901D0038 R17=FFFFFFFFFFFFFFFF R18=000071EF7339A9B0 R19=0000000000000002
R20=0000000000000020 R21=000071EFBC90E240 R22=000071EF78002038 R23=000071EFBC90DDD0
R24=000071EF7339A9D0 R25=0000000000000000 R26=000071EF78002038 R27=000071EFC31F7200
R28=000071EF7339AA80 R29=0000000022224222 R30=000071EFC2AF6AA8 R31=000071EFC2B27E00
NIP=0000000000000000 MSR=800000014280F033 ORIG_GPR3=000071EF91B30098 CTR=000071EF91B3007C
LINK=0000000000000000 XER=0000000000000000 CCR=0000000048224422 SOFTE=0000000000000001
TRAP=0000000000000400 DAR=0000000000000000 dsisr=0000000040000000 RESULT=0000000000000000
FPR0 0000000000000001 (f: 1.000000, d: 4.940656e-324)
FPR1 4053a00140000000 (f: 1073741824.000000, d: 7.850008e+01)
FPR2 41e0000000000000 (f: 0.000000, d: 2.147484e+09)
FPR3 0000000000000009 (f: 9.000000, d: 4.446591e-323)
FPR4 000071efc2ebd5b0 (f: 3270235648.000000, d: 6.189387e-310)
FPR5 000071efc2ebd4b0 (f: 3270235392.000000, d: 6.189387e-310)
FPR6 3fe62e42fefa39ef (f: 4277811712.000000, d: 6.931472e-01)
FPR7 bfd01eae7f513a67 (f: 2136029824.000000, d: -2.518727e-01)
FPR8 bfdffffef20a4123 (f: 4060758272.000000, d: -4.999997e-01)
FPR9 bfd00ea348b88334 (f: 1220051712.000000, d: -2.508934e-01)
FPR10 bfdfd3e469fa19f5 (f: 1777998336.000000, d: -4.973079e-01)
FPR11 41cdcd6500000000 (f: 0.000000, d: 1.000000e+09)
FPR12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR13 000071efc2d86e00 (f: 3268963840.000000, d: 6.189387e-310)
FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Target=2_90_20220726_000000 (Linux 5.4.0-122-generic)
CPU=ppc64le (128 logical CPUs) (0xfecba0000 RAM)
----------- Stack Backtrace -----------
(0x0000000000000000 [<unknown>+0x0])
runStaticMethod+0x38c (0x000071EFC2B49A2C [libj9vm29.so+0x19a2c])
enterContinuation+0x1b0 (0x000071EFC2BE77A0 [libj9vm29.so+0xb77a0])
bytecodeLoopCompressed+0x9698 (0x000071EFC2BF1208 [libj9vm29.so+0xc1208])
(0x000071EFC2CA0D68 [libj9vm29.so+0x170d68])
runJavaThread+0x270 (0x000071EFC2B495B0 [libj9vm29.so+0x195b0])
javaProtectedThreadProc+0xf0 (0x000071EFC2BE67B0 [libj9vm29.so+0xb67b0])
omrsig_protect+0x358 (0x000071EFC28FE068 [libj9prt29.so+0x3e068])
javaThreadProc+0x64 (0x000071EFC2BE1824 [libj9vm29.so+0xb1824])
thread_wrapper+0x1a8 (0x000071EFC2B019C8 [libj9thr29.so+0x119c8])
start_thread+0x10c (0x000071EFC2F0885C [libpthread.so.0+0x885c])
clone+0x98 (0x000071EFC3128C98 [libc.so.6+0x158c98])
---------------------------------------
Test passes under Xint:
--enable-preview -Xgcpolicy:nogc -Xint
But the test fails under the default options, where the JIT is enabled:
--enable-preview -Xgcpolicy:nogc
I narrowed things down so only one small method gets compiled at noOpt and it still fails:
--enable-preview -Xgcpolicy:nogc -Xjit:"count=5,optlevel=noopt,disableAsyncCompilation,limit={*MethodHandle.type*},tracefull,log=/root/hostdir/openj9-openjdk-jdk19/openj9/test/TKG/trace/type.log"
Under these options, only the method java/lang/invoke/MethodHandle.type()Ljava/lang/invoke/MethodType; gets JIT'd and most optimizations are disabled. It is also a very small method where the final trees look like this:
<trees
title="Pre Instruction Selection Trees"
method="java/lang/invoke/MethodHandle.type()Ljava/lang/invoke/MethodType;"
hotness="no-opt">
Pre Instruction Selection Trees: for java/lang/invoke/MethodHandle.type()Ljava/lang/invoke/MethodType;
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
n1n BBStart <block_2> [ 0x71ef90b34510] bci=[-1,0,468] rc=0 vc=17 vn=- li=2 udi=- nc=1
n15n GlRegDeps [ 0x71ef90b34970] bci=[-1,0,468] rc=1 vc=17 vn=- li=- udi=- nc=1
n3n aRegLoad gr3 this<'this' parm Ljava/lang/invoke/MethodHandle;>[#383 Parm] [flags 0xc0000107 0x0 ] (X!=0 X>=0 sharedMemory ) [ 0x71ef90b345b0] bci=[-1,0,468] rc=2 vc=17 vn=- li=2 udi=- nc=0 flg=0x104
n13n treetop [ 0x71ef90b348d0] bci=[-1,1,468] rc=0 vc=17 vn=- li=- udi=- nc=1
n10n iu2l [ 0x71ef90b347e0] bci=[-1,1,468] rc=2 vc=17 vn=- li=2 udi=- nc=1
n9n iloadi java/lang/invoke/MethodHandle.type Ljava/lang/invoke/MethodType;[#384 final Shadow +4] [flags 0x400a0607 0x0 ] [ 0x71ef90b34790] bci=[-1,1,468] rc=1 vc=17 vn=- li=2 udi=- nc=1
n3n ==>aRegLoad
n7n areturn [ 0x71ef90b346f0] bci=[-1,4,468] rc=0 vc=17 vn=- li=2 udi=- nc=1
n4n l2a [ 0x71ef90b34600] bci=[-1,1,468] rc=1 vc=17 vn=- li=2 udi=- nc=1
n11n lshl (compressionSequence ) [ 0x71ef90b34830] bci=[-1,1,468] rc=1 vc=17 vn=- li=2 udi=- nc=2 flg=0x800
n10n ==>iu2l
n8n iconst 3 [ 0x71ef90b34740] bci=[-1,1,468] rc=1 vc=17 vn=- li=2 udi=- nc=0
n2n BBEnd </block_2> [ 0x71ef90b34560] bci=[-1,4,468] rc=0 vc=17 vn=- li=2 udi=- nc=0
index: node global index
bci=[x,y,z]: byte-code-info [callee-index, bytecode-index, line-number]
rc: reference count
vc: visit count
vn: value number
li: local index
udi: use/def index
nc: number of children
addr: address size in bytes
flg: node flags
Number of nodes = 11, symRefCount = 385
</trees>
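For anyone reading the trees: the method loads the 32-bit `type` field at offset +4, zero-extends it (iu2l), shifts it left by 3 (lshl with iconst 3), and returns the result as an address (l2a). Here is a minimal sketch of that arithmetic, assuming the shift of 3 shown in the trees. The class and method names are illustrative only and are not part of OpenJ9. The input is the low word of R0 from the crash dump above; the result matching R3 in that dump is an observation, not proof that those register values came from this method.

```java
public class CompressedRefDemo {
    // Mirrors the iu2l -> lshl -> l2a sequence in the trees: zero-extend the
    // 32-bit field value, then shift left by the compression shift.
    static long decompress(int compressed, int shift) {
        return Integer.toUnsignedLong(compressed) << shift;
    }

    public static void main(String[] args) {
        int compressed = 0x80AD5EF1;              // low 32 bits of R0 in the first dump
        long address = decompress(compressed, 3); // 3 is the iconst child of lshl
        System.out.printf("0x%X%n", address);     // prints 0x4056AF788
    }
}
```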
If there's any other info that might be useful, feel free to ask.
@fengxue-IS please take a look
The test is updated to run with -Xint atm: https://github.com/eclipse-openj9/openj9/pull/15684
Confirmed that this is a platform-specific issue on Power: running this in Grinder gives 20/20 passes for axz linux and 20/20 failures for plinux/ppc_aix using -Xint.
FAILED: test_VirtualthreadYieldResume
java.lang.AssertionError: Virtual Thread 0: incorrect result of 1(2)
I will need to investigate further to know exactly what is missing here.
That's unexpected. Under -Xint the test passed for me on Linux on Power. I didn't try AIX. I forget exactly how many times I tried it, but it was quite a few.
The failure I got was with a build that included #15678, which uses the new direct-transition design. I am not sure why the test fails only on Power; I will add some debug output to the test and see.
After poking at the test code a bit, I found that the failures come from an exception thrown during the sleep call:
java.lang.IllegalStateException: Continuation is pinned: MONITOR
at java.base/jdk.internal.vm.Continuation.yield(Continuation.java:211)
at java.base/java.lang.VirtualThread.yieldContinuation(VirtualThread.java:370)
at java.base/java.lang.VirtualThread.parkNanos(VirtualThread.java:532)
at java.base/java.lang.VirtualThread.doSleepNanos(VirtualThread.java:713)
at java.base/java.lang.VirtualThread.sleepNanos(VirtualThread.java:686)
at java.base/java.lang.Thread.sleep(Thread.java:549)
at org.openj9.test.jep425.VirtualThreadTests.lambda$test_VirtualthreadYieldResume$1(VirtualThreadTests.java:80)
at java.base/java.util.concurrent.ThreadPerTaskExecutor$ThreadBoundFuture.run(ThreadPerTaskExecutor.java:352)
at java.base/java.lang.VirtualThread.run(VirtualThread.java:287)
at java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:174)
at java.base/jdk.internal.vm.Continuation.execute(Continuation.java:156)
@babsingh do you see any reason for the Continuation to be pinned in the test code?
The paths where ownedMonitorCount is incremented and decremented may be inconsistent. PPC has global lock reservation enabled. We need to track the increments and decrements of ownedMonitorCount to determine the source of the failure. Since the failure goes away with -Xint, I presume it arises from the JIT paths.
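To illustrate the kind of inconsistency meant here, below is a toy model, not OpenJ9 code. It assumes a per-thread counter that is bumped on monitor enter, dropped on monitor exit, and checked at yield time. The class, its methods, and the `buggyPath` flag are all hypothetical names made up for this sketch; only the counter name and the exception message come from this thread.

```java
public class PinnedContinuationModel {
    // Hypothetical stand-in for the per-thread counter discussed above.
    private int ownedMonitorCount;

    void monitorEnter() {
        ownedMonitorCount++;
    }

    // 'buggyPath' simulates an exit path that forgets to decrement.
    void monitorExit(boolean buggyPath) {
        if (!buggyPath) {
            ownedMonitorCount--;
        }
    }

    // Yield is refused while the model believes a monitor is still held.
    void yieldContinuation() {
        if (ownedMonitorCount != 0) {
            throw new IllegalStateException("Continuation is pinned: MONITOR");
        }
    }

    // Enter and exit one monitor, then report whether a yield is allowed.
    static boolean yieldSucceeds(boolean buggyPath) {
        PinnedContinuationModel thread = new PinnedContinuationModel();
        thread.monitorEnter();
        thread.monitorExit(buggyPath);
        try {
            thread.yieldContinuation();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("balanced path yields: " + yieldSucceeds(false));
        System.out.println("missed decrement yields: " + yieldSucceeds(true));
    }
}
```

In this model a single unbalanced exit is enough to make every later yield fail, even though no monitor is actually held, which is why counting the increments and decrements per path is a reasonable first step.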
These failures come from -Xint with the #15678 change set, so they are not from JIT code. I will have to look at where the monitor count is incremented and decremented. I will add some debug trace for the monitor count during the yield failure; I am wondering if we may have missed locations where it should be incremented or decremented.
The -Xint failure is fixed with https://github.com/eclipse-openj9/openj9/pull/15824
@IBMJimmyk can you try this again with the latest changes?
Okay, I'll try it out.
With the fix from https://github.com/eclipse-openj9/openj9/pull/15824, I no longer see the same failure on Z. While re-running the test, I am seeing [1] fail on Z with the following segmentation fault when using the JIT option -Xjit:count=10,disableAsyncCompilation . It passes with -Xint.
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
Handler1=000003FFBB5C4B58 Handler2=000003FFBB2345C0 InaccessibleAddress=000003FF61A2D000
gpr0=0000000000000000 gpr1=0000000000000000 gpr2=000000000181B700 gpr3=000000000181B798
gpr4=0000000000000003 gpr5=0000000000000040 gpr6=000003FFBB626100 gpr7=0000000000000000
gpr8=000003FF61A2DE7E gpr9=0000000000000001 gpr10=000003FF61A2DEA6 gpr11=000000000181B700
gpr12=000000000E60FFE0 gpr13=000003FFBB3FF900 gpr14=000003FFBB622D96 gpr15=000003FFBB3FE8B0
psw=000003FFBB59BA6E mask=0705000180000000 fpc=00080000 bea=000003FFBB59B9C8
fpr0 0000000000000001 (f: 1.000000, d: 4.940656e-324)
fpr1 000003ffbb3fbbe0 (f: 3141516288.000000, d: 2.172354e-311)
fpr2 000003ffb4008604 (f: 3019933184.000000, d: 2.172294e-311)
fpr3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr4 402c9bc27aade236 (f: 2058215936.000000, d: 1.430422e+01)
fpr5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr6 3ea5bc5489e23399 (f: 2313303040.000000, d: 6.477733e-07)
fpr7 0000000000000004 (f: 4.000000, d: 1.976263e-323)
fpr8 0000000000041000 (f: 266240.000000, d: 1.315400e-318)
fpr9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr10 000003ffbb3bf000 (f: 3141267456.000000, d: 2.172354e-311)
fpr11 000003ffd3afe730 (f: 3551520512.000000, d: 2.172556e-311)
fpr12 000003ffb41fd330 (f: 3021984512.000000, d: 2.172295e-311)
fpr13 000003ffd3afd700 (f: 3551516416.000000, d: 2.172556e-311)
fpr14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/root/LoomJITPinning/openj9-openjdk-jdk19/build/linux-s390x-server-release/images/jdk/lib/default/libj9vm29.so
Module_base_address=000003FFBB580000
Target=2_90_20220906_000000 (Linux 5.4.0-125-generic)
CPU=s390x (4 logical CPUs) (0x1f6164000 RAM)
----------- Stack Backtrace -----------
cleanUpAttachedThread+0x18e (0x000003FFBB59BA6E [libj9vm29.so+0x1ba6e])
threadCleanup+0x136 (0x000003FFBB622D96 [libj9vm29.so+0xa2d96])
javaProtectedThreadProc+0xea (0x000003FFBB6261EA [libj9vm29.so+0xa61ea])
omrsig_protect+0x300 (0x000003FFBB2351C8 [libj9prt29.so+0x351c8])
javaThreadProc+0x5c (0x000003FFBB622C54 [libj9vm29.so+0xa2c54])
thread_wrapper+0x102 (0x000003FFBB485A6A [libj9thr29.so+0x5a6a])
start_thread+0xd6 (0x000003FFBB987E66 [libpthread.so.0+0x7e66])
(0x000003FFBBBFCBF6 [libc.so.6+0xfcbf6])
(0x0000000000000000 [<unknown>+0x0])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2022/09/07 16:12:13 - please wait.
JVMDUMP032I JVM requested System dump using '/root/LoomJITPinning/openj9_test/test/TKG/output_1662517356527/core.20220907.161213.419383.0001.dmp' in response to an event
[1]. https://github.com/eclipse-openj9/openj9/blob/8186943144f98dccaf4ba48699dc923fe81b4e3c/test/functional/Java19andUp/src/org/openj9/test/jep425/VirtualThreadTests.java#L71-L92
I reran the test on Power including the fix in https://github.com/eclipse-openj9/openj9/pull/15824
The old failure is gone and the test in general passes under -Xint.
But I still get a failure when -Xint is removed. My error message looks like this:
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FC493D34470 Handler2=00007FC493ABCEE0
R0=00007FC4934243A0 R1=00007FC464ADB690 R2=00007FC493F46E00 R3=00000000000E4100
R4=00000000000E4130 R5=0000000000000000 R6=000000000000000E R7=00007FC464ADB7B0
R8=0000000000436D88 R9=000000000000000E R10=00007FC49373FA00 R11=0000000000000002
R12=0000000000000000 R13=00007FC464AE68E0 R14=0000000000436CB8 R15=0000000000226300
R16=0000000000000000 R17=0000000000000000 R18=0000000000000000 R19=0000000000000000
R20=0000000000000000 R21=0000000000000000 R22=0000000000000000 R23=0000000000000000
R24=0000000000000000 R25=0000000000000000 R26=0000000000000000 R27=0000000000000000
R28=0000000000000000 R29=0000000000000000 R30=0000000000000000 R31=0000000000000000
NIP=00007FC466DE3AF8 MSR=800000010280F033 ORIG_GPR3=00007FC493E6788C CTR=00007FC466DE3A80
LINK=00007FC4934243A0 XER=0000000020000000 CCR=0000000048004482 SOFTE=0000000000000001
TRAP=0000000000000300 DAR=0000000000000898 dsisr=0000000040000000 RESULT=0000000000000000
FPR0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR1 4053c8f740000000 (f: 1073741824.000000, d: 7.914009e+01)
FPR2 41e0000000000000 (f: 0.000000, d: 2.147484e+09)
FPR3 3fee666660000000 (f: 1610612736.000000, d: 9.500000e-01)
FPR4 3fce840b4ac4e4d2 (f: 1254417664.000000, d: 2.384047e-01)
FPR5 bfe7154748bef6c8 (f: 1220474624.000000, d: -7.213475e-01)
FPR6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR7 bfc1aa2bc79c8100 (f: 3348922624.000000, d: -1.380057e-01)
FPR8 0065006b0072006f (f: 7471215.000000, d: 9.346037e-307)
FPR9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR11 41cdcd6500000000 (f: 0.000000, d: 1.000000e+09)
FPR12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR13 0000000000092c00 (f: 601088.000000, d: 2.969769e-318)
FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FC493D34470 Handler2=00007FC493ABCEE0
R0=00007FC4934243A0 R1=00007FC46481B690 R2=00007FC493F46E00 R3=00000000000E4100
R4=00000000000E4130 R5=0000000000000000 R6=000000000000000E R7=00007FC46481B7B0
R8=0000000000443278 R9=000000000000000E R10=00007FC49373FA00 R11=0000000000000002
R12=0000000000000000 R13=00007FC4648268E0 R14=00000000004431A8 R15=000000000044AE00
R16=0000000000000000 R17=0000000000000000 R18=0000000000000000 R19=0000000000000000
R20=0000000000000000 R21=0000000000000000 R22=0000000000000000 R23=0000000000000000
R24=0000000000000000 R25=0000000000000000 R26=0000000000000000 R27=0000000000000000
R28=0000000000000000 R29=0000000000000000 R30=0000000000000000 R31=0000000000000000
NIP=00007FC466DE3AF8 MSR=800000010280F033 ORIG_GPR3=00007FC493E6788C CTR=00007FC466DE3A80
LINK=00007FC4934243A0 XER=0000000000000000 CCR=0000000048004282 SOFTE=0000000000000001
TRAP=0000000000000300 DAR=0000000000000898 dsisr=0000000040000000 RESULT=0000000000000000
Compiled_method=java/lang/Thread.currentCarrierThread()Ljava/lang/Thread;
FPR0 000000000044aa78 (f: 4500088.000000, d: 2.223339e-317)
FPR1 4053c821a0000000 (f: 2684354560.000000, d: 7.912705e+01)
FPR2 41e0000000000000 (f: 0.000000, d: 2.147484e+09)
FPR3 3fee666660000000 (f: 1610612736.000000, d: 9.500000e-01)
FPR4 3fce840b4ac4e4d2 (f: 1254417664.000000, d: 2.384047e-01)
FPR5 bfe7154748bef6c8 (f: 1220474624.000000, d: -7.213475e-01)
FPR6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR7 bfc1aa2bc79c8100 (f: 3348922624.000000, d: -1.380057e-01)
FPR8 0065006b0072006f (f: 7471215.000000, d: 9.346037e-307)
FPR9 37f6803900000000 (f: 0.000000, d: 4.132757e-39)
FPR10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR11 41cdcd6500000000 (f: 0.000000, d: 1.000000e+09)
FPR12 4000000000000000 (f: 0.000000, d: 2.000000e+00)
FPR13 0000000000092c00 (f: 601088.000000, d: 2.969769e-318)
FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
It also happens inside test_VirtualthreadYieldResume. When looking at a tracefile for java/lang/Thread.currentCarrierThread, it does seem to create a JNI frame but I also see the code I added to increment and decrement the counter. So at the moment, it is unclear what the problem is.
@IBMJimmyk if you disable the compilation of that method does the issue still occur?
I tried using the Xjit option -Xjit:"exclude={*java/lang/Thread.currentCarrierThread*}" to exclude the compilation but the test still fails. This time it fails inside java/lang/Object.<init>()V. The init function does not make a JNI call nor does it lock any monitors. The test also fails if init is the only compiled method. The cause of failure seems to be r16 being 0x0 when the method is called. The compiled method does not modify r16 but does try to dereference r16 plus an offset which causes the crash.
If init is excluded, the failure moves to jdk/internal/vm/Continuation.execute(Ljdk/internal/vm/Continuation;)V. This has the same issue. It tries to dereference from r16 plus an offset and crashes since r16 is 0x0.
So from what I can tell, the PseudoTOC register, r16, seems to be getting trashed somehow outside of JIT'd code.
@gacholio Do you have any thoughts on why the value in r16 does not seem to be set correctly before calling out to JIT'd code?
The r16 value is initialized here:
https://github.com/eclipse-openj9/openj9/blob/b9b733900ef25ba25435e8dd557a10f4e39a4fd6/runtime/vm/pcinterp.m4#L120-L121
and is loaded here before transitioning to compiled code:
https://github.com/eclipse-openj9/openj9/blob/b9b733900ef25ba25435e8dd557a10f4e39a4fd6/runtime/vm/pcinterp.m4#L133
https://github.com/eclipse-openj9/openj9/blob/b9b733900ef25ba25435e8dd557a10f4e39a4fd6/runtime/oti/phelpers.m4#L851-L869
This relies on the JIT not corrupting the r16 value before transitioning to the interpreter. I have no insight on how that might occur.
I tried modifying openj9/runtime/vm/pcinterp.m4 to crash if there is an attempt to store 0x0 to JIT_GPR_SAVE_SLOT(16). Basically it compares r3 to 0x0 and crashes if it is 0x0. However, my additional code never triggered. So it doesn't seem like this point in the code ever stores 0x0 to JIT_GPR_SAVE_SLOT(16).
In the same build, I modified openj9/runtime/oti/phelpers.m4 to crash if it reads 0x0 when restoring r16. I added code to check r16 against 0x0 after being loaded and crash if it is 0x0. This code does trigger. So it looks like something is writing 0x0 to JIT_GPR_SAVE_SLOT(16) and it is not coming from the indicated location in openj9/runtime/vm/pcinterp.m4.
I also tried a different build where, if 0x0 is detected after loading r16, it gets replaced with 0x1234. I confirmed that this value makes it into the compiled JIT'd method, which then crashes when a dereference is attempted (it would have dereferenced 0x0 without my change). So I can confirm that this load inside openj9/runtime/oti/phelpers.m4 is the real location that the bad 0x0 value is loaded from.
I'm trying to figure out where 0x0 is being stored to JIT_GPR_SAVE_SLOT(16) but haven't found it yet.
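As a small worked example of the r16-plus-offset dereference: in the Power dump above where Compiled_method is java/lang/Thread.currentCarrierThread, R16 is 0 and DAR is 0x898. The sketch below assumes the faulting access used r16 as its base register, which would make the displacement 0x898. It also shows the address the 0x1234 sentinel would produce for that same displacement; that second value is arithmetic only, not something observed in a dump. The class and method names are illustrative.

```java
public class FaultAddressDemo {
    // Effective address of a base+displacement access.
    static long effectiveAddress(long base, long displacement) {
        return base + displacement;
    }

    public static void main(String[] args) {
        long displacement = 0x898L; // assumed: DAR minus R16 (0) from the dump above
        // With r16 == 0 the access lands on the DAR value reported in the dump.
        System.out.printf("r16=0x0    -> 0x%X%n", effectiveAddress(0x0L, displacement));
        // With the sentinel, a fault address near 0x1234 would point at the restored r16.
        System.out.printf("r16=0x1234 -> 0x%X%n", effectiveAddress(0x1234L, displacement));
    }
}
```

The point of the sentinel is that a fault address offset from 0x1234 can only come from the value patched in at the restore site, whereas a fault near 0x0 could have many causes.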
r16 is written by SAVE_PRESERVED_REGS, which is used when transitioning to the interpreter from compiled code. I suggest putting a check there, or more simply in j2iInvokeExact, j2iTransition and j2iVirtual in pnathelp.m4.
To confirm, you are referring to this location?: https://github.com/IBMJimmyk/openj9/blob/pinningSupport/runtime/oti/phelpers.m4#L828
I tried adding a check for 0 there as well and it never triggers. I didn't try explicitly adding checks to j2iInvokeExact, j2iTransition and j2iVirtual, but it looks like they just use SAVE_PRESERVED_REGS.
Just in case, I also tried adding the check to SAVE_C_NONVOLATILE_REGS since it also writes to JIT_GPR_SAVE_SLOT(16):
https://github.com/IBMJimmyk/openj9/blob/pinningSupport/runtime/oti/phelpers.m4#L782
But it doesn't crash there either, so that isn't the place that writes 0x0.
Right.
If you know the callin (C interpreter) stack frame in which the failure occurs, you could use a memory write breakpoint in gdb to see where the slot is written.
I can try doing that.