Segmentation fault in JDK 8, loading non-matching libjava shared library

Open kriegaex opened this issue 3 years ago • 1 comments

@keithc-ca asked me to create a follow-up for https://github.com/adoptium/adoptium-support/issues/350 in order to get the issue fixed here. It has been open for more than a year there. Please refer to the full history there.

It seems as if https://github.com/eclipse-openj9/openj9/pull/3650 needs to be ported back to JDK 8 according to https://github.com/adoptium/adoptium-support/issues/350#issuecomment-1147472211.

kriegaex avatar Sep 16 '22 12:09 kriegaex

@tajila fyi

pshipton avatar Sep 19 '22 16:09 pshipton

@pshipton & @tajila - I'm looking to close the Adoptium support issue on the other side. Will this item be worked on before the next Jan PSU?

karianna avatar Dec 16 '22 02:12 karianna

Will this item be worked on before the next Jan PSU?

@karianna No.

pshipton avatar Dec 16 '22 13:12 pshipton

@dnakamura Please take a look at this

tajila avatar Jan 20 '23 15:01 tajila

This does not appear to be an issue with java.dll. I am able to replicate the issue locally. From my initial investigation, it is pulling in the wrong copy of verify.dll. I am still investigating where in the code the DLL is being opened.

dnakamura avatar Feb 02 '23 22:02 dnakamura

The verify.dll is getting pulled in as a dependency of libjava. I am currently working on a patch that uses the LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR flag to tweak the search order when resolving dependencies. However, it does seem to cause unexpected breakages in other areas.
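For readers unfamiliar with the flag, here is a minimal sketch of what loading a DLL with LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR looks like. It is not the actual patch (that is native code and is not shown in this thread). Python's ctypes is used only for illustration, and the function name `load_with_dll_dir_first` is hypothetical. The flag values are those from the Windows SDK headers. Combining the flag with LOAD_LIBRARY_SEARCH_DEFAULT_DIRS is an assumption of this sketch, not something stated above.

```python
import os
import sys

# Flag values as defined in the Windows SDK (libloaderapi.h).
LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR = 0x00000100
LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000


def load_with_dll_dir_first(dll_path):
    """Load dll_path so that its dependencies are searched for next to it first.

    Hypothetical helper for illustration; only meaningful on Windows.
    """
    if sys.platform != "win32":
        raise OSError("LoadLibraryEx search flags only apply on Windows")
    if not os.path.isabs(dll_path):
        # LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR requires a fully qualified path.
        raise ValueError("dll_path must be an absolute path")

    import ctypes  # WinDLL exists only on Windows, so import lazily

    flags = LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR | LOAD_LIBRARY_SEARCH_DEFAULT_DIRS
    # ctypes passes winmode through as the dwFlags argument of LoadLibraryEx.
    return ctypes.WinDLL(dll_path, winmode=flags)
```

With these flags the standard search path (including the current directory and PATH) is no longer consulted for dependencies, which is one reason such a change can affect code that relied on the old search order.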

dnakamura avatar Feb 21 '23 19:02 dnakamura

I doubt libverify/verify.dll is doing anything for OpenJ9, which has its own verifier. It seems to be used from Class and ClassLoader natives which OpenJ9 doesn't use. Regardless of what else is required to fix the dependency problem, we should get rid of it.

pshipton avatar Feb 21 '23 20:02 pshipton

Should be resolved by eclipse/omr#6931

dnakamura avatar Apr 13 '23 19:04 dnakamura

I'll go ahead and close it then. The change will be in the 3Q release (possibly called 0.40). If someone wants to try a nightly build, open https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-JDK8/ and select a build, then select the Windows build job. There is a link to the binary near the top, such as https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDK8_x86-64_windows_Nightly/530/OpenJ9-JDK8-x86-64_windows-20230412-192824.tar.gz

pshipton avatar Apr 13 '23 19:04 pshipton

Taking https://github.com/adoptium/adoptium-support/issues/350 into account, this issue has been open since August 2021. I would appreciate you reopening it, providing me with a download link (thanks, I saw it above) to a version containing the fix, so I can re-test locally, as soon as I am near the machine hosting the project where the issue occurred again.

Keeping me waiting for 1.5 years - undoubtedly because the development team was busy with higher priority work, which is unfortunate but acceptable - and then closing the issue immediately after someone commented that it should be fixed, without waiting for user feedback, does not seem to be a wise course of action to me.

kriegaex avatar Apr 14 '23 11:04 kriegaex

I'm confused, I did provide a download link to a build containing the fix. If you find it's not fixed I will reopen.

pshipton avatar Apr 14 '23 12:04 pshipton

providing me with a download link (thanks, I saw it above)

Sorry for the late edit, which intersected with your comment. I meant to describe a workflow I think is more appropriate than the one chosen. The download link is one part; waiting for user feedback for a while (a few days or weeks) is the other. Of course, you can close the issue and then reopen it, but where I come from, we close issues after user feedback, and only close without feedback if none arrives in due time.

I was astounded at the obvious discrepancy between total cycle time for this issue and the trigger-happy-ish way to close it in the same hour as someone (who is not me) commented on the alleged "done" status of it.

kriegaex avatar Apr 14 '23 12:04 kriegaex

We think it's fixed so it's done until there is feedback. I will reopen it now and wait for your feedback before closing again.

pshipton avatar Apr 14 '23 12:04 pshipton

OK, I could remotely log into the machine which originally had the problem, installed the latest OpenJ9 release there and was able to reproduce the problem:

$ while true; do ./java.exe -version; ./java.exe -X; sleep 1; done

(...)

openjdk version "1.8.0_332"
IBM Semeru Runtime Open Edition (build 1.8.0_332-b09)
Eclipse OpenJ9 VM (build openj9-0.32.0, JRE 1.8.0 Windows 10 amd64-64-Bit Compressed References 20220422_375 (JIT enabled, AOT enabled)
OpenJ9   - 9a84ec34e
OMR      - ab24b6666
JCL      - 0b8b8af39a based on jdk8u332-b09)
Unhandled exception
Type=Segmentation error vmState=0x00000000
Windows_ExceptionCode=c0000139 J9Generic_Signal=00000004 ExceptionAddress=00007FFBF7D92856 ContextFlags=0010004f
Handler1=00007FFBA9F7A650 Handler2=00007FFBCA0E93D0
RDI=00000134F1251898 RSI=00007FFBF7DAD390 RAX=00007FFBF7CACB51 RBX=0000000000000001
RCX=00000134EEB30C1C RDX=000000271D35BBA9 R8=0000000000000000 R9=000000271D49E000
R10=00007FFBF7CAD23B R11=0000000000000001 R12=0000000000000003 R13=0000000000000003
R14=00007FFBCA0BB901 R15=0000000000000003
RIP=00007FFBF7D92856 RSP=000000271D35C2B0 RBP=000000271D35C950 EFLAGS=F7CBB86B00000202
FS=0053 ES=002B DS=002B
XMM0 000000271d35be40 (f: 490061376.000000, d: 8.299996e-313)
XMM1 00007ffbf7d09350 (f: 4157641472.000000, d: 6.952500e-310)
XMM2 000000271d35bd50 (f: 490061120.000000, d: 8.299996e-313)
XMM3 000000271d49e000 (f: 491380736.000000, d: 8.300061e-313)
XMM4 0000000000000001 (f: 1.000000, d: 4.940656e-324)
XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+000)
Module=C:\WINDOWS\SYSTEM32\ntdll.dll
Module_base_address=00007FFBF7C90000 Offset_in_DLL=0000000000102856
Target=2_90_20220422_375 (Windows 10 10.0 build 19045)
CPU=amd64 (4 logical CPUs) (0x3f6a91000 RAM)
----------- Stack Backtrace -----------
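As a quick sanity check on the report above, the following sketch recomputes the offset into ntdll.dll from the exception address and the module base address. It only parses values copied from the report; the helper name `parse_fields` is made up for this example. As far as I know, exception code c0000139 is the NTSTATUS value STATUS_ENTRYPOINT_NOT_FOUND, which would fit a DLL resolving against a non-matching copy of a dependency.

```python
import re

# Two lines copied verbatim from the crash report.
REPORT = """\
Windows_ExceptionCode=c0000139 J9Generic_Signal=00000004 ExceptionAddress=00007FFBF7D92856 ContextFlags=0010004f
Module_base_address=00007FFBF7C90000 Offset_in_DLL=0000000000102856
"""


def parse_fields(text):
    """Return all KEY=HEXVALUE pairs found in text as a dict of ints."""
    pairs = re.findall(r"(\w+)=([0-9A-Fa-f]+)\b", text)
    return {key: int(value, 16) for key, value in pairs}


fields = parse_fields(REPORT)
offset = fields["ExceptionAddress"] - fields["Module_base_address"]
print(hex(offset))                        # 0x102856
print(offset == fields["Offset_in_DLL"])  # True
```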

For some reason, downloading the nightly build takes ages, but as soon as I have it, I will try to reproduce the problem with it, and if I fail to, we can close the issue. Stay tuned.

kriegaex avatar Apr 14 '23 12:04 kriegaex

Good news: I let my script run on this version for about 30 minutes non-stop and could not reproduce the error:

openjdk version "1.8.0_372-internal"
OpenJDK Runtime Environment (build 1.8.0_372-internal-jenkins_2023_04_12_19_28-b00)
Eclipse OpenJ9 VM (build master-4a2643c74af, JRE 1.8.0 Windows 10 amd64-64-Bit Compressed References 20230412_530 (JIT enabled, AOT enabled)
OpenJ9   - 4a2643c74af
OMR      - 77d1fa7b53d
JCL      - 5173eddf8d0 based on jdk8u372-b06)
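For anyone who wants to repeat this kind of soak test, here is a rough, time-bounded equivalent of the shell loop used earlier in this thread. It is only a sketch: the function name `count_failures` is hypothetical, and it assumes that a crash shows up as a non-zero exit code and that both commands exit with 0 when they succeed.

```python
import subprocess
import time


def count_failures(commands, duration_s, pause_s=1.0):
    """Run each command in turn until duration_s elapses.

    Returns (runs, failures), where a failure is any non-zero exit code.
    """
    runs = 0
    failures = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for cmd in commands:
            result = subprocess.run(
                cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
            )
            runs += 1
            if result.returncode != 0:
                failures += 1
        time.sleep(pause_s)
    return runs, failures
```

A 30-minute run like the one described would then be `count_failures([["./java.exe", "-version"], ["./java.exe", "-X"]], duration_s=30 * 60)`, run from the JDK's bin directory.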

Thanks to everyone involved for resolving this long-standing issue and also for reopening it, giving me a chance to provide feedback before you considered it "done-done".

kriegaex avatar Apr 14 '23 13:04 kriegaex

Awesome. Closed or not, we accept feedback and hopefully do the right thing, whether that be reopening when necessary or creating a new issue.

pshipton avatar Apr 14 '23 13:04 pshipton

I have no doubt that you would have re-opened the issue if my re-test had shown the problem persisting. But like most people, I tend to stop tracking closed issues, reading e-mails about them, etc. IMO, however, before closing any user-raised issue, we should try to gather user feedback first. Resolved for you != resolved for me. I remember that I had a hard time convincing people that the issue was real at all, first for Java 11, then for Java 8, because it is not super easy to reproduce.

kriegaex avatar Apr 14 '23 14:04 kriegaex