8359870: JVM crashes in AccessInternal::PostRuntimeDispatch

Open kevinjwalls opened this issue 5 months ago • 10 comments

ThreadDumper/ThreadSnapshot need to handle a failure to resolve the native VM JavaThread from a java.lang.Thread. This is hard to reproduce but a thread that has since terminated can provoke a crash. Recognise this and return a null ThreadSnapshot.
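
A minimal Java sketch of the pattern described above, not the actual HotSpot change: a snapshot factory that returns null when the target thread can no longer be resolved, and a caller that skips null results. `SnapshotSketch`, `Snapshot`, `snapshotOf` and `dump` are hypothetical names, and `Thread.isAlive()` is only a stand-in for the native JavaThread lookup done inside the VM.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotSketch {
    // Hypothetical stand-in for a thread snapshot: a name and a frame count.
    record Snapshot(String name, int frames) {}

    // Returns null when the thread has no live counterpart.
    // Assumption: isAlive() models "native JavaThread could be resolved".
    static Snapshot snapshotOf(Thread t) {
        if (!t.isAlive()) {
            return null;
        }
        // getStackTrace() itself tolerates a thread that exits right here:
        // it returns an empty array for a terminated thread.
        return new Snapshot(t.getName(), t.getStackTrace().length);
    }

    // Callers treat null as "thread went away" and carry on.
    static List<Snapshot> dump(List<Thread> threads) {
        List<Snapshot> out = new ArrayList<>();
        for (Thread t : threads) {
            Snapshot s = snapshotOf(t);
            if (s != null) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread done = new Thread(() -> {}, "finished");
        done.start();
        done.join(); // the thread is terminated from here on
        System.out.println("terminated thread -> " + snapshotOf(done));
        System.out.println("snapshots taken: "
                + dump(List.of(done, Thread.currentThread())).size());
    }
}
```

The point of the sketch is only the contract: a null result is a normal outcome the caller has to expect, rather than an error.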


Progress

  • [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8359870: JVM crashes in AccessInternal::PostRuntimeDispatch (Bug - P2)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25958/head:pull/25958
$ git checkout pull/25958

Update a local copy of the PR:
$ git checkout pull/25958
$ git pull https://git.openjdk.org/jdk.git pull/25958/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25958

View PR using the GUI difftool:
$ git pr show -t 25958

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25958.diff

Using Webrev

Link to Webrev Comment

kevinjwalls avatar Jun 24 '25 17:06 kevinjwalls

:wave: Welcome back kevinw! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Jun 24 '25 17:06 bridgekeeper[bot]

@kevinjwalls This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8359870: JVM crashes in AccessInternal::PostRuntimeDispatch

Reviewed-by: amenkov, dholmes, sspitsyn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 20 new commits pushed to the master branch:

  • 282ee40a56af46521b94fe6e4c90e78b8f513b29: 8359366: RunThese30M.java EXCEPTION_ACCESS_VIOLATION in JvmtiBreakpoints::clearall_in_class_at_safepoint
  • e7a450038a47a76d2e616ebce2a7fa8a51e36ea4: 8359707: Add classfile modification code to RedefineClassHelper
  • 38f59f84c98dfd974eec0c05541b2138b149def7: 8358179: Performance regression in Math.cbrt
  • ... and 17 more: https://git.openjdk.org/jdk/compare/c2d76f9844aadf77a0b213a9169a7c5c8c8f1ffb...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Jun 24 '25 17:06 openjdk[bot]

@kevinjwalls The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Jun 24 '25 17:06 openjdk[bot]

/label remove core-libs /label add serviceability

AlanBateman avatar Jun 24 '25 19:06 AlanBateman

@AlanBateman The core-libs label was successfully removed.

openjdk[bot] avatar Jun 24 '25 19:06 openjdk[bot]

@AlanBateman The serviceability label was successfully added.

openjdk[bot] avatar Jun 24 '25 19:06 openjdk[bot]

Something still bugging me about this one. From JBS it looked to me like we were dealing with a virtual thread but your change is for the non-virtual thread. And Alan says something about this only being possible due to a temporary condition. So I'm still unclear exactly what the problem is, or why it appeared. Where does the initial "thread" argument come from in the Java code? Is it the one that has terminated, if so why is there not an isAlive() check somewhere?

And how does this lead to the bad oop?

Yes, I was reproducing with a regular non-virtual thread exiting. We have the j.l.Thread object and could check for it being TERMINATED earlier in HeapDumper/Snapshot, but leaving the check to the last moment avoids a bigger window in which the thread could terminate.

(Maybe there is somewhere this should intersect with ThreadSMR...?)
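
To illustrate the window mentioned above, here is a small Java sketch using only public `java.lang.Thread` API. It is an illustration of the check-then-use race in general, not of the VM code: `LateCheckSketch` and `earlyCheckWentStale` are hypothetical names, and a latch is used to force the thread to exit between the early check and the use.

```java
import java.util.concurrent.CountDownLatch;

public class LateCheckSketch {
    // Performs an early liveness check, lets the thread exit, then uses it.
    // Returns true if the early check passed but the thread was gone at use time.
    static boolean earlyCheckWentStale() throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            try {
                release.await(); // stay alive until released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "short-lived");
        t.start();

        boolean aliveAtCheck = t.isAlive(); // the early check passes

        release.countDown();
        t.join(); // the thread terminates inside the window

        // The "use": a terminated thread yields an empty stack trace.
        StackTraceElement[] frames = t.getStackTrace();
        return aliveAtCheck
                && frames.length == 0
                && t.getState() == Thread.State.TERMINATED;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("early check went stale: " + earlyCheckWentStale());
    }
}
```

However early or late the check is made, the code that finally uses the thread still has to cope with it having terminated; a later check only narrows the window.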

On the bad oop: I enabled the test to run in a debug VM for my own testing, but one of the earlier release-build crashes was at:

V [libjvm.so+0x47bb10] AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<286822ul, G1BarrierSet>, (AccessInternal::BarrierType)3, 286822ul>::oop_access_barrier(oopDesc*, long)+0x0 (accessBackend.hpp:228)
V [libjvm.so+0x10e1c1a] vframeStream::vframeStream(oopDesc*, Handle)+0x7a (vframe.cpp:523)
V [libjvm.so+0x1068a51] GetThreadSnapshotClosure::do_thread(Thread*)+0x7d1 (threadService.cpp:1319)
V [libjvm.so+0x106691d] ThreadSnapshotFactory::get_thread_snapshot(_jobject*, JavaThread*)+0x80d (threadService.cpp:1482)
V [libjvm.so+0xae23d5] JVM_CreateThreadSnapshot+0x75 (jvm.cpp:2966)
j jdk.internal.vm.ThreadSnapshot.create(Ljava/lang/Thread;)Ljdk/internal/vm/ThreadSnapshot;+0 java.base@25-ea
...

Line number info puts it in the _java_thread == null branch of:

threadService.cpp
1317 vframeStream vfst(_java_thread != nullptr
1318 ? vframeStream(_java_thread, false, true, vthread_carrier)
1319 : vframeStream(java_lang_VirtualThread::continuation(_thread_h()))); <---

And it's looking inside the Handle _thread_h() within GetThreadSnapshotClosure, which was set up by get_thread_snapshot, and it's a null pointer, as:

Instructions:
=>0x00007ffadc251b10: 8b 14 37 31 c0 85 d2 74 18 89 d0 48 8d 15 1e ee
mov edx,DWORD PTR [rdi+rsi*1]
and RDI=0x0000000000000000

kevinjwalls avatar Jun 25 '25 21:06 kevinjwalls

Line number info puts it in the _java_thread == null branch of:

threadService.cpp
1317 vframeStream vfst(_java_thread != nullptr
1318 ? vframeStream(_java_thread, false, true, vthread_carrier)
1319 : vframeStream(java_lang_VirtualThread::continuation(_thread_h()))); <---

And it's looking inside the Handle _thread_h() within GetThreadSnapshotClosure which was setup by get_thread_snapshot, and it's a null pointer,

But _thread_h() has already been used a number of times before we get here and if it were null we should have crashed long ago. ???

dholmes-ora avatar Jun 26 '25 00:06 dholmes-ora

I was reproducing this frequently, monitoring with asserts in a fastdebug build and problems started with ThreadSnapshotFactory::get_thread_snapshot() getting a null from JNIHandles::resolve(jthread)

...there are several different crashes in the product build.

But _thread_h() has already been used a number of times before we get here and if it were null we should have crashed long ago. ???

There can be some that don't cause a problem, like:
java_lang_VirtualThread::is_instance(_thread_h()); (includes null check)
...and others are not called. Hmm, maybe there are some that look like they should have crashed, e.g.
1290 _thread_name = OopHandle(oop_storage(), java_lang_Thread::name(_thread_h()));
<-- name does: return java_thread->obj_field(_name_offset);

...I don't see why this didn't fault in the report from the JBS issue I was interpreting here (not my debug build). Reordered or something else happened, or just haven't understood enough. It is much easier to read an assert in get_thread_snapshot than letting it continue and crash in vframestream etc...

But null from JNIHandles::resolve(jthread) is the earliest problem I found.

I'm redoing with the cv_internal_thread_to_JavaThread usage...

A little concerned that ThreadsListHandle::cv_internal_thread_to_JavaThread takes jobject jthread, our ref to a java.lang.Thread, and also calls:
811 oop thread_oop = JNIHandles::resolve_non_null(jthread);

...which asserts if it contains null, but maybe I don't know all the ThreadsListHandle magic.

I had a day yesterday where the problem would not reproduce at all, which made it hard to verify! Will update...

kevinjwalls avatar Jun 27 '25 09:06 kevinjwalls

Line number info puts it in the _java_thread == null branch of:

threadService.cpp
1317 vframeStream vfst(_java_thread != nullptr
1318 ? vframeStream(_java_thread, false, true, vthread_carrier)
1319 : vframeStream(java_lang_VirtualThread::continuation(_thread_h()))); <---

And it's looking inside the Handle _thread_h() within GetThreadSnapshotClosure which was setup by get_thread_snapshot, and it's a null pointer,

But _thread_h() has already been used a number of times before we get here and if it were null we should have crashed long ago. ???

I believe the null here is not the result of _thread_h(), but is returned by java_lang_VirtualThread::continuation(...) because _thread_h is a java.lang.Thread object and not a java.lang.VirtualThread.
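
A toy Java model of that reading of lines 1317-1319 may make the failure path easier to follow. It is a sketch under loud assumptions: `NativeThread`, `Continuation`, `ThreadObj`, `continuationOf` and `walkSource` are all hypothetical stand-ins, not HotSpot or JDK types, and the null returned for a non-virtual thread simply encodes the belief stated in this comment.

```java
public class FrameSourceSketch {
    // Toy stand-ins; none of these are HotSpot or JDK types.
    record NativeThread(String id) {}
    record Continuation(String id) {}
    record ThreadObj(String name, boolean virtual, Continuation cont) {}

    // Assumption taken from the comment above: only a virtual thread
    // object yields a continuation; a plain thread object yields null.
    static Continuation continuationOf(ThreadObj t) {
        return t.virtual() ? t.cont() : null;
    }

    // Picks where a stack walk would start, or returns null if there is
    // nothing to walk (a guard the modelled code path did not have).
    static String walkSource(NativeThread nativeThread, ThreadObj t) {
        if (nativeThread != null) {
            return "native:" + nativeThread.id(); // mounted or platform thread
        }
        Continuation c = continuationOf(t);
        if (c == null) {
            return null; // terminated platform thread: no native thread, no continuation
        }
        return "continuation:" + c.id(); // unmounted virtual thread
    }

    public static void main(String[] args) {
        ThreadObj platform = new ThreadObj("worker", false, null);
        ThreadObj vthread = new ThreadObj("vt", true, new Continuation("c1"));
        System.out.println(walkSource(new NativeThread("jt1"), platform));
        System.out.println(walkSource(null, vthread));
        System.out.println(walkSource(null, platform));
    }
}
```

In this model the third call is the problematic combination: no native thread and a thread object that is not virtual, so the else-branch has nothing valid to hand to the stack walker.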

alexmenkov avatar Jun 27 '25 20:06 alexmenkov

But null from JNIHandles::resolve(jthread) is the earliest problem I found.

I'm redoing with the cv_internal_thread_to_JavaThread usage...

A little concerned that ThreadsListHandle::cv_internal_thread_to_JavaThread takes jobject jthread, our ref to a java.lang.Thread, and also calls:
811 oop thread_oop = JNIHandles::resolve_non_null(jthread);

JNIHandles::resolve(jthread) can return null only if jthread == nullptr, this should not be possible

alexmenkov avatar Jun 27 '25 20:06 alexmenkov

I believe the null here is not the result of _thread_h(), but is returned by java_lang_VirtualThread::continuation(...) because _thread_h is a java.lang.Thread object and not a java.lang.VirtualThread.

That could only happen if we are dealing with a terminated regular thread - which we should never do here if the TLH is used correctly and we only ever pass live threads to do_thread, or else the null which means "unmounted virtual thread".

dholmes-ora avatar Jun 30 '25 06:06 dholmes-ora

⚠️ @kevinjwalls This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

openjdk[bot] avatar Jul 01 '25 10:07 openjdk[bot]

Thanks for all the feedback and reviews!

kevinjwalls avatar Jul 01 '25 18:07 kevinjwalls

/integrate

kevinjwalls avatar Jul 01 '25 19:07 kevinjwalls

Going to push as commit 13a3927855da61fe27f3b43e5e4755d0c5ac5a16. Since your change was applied there have been 20 commits pushed to the master branch:

  • 282ee40a56af46521b94fe6e4c90e78b8f513b29: 8359366: RunThese30M.java EXCEPTION_ACCESS_VIOLATION in JvmtiBreakpoints::clearall_in_class_at_safepoint
  • e7a450038a47a76d2e616ebce2a7fa8a51e36ea4: 8359707: Add classfile modification code to RedefineClassHelper
  • 38f59f84c98dfd974eec0c05541b2138b149def7: 8358179: Performance regression in Math.cbrt
  • ... and 17 more: https://git.openjdk.org/jdk/compare/c2d76f9844aadf77a0b213a9169a7c5c8c8f1ffb...master

Your commit was automatically rebased without conflicts.

openjdk[bot] avatar Jul 01 '25 19:07 openjdk[bot]

@kevinjwalls Pushed as commit 13a3927855da61fe27f3b43e5e4755d0c5ac5a16.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk[bot] avatar Jul 01 '25 19:07 openjdk[bot]

/backport :jdk25

kevinjwalls avatar Jul 02 '25 08:07 kevinjwalls

@kevinjwalls the backport was successfully created on the branch backport-kevinjwalls-13a39278-jdk25 in my personal fork of openjdk/jdk. To create a pull request with this backport targeting openjdk/jdk:jdk25, just click the following link:

:arrow_right: Create pull request

The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit 13a39278 from the openjdk/jdk repository.

The commit being backported was authored by Kevin Walls on 1 Jul 2025 and was reviewed by Alex Menkov, David Holmes and Serguei Spitsyn.

Thanks!

If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/jdk:

$ git fetch https://github.com/openjdk-bots/jdk.git backport-kevinjwalls-13a39278-jdk25:backport-kevinjwalls-13a39278-jdk25
$ git checkout backport-kevinjwalls-13a39278-jdk25
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/jdk.git backport-kevinjwalls-13a39278-jdk25

openjdk[bot] avatar Jul 02 '25 08:07 openjdk[bot]