8359870: JVM crashes in AccessInternal::PostRuntimeDispatch
ThreadDumper/ThreadSnapshot need to handle a failure to resolve the native VM JavaThread from a java.lang.Thread. This is hard to reproduce but a thread that has since terminated can provoke a crash. Recognise this and return a null ThreadSnapshot.
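The situation described above has a rough analogue at the Java level: a java.lang.Thread object stays reachable after the thread it represents has exited, so code holding the reference has to cope with the thread being gone. The following is only a sketch of that idea using public APIs. `SnapshotSketch`, `snapshotOrNull` and `finishedThread` are hypothetical names chosen for illustration; this is not the JDK-internal ThreadSnapshot code changed by this PR.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a per-thread snapshot (not jdk.internal.vm.ThreadSnapshot). */
final class SnapshotSketch {
    final String name;
    final List<StackTraceElement> frames;

    SnapshotSketch(String name, List<StackTraceElement> frames) {
        this.name = name;
        this.frames = frames;
    }
}

final class TerminatedThreadSketch {
    /**
     * Returns a snapshot of the thread, or null if the thread is not alive
     * (never started, or already terminated).
     */
    static SnapshotSketch snapshotOrNull(Thread thread) {
        if (!thread.isAlive()) {
            // The Thread object exists, but no running thread is behind it.
            return null;
        }
        // The thread may still exit between the check and this call; in that
        // case getStackTrace() returns an empty array rather than failing.
        StackTraceElement[] frames = thread.getStackTrace();
        return new SnapshotSketch(thread.getName(), Arrays.asList(frames));
    }

    /** Starts a short-lived thread and waits for it to finish. */
    static Thread finishedThread() {
        Thread t = new Thread(() -> { }, "short-lived");
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return t;
    }

    public static void main(String[] args) {
        Thread done = finishedThread();
        System.out.println("state of finished thread: " + done.getState());
        System.out.println("snapshot of finished thread: " + snapshotOrNull(done));
        System.out.println("snapshot of current thread has frames: "
                + !snapshotOrNull(Thread.currentThread()).frames.isEmpty());
    }
}
```

A caller of such a method treats null as "nothing to report for this thread" and moves on, which is the contract the description proposes for the native side.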
Progress
- [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8359870: JVM crashes in AccessInternal::PostRuntimeDispatch (Bug - P2)
Reviewers
- Alex Menkov (@alexmenkov - Reviewer)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25958/head:pull/25958
$ git checkout pull/25958
Update a local copy of the PR:
$ git checkout pull/25958
$ git pull https://git.openjdk.org/jdk.git pull/25958/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 25958
View PR using the GUI difftool:
$ git pr show -t 25958
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25958.diff
Using Webrev
:wave: Welcome back kevinw! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@kevinjwalls This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8359870: JVM crashes in AccessInternal::PostRuntimeDispatch
Reviewed-by: amenkov, dholmes, sspitsyn
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 20 new commits pushed to the master branch:
- 282ee40a56af46521b94fe6e4c90e78b8f513b29: 8359366: RunThese30M.java EXCEPTION_ACCESS_VIOLATION in JvmtiBreakpoints::clearall_in_class_at_safepoint
- e7a450038a47a76d2e616ebce2a7fa8a51e36ea4: 8359707: Add classfile modification code to RedefineClassHelper
- 38f59f84c98dfd974eec0c05541b2138b149def7: 8358179: Performance regression in Math.cbrt
- ... and 17 more: https://git.openjdk.org/jdk/compare/c2d76f9844aadf77a0b213a9169a7c5c8c8f1ffb...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.
@kevinjwalls The following labels will be automatically applied to this pull request:
- core-libs
- hotspot
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
/label remove core-libs
/label add serviceability
@AlanBateman
The core-libs label was successfully removed.
@AlanBateman
The serviceability label was successfully added.
Webrevs
- 08: Full - Incremental (93ad155d)
- 07: Full - Incremental (d56e9d16)
- 06: Full - Incremental (e2043438)
- 05: Full - Incremental (d14f5228)
- 04: Full - Incremental (d8143785)
- 03: Full - Incremental (0dc95941)
- 02: Full - Incremental (089dcf49)
- 01: Full - Incremental (e4a7b546)
- 00: Full (33248d9d)
Something still bugging me about this one. From JBS it looked to me like we were dealing with a virtual thread but your change is for the non-virtual thread. And Alan says something about this only being possible due to a temporary condition. So I'm still unclear exactly what the problem is, or why it appeared. Where does the initial "thread" argument come from in the Java code? Is it the one that has terminated, if so why is there not an isAlive() check somewhere? And how does this lead to the bad oop?
Yes, I was reproducing with a regular non-virtual thread exiting. We have the j.l.Thread object and could check for it being TERMINATED earlier in HeapDumper/Snapshot, but leaving it to the last moment avoids a bigger window where it could terminate.
(Maybe there is somewhere this should intersect with ThreadSMR...?)
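The point about the window can be illustrated with a small Java model: a liveness check made early says nothing about the thread's state at the moment it is actually used. In this sketch the worker is deliberately released between the two checks so the outcome is deterministic; `CheckThenActSketch` and `Observation` are hypothetical names, and the code only models the race rather than reproducing the HotSpot crash.

```java
import java.util.concurrent.CountDownLatch;

final class CheckThenActSketch {
    /** Liveness seen by an early check and by a re-check at the point of use. */
    static final class Observation {
        final boolean aliveAtEarlyCheck;
        final boolean aliveAtUse;

        Observation(boolean aliveAtEarlyCheck, boolean aliveAtUse) {
            this.aliveAtEarlyCheck = aliveAtEarlyCheck;
            this.aliveAtUse = aliveAtUse;
        }
    }

    static Observation observe() {
        CountDownLatch mayExit = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                mayExit.await(); // stay alive until the observer releases us
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "worker");
        worker.start();

        // Early check: the worker has been started and is blocked on the latch.
        boolean early = worker.isAlive();

        // The window between check and use: here the worker is allowed to exit.
        mayExit.countDown();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Point of use: the early answer is now stale.
        boolean atUse = worker.isAlive();
        return new Observation(early, atUse);
    }

    public static void main(String[] args) {
        Observation o = observe();
        System.out.println("alive at early check: " + o.aliveAtEarlyCheck);
        System.out.println("alive at use: " + o.aliveAtUse);
    }
}
```

An early check can shrink the window but cannot close it, so the code at the point of use still has to tolerate a thread that has gone away.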
On the bad oop: I enabled the test to run in debug vm for my own testing, but in one of the earlier release crashes at:
V [libjvm.so+0x47bb10] AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<286822ul, G1BarrierSet>, (AccessInternal::BarrierType)3, 286822ul>::oop_access_barrier(oopDesc*, long)+0x0 (accessBackend.hpp:228)
V [libjvm.so+0x10e1c1a] vframeStream::vframeStream(oopDesc*, Handle)+0x7a (vframe.cpp:523)
V [libjvm.so+0x1068a51] GetThreadSnapshotClosure::do_thread(Thread*)+0x7d1 (threadService.cpp:1319)
V [libjvm.so+0x106691d] ThreadSnapshotFactory::get_thread_snapshot(_jobject*, JavaThread*)+0x80d (threadService.cpp:1482)
V [libjvm.so+0xae23d5] JVM_CreateThreadSnapshot+0x75 (jvm.cpp:2966)
j jdk.internal.vm.ThreadSnapshot.create(Ljava/lang/Thread;)Ljdk/internal/vm/ThreadSnapshot;+0 java.base@25-ea
...
Line number info puts it in the _java_thread == null branch of:
threadService.cpp
1317 vframeStream vfst(_java_thread != nullptr
1318 ? vframeStream(_java_thread, false, true, vthread_carrier)
1319 : vframeStream(java_lang_VirtualThread::continuation(_thread_h()))); <---
And it's looking inside the Handle _thread_h() within GetThreadSnapshotClosure which was setup by get_thread_snapshot, and it's a null pointer, as
Instructions: =>0x00007ffadc251b10: 8b 14 37 31 c0 85 d2 74 18 89 d0 48 8d 15 1e ee
mov edx,DWORD PTR [rdi+rsi*1]
and RDI=0x0000000000000000
> Line number info puts it in the _java_thread == null branch of:
> threadService.cpp
> 1317 vframeStream vfst(_java_thread != nullptr
> 1318 ? vframeStream(_java_thread, false, true, vthread_carrier)
> 1319 : vframeStream(java_lang_VirtualThread::continuation(_thread_h()))); <---
> And it's looking inside the Handle _thread_h() within GetThreadSnapshotClosure which was setup by get_thread_snapshot, and it's a null pointer,
But _thread_h() has already been used a number of times before we get here and if it were null we should have crashed long ago. ???
I was reproducing this frequently, monitoring with asserts in a fastdebug build and problems started with ThreadSnapshotFactory::get_thread_snapshot() getting a null from JNIHandles::resolve(jthread)
...there are several different crashes in the product build.
> But _thread_h() has already been used a number of times before we get here and if it were null we should have crashed long ago. ???
There can be some that don't cause a problem, like:
java_lang_VirtualThread::is_instance(_thread_h()); (includes null check)
...and others are not called. Hmm, maybe there are some that look like they should have crashed, e.g.
1290 _thread_name = OopHandle(oop_storage(), java_lang_Thread::name(_thread_h()));
<-- name does: return java_thread->obj_field(_name_offset);
...I don't see why this didn't fault in the report from the JBS issue I was interpreting here (not my debug build). Something was reordered, or something else happened, or I just haven't understood enough. It is much easier to read an assert in get_thread_snapshot than to let it continue and crash in vframeStream etc...
But null from JNIHandles::resolve(jthread) is the earliest problem I found.
I'm redoing with the cv_internal_thread_to_JavaThread usage...
A little concerned that ThreadsListHandle::cv_internal_thread_to_JavaThread takes jobject jthread, our ref to a java.lang.Thread, and also calls
811 oop thread_oop = JNIHandles::resolve_non_null(jthread);
...which asserts if it contains null, but maybe I don't know all the ThreadsListHandle magic.
I had a day yesterday where the problem would not reproduce at all, which made it hard to verify! Will update...
> > Line number info puts it in the _java_thread == null branch of:
> > threadService.cpp
> > 1317 vframeStream vfst(_java_thread != nullptr
> > 1318 ? vframeStream(_java_thread, false, true, vthread_carrier)
> > 1319 : vframeStream(java_lang_VirtualThread::continuation(_thread_h()))); <---
> > And it's looking inside the Handle _thread_h() within GetThreadSnapshotClosure which was setup by get_thread_snapshot, and it's a null pointer,
>
> But _thread_h() has already been used a number of times before we get here and if it were null we should have crashed long ago. ???
I believe the null here is not the result of _thread_h(), but is returned by java_lang_VirtualThread::continuation(...) because _thread_h is a java.lang.Thread object and not a java.lang.VirtualThread.
> But null from JNIHandles::resolve(jthread) is the earliest problem I found.
> I'm redoing with the cv_internal_thread_to_JavaThread usage...
> A little concerned that ThreadsListHandle::cv_internal_thread_to_JavaThread takes jobject jthread, our ref to a java.lang.Thread, and also calls
> 811 oop thread_oop = JNIHandles::resolve_non_null(jthread);
JNIHandles::resolve(jthread) can return null only if jthread == nullptr; this should not be possible.
> I believe the null here is not the result of _thread_h(), but is returned by java_lang_VirtualThread::continuation(...) because _thread_h is a java.lang.Thread object and not a java.lang.VirtualThread.
That could only happen if we are dealing with a terminated regular thread - which we should never do here if the TLH is used correctly and we only ever pass live threads to do_thread, or else the null which means "unmounted virtual thread".
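The reasoning in this exchange can be summarised as a three-way decision. The quoted do_thread code has only two branches and treats a null JavaThread as "unmounted virtual thread", so a terminated regular thread would fall into the continuation branch even though it has no continuation. The sketch below is a plain Java model of that decision, not HotSpot code; `StackWalkChoiceSketch`, `Walk` and `choose` are hypothetical names, and the two boolean parameters stand in for "a native JavaThread was resolved" and "the Thread object is a virtual thread".

```java
final class StackWalkChoiceSketch {
    /** What a snapshot routine could do for a given thread object. */
    enum Walk {
        JAVA_THREAD_STACK,            // a live JavaThread was resolved (platform thread or carrier)
        VIRTUAL_THREAD_CONTINUATION,  // no JavaThread, but an unmounted virtual thread
        NO_SNAPSHOT                   // no JavaThread and not virtual: the thread has terminated
    }

    /**
     * Chooses how to walk the stack.
     *
     * @param hasJavaThread whether a native thread was resolved for the object
     * @param isVirtual     whether the object is a virtual thread
     */
    static Walk choose(boolean hasJavaThread, boolean isVirtual) {
        if (hasJavaThread) {
            return Walk.JAVA_THREAD_STACK;
        }
        if (isVirtual) {
            return Walk.VIRTUAL_THREAD_CONTINUATION;
        }
        // Without this case a terminated platform thread would be handled
        // as if it were an unmounted virtual thread.
        return Walk.NO_SNAPSHOT;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false));
        System.out.println(choose(false, true));
        System.out.println(choose(false, false));
    }
}
```

The third outcome corresponds to the "return a null ThreadSnapshot" behaviour named in the PR description.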
⚠️ @kevinjwalls This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).
Thanks for all the feedback and reviews!
/integrate
Going to push as commit 13a3927855da61fe27f3b43e5e4755d0c5ac5a16.
Since your change was applied there have been 20 commits pushed to the master branch:
- 282ee40a56af46521b94fe6e4c90e78b8f513b29: 8359366: RunThese30M.java EXCEPTION_ACCESS_VIOLATION in JvmtiBreakpoints::clearall_in_class_at_safepoint
- e7a450038a47a76d2e616ebce2a7fa8a51e36ea4: 8359707: Add classfile modification code to RedefineClassHelper
- 38f59f84c98dfd974eec0c05541b2138b149def7: 8358179: Performance regression in Math.cbrt
- ... and 17 more: https://git.openjdk.org/jdk/compare/c2d76f9844aadf77a0b213a9169a7c5c8c8f1ffb...master
Your commit was automatically rebased without conflicts.
@kevinjwalls Pushed as commit 13a3927855da61fe27f3b43e5e4755d0c5ac5a16.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
/backport :jdk25
@kevinjwalls the backport was successfully created on the branch backport-kevinjwalls-13a39278-jdk25 in my personal fork of openjdk/jdk. To create a pull request with this backport targeting openjdk/jdk:jdk25, just click the following link:
:arrow_right: Create pull request
The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:
Hi all,
This pull request contains a backport of commit 13a39278 from the openjdk/jdk repository.
The commit being backported was authored by Kevin Walls on 1 Jul 2025 and was reviewed by Alex Menkov, David Holmes and Serguei Spitsyn.
Thanks!
If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/jdk:
$ git fetch https://github.com/openjdk-bots/jdk.git backport-kevinjwalls-13a39278-jdk25:backport-kevinjwalls-13a39278-jdk25
$ git checkout backport-kevinjwalls-13a39278-jdk25
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/jdk.git backport-kevinjwalls-13a39278-jdk25