jdk
jdk copied to clipboard
8327990: [macosx-aarch64] JFR enters VM without WXWrite
This pr changes JfrJvmtiAgent::retransform_classes()
and jfr_set_enabled
to switch to WXWrite
before transitioning to the vm.
Testing: make test TEST=jdk/jfr/event/runtime/TestClassLoadEvent.java TEST_VM_OPTS=-XX:+AssertWXAtThreadSync make test TEST=compiler/intrinsics/klass/CastNullCheckDroppingsTest.java TEST_VM_OPTS=-XX:+AssertWXAtThreadSync
More tests are pending.
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8327990: [macosx-aarch64] JFR enters VM without WXWrite (Bug - P4)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18238/head:pull/18238
$ git checkout pull/18238
Update a local copy of the PR:
$ git checkout pull/18238
$ git pull https://git.openjdk.org/jdk.git pull/18238/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18238
View PR using the GUI difftool:
$ git pr show -t 18238
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18238.diff
Webrev
:wave: Welcome back rrich! A progress list of the required criteria for merging this PR into master
will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@reinrich The following labels will be automatically applied to this pull request:
-
hotspot-jfr
-
serviceability
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
/label add hotspot
@reinrich
The hotspot
label was successfully added.
Webrevs
- 03: Full - Incremental (8239638b)
- 02: Full - Incremental (1b4d7606)
- 01: Full - Incremental (9075f4be)
- 00: Full (b34ad5f3)
As I wrote in JBS, shouldn't this be handled by
ThreadInVMfromNative
?
(I wanted to publish the PR before answering your comment)
This would be reasonable in my opinion.
I've hoisted setting WXWrite
mode in JfrJvmtiAgent::retransform_classes()
because multiple instances of ThreadInVMfromNative
are reached. This is likely not even necessary. Still exceptions could be made if there are usages of ThreadInVMfromNative
where it is needed.
While I agree I'd prefer to do it as a separate enhancement.
@MBaesken found 2 more locations in jvmti that need switching to WXWrite
JvmtiExport::get_jvmti_interface GetCarrierThread
Both use ThreadInVMfromNative
.
@reinrich This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8327990: [macosx-aarch64] Various tests fail with -XX:+AssertWXAtThreadSync
Reviewed-by: dholmes, stuefe, mdoerr, tholenstein, aph
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 5 new commits pushed to the master
branch:
- d3f3011d56267360d65841da3550eca79cf1575b: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream
- e5e7cd20eca0e5a5f0811d304a9659961dcf11c0: 8328386: Convert java/awt/FileDialog/FileNameOverrideTest test to main
- 1b68c731f2461a2f26e6d967dd3f94e217b4d1f4: 8328377: Convert java/awt/Cursor/MultiResolutionCursorTest test to main
- e0373e01fece310d12859207cc5e233f68b7607f: 8328378: Convert java/awt/FileDialog/FileDialogForDirectories test to main
- 03c25b15eb73e594d1329be397cd34e206d2682b: 8328367: Convert java/awt/Component/UpdatingBootTime test to main
Please see this link for an up-to-date comparison between the source branch of this pull request and the master
branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master
branch, type /integrate in a new comment.
@MBaesken found 2 more locations in jvmti that need switching to
WXWrite
JvmtiExport::get_jvmti_interface GetCarrierThread
Both use
ThreadInVMfromNative
.
Should we address those 2 more findings in this PR ? Or open a separate JBS issue ?
btw those were the jtreg tests triggering the 2 additional findings / asserts
runtime/Thread/AsyncExceptionOnMonitorEnter.java runtime/Thread/AsyncExceptionTest.java serviceability/jvmti/RedefineClasses/RedefineSharedClassJFR.java runtime/handshake/HandshakeDirectTest.java runtime/handshake/SuspendBlocked.java runtime/jni/terminatedThread/TestTerminatedThread.java runtime/lockStack/TestStackWalk.java serviceability/jvmti/vthread/GetThreadState/GetThreadStateTest.java#default serviceability/jvmti/vthread/GetThreadState/GetThreadStateTest.java#no-vmcontinuations serviceability/jvmti/vthread/GetThreadStateMountedTest/GetThreadStateMountedTest.java serviceability/jvmti/vthread/RawMonitorTest/RawMonitorTest.java serviceability/jvmti/vthread/SuspendWithInterruptLock/SuspendWithInterruptLock.java#default serviceability/jvmti/vthread/SuspendWithInterruptLock/SuspendWithInterruptLock.java#xint serviceability/jvmti/vthread/ThreadStateTest/ThreadStateTest.java serviceability/jvmti/vthread/WaitNotifySuspendedVThreadTest/WaitNotifySuspendedVThreadTest.java
I noticed a few more asserts (assert(_wx_state == expected) failed: wrong state) in the jfr area (jdk tier3 jfr tests). E.g.
V [libjvm.dylib+0x8a5d94] JavaThread::check_for_valid_safepoint_state()+0x0
V [libjvm.dylib+0x3e95b4] ThreadStateTransition::transition_from_native(JavaThread*, JavaThreadState, bool)+0x174
V [libjvm.dylib+0x3e93d0] ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0x70
V [libjvm.dylib+0x91a578] JfrRecorderService::emit_leakprofiler_events(long long, bool, bool)+0xcc
and
V [libjvm.dylib+0x8a5d94] JavaThread::check_for_valid_safepoint_state()+0x0
V [libjvm.dylib+0x3e95b4] ThreadStateTransition::transition_from_native(JavaThread*, JavaThreadState, bool)+0x174
V [libjvm.dylib+0x3e93d0] ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0x70
V [libjvm.dylib+0x8d7f74] JfrJavaEventWriter::flush(_jobject*, int, int, JavaThread*)+0xf8
j jdk.jfr.internal.JVM.flush(Ljdk/jfr/internal/event/EventWriter;II)V+0 jdk.jfr@23-internal
j jdk.jfr.internal.event.EventWriter.flush(II)V+3 jdk.jfr@23-internal
@MBaesken found 2 more locations in jvmti that need switching to
WXWrite
JvmtiExport::get_jvmti_interface GetCarrierThread
Both use
ThreadInVMfromNative
.Should we address those 2 more findings in this PR ? Or open a separate JBS issue ?
I'm leaning towards fixing all locations in this PR even though this will prevent clean backports. Would that be ok?
@MBaesken found 2 more locations in jvmti that need switching to
WXWrite
JvmtiExport::get_jvmti_interface GetCarrierThread
Both use
ThreadInVMfromNative
.Should we address those 2 more findings in this PR ? Or open a separate JBS issue ?
I'm leaning towards fixing all locations in this PR even though this will prevent clean backports. Would that be ok?
I think this is ok.
JfrRecorderService::emit_leakprofiler_events
(src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp ) and JfrJavaEventWriter::flush
(src/hotspot/share/jfr/writers/jfrJavaEventWriter.cpp) might need adjustment too (see the other findings I posted yesterday).
As I wrote in JBS, shouldn't this be handled by
ThreadInVMfromNative
?
I agree. This is something I am investigating at the moment. Ideally, AssertWXAtThreadSync would also be true by default.
As I wrote in JBS, shouldn't this be handled by
ThreadInVMfromNative
?I agree. This is something I am investigating at the moment. Ideally, AssertWXAtThreadSync would also be true by default.
I've added a bunch more locations we've seen when testing with AssertWXAtThreadSync.
@tobiasholenstein would you think that this PR is actually not needed because you are going to push a better way of handling the WXMode in the near future? How should be handle the issues in released versions (jdk 21, 17, ...)? Will it be possible to backport your work?
That said we have had a lot of people looking at the overall WX state management logic in the past week or so due to https://bugs.openjdk.org/browse/JDK-8327860. The workaround there requires us to be in EXEC mode
That's very odd. The example there doesn't even involve MAP_JIT memory, so what does it have to do with WX?
and there are a number of folk who are questioning why "we" chose WRITE mode as the default with a switch to EXEC, instead of EXEC as the default with a switch to WRITE. But whichever way that goes I think the VM state transitions are the places to enforce that choice.
Hmm. Changing WX at VM state transitions is a form of temporal coupling, a classic design smell that has caused problems for decades. It's causing problems for us now. Instead, could we tag code that needs one or the other, keep track of the current WX state in thread-local memory, and flip WX only when we know we need to? That'd (by definition) reduce the number of transitions to the minimum if we were through.
That's very odd. The example there doesn't even involve MAP_JIT memory, so what does it have to do with WX?
@theRealAph that is the mystery we hope will be resolved once we know the nature of the underlying OS bug. Somehow switching to exec mode fixes/works-around the issue. I can imagine a missing conditional to check if the region is MAP_JIT.
Changing WX at VM state transitions is a form of temporal coupling, a classic design smell that has caused problems for decades.
The original introducers of WXEnable made the decision that the VM should be in WRITE mode unless it needs EXEC. That is the state we are presently trying to achieve with this change. If that original design choice is wrong then ...
Instead, could we tag code that needs one or the other, keep track of the current WX state in thread-local memory, and flip WX only when we know we need to?
And I've asked about this every time a missing WXEnable has had to be added. We seem to be generically able to describe what kind of code needs which mode, but we seem to struggle to pin it down. Though that is what https://bugs.openjdk.org/browse/JDK-8307817 is looking at doing.
That'd (by definition) reduce the number of transitions to the minimum if we were through.
Not necessarily. It may well remove some transitions from paths that don't need it, but if you move the state change too low down the call chain you could end up transitioning much more often in code that does need it e.g. if a transitioning method is called in a loop. We need to optimise the actual call paths as well as identify specific methods.
But all this discussion suggests to me that this PR is not really worth pursuing at this time - IIUC no actual failures are observed other than those pertaining to AssertWXAtThreadSync
and that flag will be gone if we do decide to be more fine-grained about WX management.
IIUC no actual failures are observed other than those pertaining to AssertWXAtThreadSync
We saw sporadic crashes in our jtreg (maybe also jck?) runs; only later we enabled AssertWXAtThreadSync for more checking.
Instead, could we tag code that needs one or the other, keep track of the current WX state in thread-local memory, and flip WX only when we know we need to?
The first part we already do.
I wonder wheter we could - at least as workaround for if we missed a spot - do wx switching as a reaction to a SIBBUS related to WX violation in code cache. Switch state around, return from signal handler and retry operation.
(Edit: tested it, does not seem to work. I guess when the SIGBUS is triggered in the kernel thread WX state had already been processed somehow).
That's very odd. The example there doesn't even involve MAP_JIT memory, so what does it have to do with WX?
@theRealAph that is the mystery we hope will be resolved once we know the nature of the underlying OS bug. Somehow switching to exec mode fixes/works-around the issue. I can imagine a missing conditional to check if the region is MAP_JIT.
Changing WX at VM state transitions is a form of temporal coupling, a classic design smell that has caused problems for decades.
The original introducers of WXEnable made the decision that the VM should be in WRITE mode unless it needs EXEC. That is the state we are presently trying to achieve with this change. If that original design choice is wrong then ...
Instead, could we tag code that needs one or the other, keep track of the current WX state in thread-local memory, and flip WX only when we know we need to?
And I've asked about this every time a missing WXEnable has had to be added. We seem to be generically able to describe what kind of code needs which mode, but we seem to struggle to pin it down. Though that is what https://bugs.openjdk.org/browse/JDK-8307817 is looking at doing.
That'd (by definition) reduce the number of transitions to the minimum if we were through.
Not necessarily. It may well remove some transitions from paths that don't need it, but if you move the state change too low down the call chain you could end up transitioning much more often in code that does need it e.g. if a transitioning method is called in a loop.
Not if you do the switching lazily. The first iteration would switch to the needed state; subsequent iterations would not do anything since the state already matches. Unless you interleave writes and execs, but then you would need the state changes anyway.
But all this discussion suggests to me that this PR is not really worth pursuing at this time - IIUC no actual failures are observed other than those pertaining to AssertWXAtThreadSync and that flag will be gone if we do decide to be more fine-grained about WX management.
I see it differently. This PR is just a simple attempt to get clean test runs with AssertWXAtThreadSync (after fixing an actual crash https://bugs.openjdk.org/browse/JDK-8327036). While the violating locations in this PR might be unlikely to produce actual crashes I think it is worthwhile to have clean testing with AssertWXAtThreadSync because this will help prevent regressions that are more likely.
Beyond the trivial fixes of this PR I'm very much in favor of further enhancements as the aforementioned https://bugs.openjdk.org/browse/JDK-8307817. My recommendation would be to remove as much non-constant data from the code cache as possible.
I see it differently. This PR is just a simple attempt to get clean test runs with AssertWXAtThreadSync (after fixing an actual crash https://bugs.openjdk.org/browse/JDK-8327036). While the violating locations in this PR might be unlikely to produce actual crashes I think it is worthwhile to have clean testing with AssertWXAtThreadSync because this will help prevent regressions that are more likely.
I agree. Fixing the current state with this PR makes sense to me. Changing the logic of W^X will take more time and discussion. So from my point of view this PR is ready and should be integrated. If no-one disagrees I will approve
Mailing list message from dean.long at oracle.com on hotspot-dev:
On 3/19/24 8:20 AM, Thomas Stuefe wrote:
I wonder wheter we could - at least as workaround for if we missed a spot - do wx switching as a reaction to a SIBBUS related to WX violation in code cache. Switch state around, return from signal handler and retry operation.
(Edit: tested it, does not seem to work. I guess when the SIGBUS is triggered in the kernel thread WX state had already been processed somehow).
That makes sense if the WX state is part of the signal context saved and restored by the signal handler.
dl -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20240319/1836f7ec/attachment.htm>
Not necessarily. It may well remove some transitions from paths that don't need it, but if you move the state change too low down the call chain you could end up transitioning much more often in code that does need it e.g. if a transitioning method is called in a loop.
Not if you do the switching lazily. The first iteration would switch to the needed state; subsequent iterations would not do anything since the state already matches. Unless you interleave writes and execs, but then you would need the state changes anyway.
Exactly. You do the transition when it's needed.
Tests with AssertWXAtThreadSync are clean now. Thanks! /integrate
Going to push as commit e41bc42deb22615c9b93ee639d04e9ed2bd57f64.
Since your change was applied there have been 25 commits pushed to the master
branch:
- 43080173e88c8f53cd54c9096c79f3144007fd97: 8328631: Convert java/awt/InputMethods/InputMethodsTest/InputMethodsTest.java applet test to manual
- 700d2b91defd421a2818f53830c24f70d11ba4f6: 8328401: Convert java/awt/Frame/InitialMaximizedTest/InitialMaximizedTest.html applet test to automated
- bb3e84bd1fd8217fcb319de8a0716e44359e3423: 8328592: hprof tests fail with -XX:-CompactStrings
- ac2f8e5af8c88cd13038b113f82bb7c17a38aa40: 8327994: Update code gen in CallGeneratorHelper
- c434b79cff33e08e4518e92ddddae996dffefe29: 8327169: serviceability/dcmd/vm/SystemMapTest.java and SystemDumpMapTest.java may fail after JDK-8326586
- 700679011e5e9191f5170209454d35cc82953362: 8328628: JDK-8328157 incorrectly sets -MT on all compilers in jdk.jpackage
- 68170ae22233462e8925c75c4737be7f0ba9353d: 8328238: Convert few closed manual applet tests to main
- 9f5ad43358a4e209b4cd1c91bcc86b997f371548: 8320675: PrinterJob/SecurityDialogTest.java hangs
- 684678f9e83ed0a76541a31356894d170fd421db: 8328633: s390x: Improve vectorization of Match.sqrt() on floats
- 93d1700f23d42cb35b6028c5d7d029c035711acf: 8328589: unify os::breakpoint among posix platforms
- ... and 15 more: https://git.openjdk.org/jdk/compare/eebcc2181fe27f6aa10559233c7c58882a146f56...master
Your commit was automatically rebased without conflicts.
@reinrich Pushed as commit e41bc42deb22615c9b93ee639d04e9ed2bd57f64.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.