jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8327990: [macosx-aarch64] JFR enters VM without WXWrite

Open reinrich opened this issue 11 months ago • 14 comments

This pr changes JfrJvmtiAgent::retransform_classes() and jfr_set_enabled to switch to WXWrite before transitioning to the vm.

Testing: make test TEST=jdk/jfr/event/runtime/TestClassLoadEvent.java TEST_VM_OPTS=-XX:+AssertWXAtThreadSync make test TEST=compiler/intrinsics/klass/CastNullCheckDroppingsTest.java TEST_VM_OPTS=-XX:+AssertWXAtThreadSync

More tests are pending.


Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8327990: [macosx-aarch64] JFR enters VM without WXWrite (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18238/head:pull/18238
$ git checkout pull/18238

Update a local copy of the PR:
$ git checkout pull/18238
$ git pull https://git.openjdk.org/jdk.git pull/18238/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18238

View PR using the GUI difftool:
$ git pr show -t 18238

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18238.diff

Webrev

Link to Webrev Comment

reinrich avatar Mar 12 '24 15:03 reinrich

:wave: Welcome back rrich! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Mar 12 '24 15:03 bridgekeeper[bot]

@reinrich The following labels will be automatically applied to this pull request:

  • hotspot-jfr
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Mar 12 '24 15:03 openjdk[bot]

/label add hotspot

reinrich avatar Mar 13 '24 09:03 reinrich

@reinrich The hotspot label was successfully added.

openjdk[bot] avatar Mar 13 '24 09:03 openjdk[bot]

Webrevs

mlbridge[bot] avatar Mar 13 '24 09:03 mlbridge[bot]

As I wrote in JBS, shouldn't this be handled by ThreadInVMfromNative?

(I wanted to publish the PR before answering your comment)

This would be reasonable in my opinion. I've hoisted setting WXWrite mode in JfrJvmtiAgent::retransform_classes() because multiple instances of ThreadInVMfromNative are reached. This is likely not even necessary. Still exceptions could be made if there are usages of ThreadInVMfromNative where it is needed.

While I agree I'd prefer to do it as a separate enhancement.

reinrich avatar Mar 13 '24 10:03 reinrich

@MBaesken found 2 more locations in jvmti that need switching to WXWrite

JvmtiExport::get_jvmti_interface GetCarrierThread

Both use ThreadInVMfromNative.

reinrich avatar Mar 13 '24 16:03 reinrich

@reinrich This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8327990: [macosx-aarch64] Various tests fail with -XX:+AssertWXAtThreadSync

Reviewed-by: dholmes, stuefe, mdoerr, tholenstein, aph

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 5 new commits pushed to the master branch:

  • d3f3011d56267360d65841da3550eca79cf1575b: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream
  • e5e7cd20eca0e5a5f0811d304a9659961dcf11c0: 8328386: Convert java/awt/FileDialog/FileNameOverrideTest test to main
  • 1b68c731f2461a2f26e6d967dd3f94e217b4d1f4: 8328377: Convert java/awt/Cursor/MultiResolutionCursorTest test to main
  • e0373e01fece310d12859207cc5e233f68b7607f: 8328378: Convert java/awt/FileDialog/FileDialogForDirectories test to main
  • 03c25b15eb73e594d1329be397cd34e206d2682b: 8328367: Convert java/awt/Component/UpdatingBootTime test to main

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch. As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Mar 13 '24 20:03 openjdk[bot]

@MBaesken found 2 more locations in jvmti that need switching to WXWrite

JvmtiExport::get_jvmti_interface GetCarrierThread

Both use ThreadInVMfromNative.

Should we address those 2 more findings in this PR ? Or open a separate JBS issue ?

btw those were the jtreg tests triggering the 2 additional findings / asserts

runtime/Thread/AsyncExceptionOnMonitorEnter.java runtime/Thread/AsyncExceptionTest.java serviceability/jvmti/RedefineClasses/RedefineSharedClassJFR.java runtime/handshake/HandshakeDirectTest.java runtime/handshake/SuspendBlocked.java runtime/jni/terminatedThread/TestTerminatedThread.java runtime/lockStack/TestStackWalk.java serviceability/jvmti/vthread/GetThreadState/GetThreadStateTest.java#default serviceability/jvmti/vthread/GetThreadState/GetThreadStateTest.java#no-vmcontinuations serviceability/jvmti/vthread/GetThreadStateMountedTest/GetThreadStateMountedTest.java serviceability/jvmti/vthread/RawMonitorTest/RawMonitorTest.java serviceability/jvmti/vthread/SuspendWithInterruptLock/SuspendWithInterruptLock.java#default serviceability/jvmti/vthread/SuspendWithInterruptLock/SuspendWithInterruptLock.java#xint serviceability/jvmti/vthread/ThreadStateTest/ThreadStateTest.java serviceability/jvmti/vthread/WaitNotifySuspendedVThreadTest/WaitNotifySuspendedVThreadTest.java

MBaesken avatar Mar 14 '24 09:03 MBaesken

I noticed a few more asserts (assert(_wx_state == expected) failed: wrong state) in the jfr area (jdk tier3 jfr tests). E.g.

V  [libjvm.dylib+0x8a5d94]  JavaThread::check_for_valid_safepoint_state()+0x0
V  [libjvm.dylib+0x3e95b4]  ThreadStateTransition::transition_from_native(JavaThread*, JavaThreadState, bool)+0x174
V  [libjvm.dylib+0x3e93d0]  ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0x70
V  [libjvm.dylib+0x91a578]  JfrRecorderService::emit_leakprofiler_events(long long, bool, bool)+0xcc

and

V  [libjvm.dylib+0x8a5d94]  JavaThread::check_for_valid_safepoint_state()+0x0
V  [libjvm.dylib+0x3e95b4]  ThreadStateTransition::transition_from_native(JavaThread*, JavaThreadState, bool)+0x174
V  [libjvm.dylib+0x3e93d0]  ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0x70
V  [libjvm.dylib+0x8d7f74]  JfrJavaEventWriter::flush(_jobject*, int, int, JavaThread*)+0xf8
j  jdk.jfr.internal.JVM.flush(Ljdk/jfr/internal/event/EventWriter;II)V+0 jdk.jfr@23-internal
j  jdk.jfr.internal.event.EventWriter.flush(II)V+3 jdk.jfr@23-internal

MBaesken avatar Mar 14 '24 14:03 MBaesken

@MBaesken found 2 more locations in jvmti that need switching to WXWrite

JvmtiExport::get_jvmti_interface
GetCarrierThread

Both use ThreadInVMfromNative.

Should we address those 2 more findings in this PR ? Or open a separate JBS issue ?

I'm leaning towards fixing all locations in this PR even though this will prevent clean backports. Would that be ok?

reinrich avatar Mar 15 '24 08:03 reinrich

@MBaesken found 2 more locations in jvmti that need switching to WXWrite

JvmtiExport::get_jvmti_interface
GetCarrierThread

Both use ThreadInVMfromNative.

Should we address those 2 more findings in this PR ? Or open a separate JBS issue ?

I'm leaning towards fixing all locations in this PR even though this will prevent clean backports. Would that be ok?

I think this is ok.

MBaesken avatar Mar 15 '24 08:03 MBaesken

JfrRecorderService::emit_leakprofiler_events (src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp ) and JfrJavaEventWriter::flush (src/hotspot/share/jfr/writers/jfrJavaEventWriter.cpp) might need adjustment too (see the other findings I posted yesterday).

MBaesken avatar Mar 15 '24 08:03 MBaesken

As I wrote in JBS, shouldn't this be handled by ThreadInVMfromNative?

I agree. This is something I am investigating at the moment. Ideally, AssertWXAtThreadSync would also be true by default.

tobiasholenstein avatar Mar 15 '24 13:03 tobiasholenstein

As I wrote in JBS, shouldn't this be handled by ThreadInVMfromNative?

I agree. This is something I am investigating at the moment. Ideally, AssertWXAtThreadSync would also be true by default.

I've added a bunch more locations we've seen when testing with AssertWXAtThreadSync.

@tobiasholenstein would you think that this PR is actually not needed because you are going to push a better way of handling the WXMode in the near future? How should be handle the issues in released versions (jdk 21, 17, ...)? Will it be possible to backport your work?

reinrich avatar Mar 18 '24 10:03 reinrich

That said we have had a lot of people looking at the overall WX state management logic in the past week or so due to https://bugs.openjdk.org/browse/JDK-8327860. The workaround there requires us to be in EXEC mode

That's very odd. The example there doesn't even involve MAP_JIT memory, so what does it have to do with WX?

and there are a number of folk who are questioning why "we" chose WRITE mode as the default with a switch to EXEC, instead of EXEC as the default with a switch to WRITE. But whichever way that goes I think the VM state transitions are the places to enforce that choice.

Hmm. Changing WX at VM state transitions is a form of temporal coupling, a classic design smell that has caused problems for decades. It's causing problems for us now. Instead, could we tag code that needs one or the other, keep track of the current WX state in thread-local memory, and flip WX only when we know we need to? That'd (by definition) reduce the number of transitions to the minimum if we were through.

theRealAph avatar Mar 19 '24 09:03 theRealAph

That's very odd. The example there doesn't even involve MAP_JIT memory, so what does it have to do with WX?

@theRealAph that is the mystery we hope will be resolved once we know the nature of the underlying OS bug. Somehow switching to exec mode fixes/works-around the issue. I can imagine a missing conditional to check if the region is MAP_JIT.

Changing WX at VM state transitions is a form of temporal coupling, a classic design smell that has caused problems for decades.

The original introducers of WXEnable made the decision that the VM should be in WRITE mode unless it needs EXEC. That is the state we are presently trying to achieve with this change. If that original design choice is wrong then ...

Instead, could we tag code that needs one or the other, keep track of the current WX state in thread-local memory, and flip WX only when we know we need to?

And I've asked about this every time a missing WXEnable has had to be added. We seem to be generically able to describe what kind of code needs which mode, but we seem to struggle to pin it down. Though that is what https://bugs.openjdk.org/browse/JDK-8307817 is looking at doing.

That'd (by definition) reduce the number of transitions to the minimum if we were through.

Not necessarily. It may well remove some transitions from paths that don't need it, but if you move the state change too low down the call chain you could end up transitioning much more often in code that does need it e.g. if a transitioning method is called in a loop. We need to optimise the actual call paths as well as identify specific methods.

But all this discussion suggests to me that this PR is not really worth pursuing at this time - IIUC no actual failures are observed other than those pertaining to AssertWXAtThreadSync and that flag will be gone if we do decide to be more fine-grained about WX management.

dholmes-ora avatar Mar 19 '24 12:03 dholmes-ora

IIUC no actual failures are observed other than those pertaining to AssertWXAtThreadSync

We saw sporadic crashes in our jtreg (maybe also jck?) runs; only later we enabled AssertWXAtThreadSync for more checking.

MBaesken avatar Mar 19 '24 13:03 MBaesken

Instead, could we tag code that needs one or the other, keep track of the current WX state in thread-local memory, and flip WX only when we know we need to?

The first part we already do.

I wonder wheter we could - at least as workaround for if we missed a spot - do wx switching as a reaction to a SIBBUS related to WX violation in code cache. Switch state around, return from signal handler and retry operation.

(Edit: tested it, does not seem to work. I guess when the SIGBUS is triggered in the kernel thread WX state had already been processed somehow).

That's very odd. The example there doesn't even involve MAP_JIT memory, so what does it have to do with WX?

@theRealAph that is the mystery we hope will be resolved once we know the nature of the underlying OS bug. Somehow switching to exec mode fixes/works-around the issue. I can imagine a missing conditional to check if the region is MAP_JIT.

Changing WX at VM state transitions is a form of temporal coupling, a classic design smell that has caused problems for decades.

The original introducers of WXEnable made the decision that the VM should be in WRITE mode unless it needs EXEC. That is the state we are presently trying to achieve with this change. If that original design choice is wrong then ...

Instead, could we tag code that needs one or the other, keep track of the current WX state in thread-local memory, and flip WX only when we know we need to?

And I've asked about this every time a missing WXEnable has had to be added. We seem to be generically able to describe what kind of code needs which mode, but we seem to struggle to pin it down. Though that is what https://bugs.openjdk.org/browse/JDK-8307817 is looking at doing.

That'd (by definition) reduce the number of transitions to the minimum if we were through.

Not necessarily. It may well remove some transitions from paths that don't need it, but if you move the state change too low down the call chain you could end up transitioning much more often in code that does need it e.g. if a transitioning method is called in a loop.

Not if you do the switching lazily. The first iteration would switch to the needed state; subsequent iterations would not do anything since the state already matches. Unless you interleave writes and execs, but then you would need the state changes anyway.

tstuefe avatar Mar 19 '24 15:03 tstuefe

But all this discussion suggests to me that this PR is not really worth pursuing at this time - IIUC no actual failures are observed other than those pertaining to AssertWXAtThreadSync and that flag will be gone if we do decide to be more fine-grained about WX management.

I see it differently. This PR is just a simple attempt to get clean test runs with AssertWXAtThreadSync (after fixing an actual crash https://bugs.openjdk.org/browse/JDK-8327036). While the violating locations in this PR might be unlikely to produce actual crashes I think it is worthwhile to have clean testing with AssertWXAtThreadSync because this will help prevent regressions that are more likely.

Beyond the trivial fixes of this PR I'm very much in favor of further enhancements as the aforementioned https://bugs.openjdk.org/browse/JDK-8307817. My recommendation would be to remove as much non-constant data from the code cache as possible.

reinrich avatar Mar 19 '24 17:03 reinrich

I see it differently. This PR is just a simple attempt to get clean test runs with AssertWXAtThreadSync (after fixing an actual crash https://bugs.openjdk.org/browse/JDK-8327036). While the violating locations in this PR might be unlikely to produce actual crashes I think it is worthwhile to have clean testing with AssertWXAtThreadSync because this will help prevent regressions that are more likely.

I agree. Fixing the current state with this PR makes sense to me. Changing the logic of W^X will take more time and discussion. So from my point of view this PR is ready and should be integrated. If no-one disagrees I will approve

tobiasholenstein avatar Mar 19 '24 17:03 tobiasholenstein

Mailing list message from dean.long at oracle.com on hotspot-dev:

On 3/19/24 8:20 AM, Thomas Stuefe wrote:

I wonder wheter we could - at least as workaround for if we missed a spot - do wx switching as a reaction to a SIBBUS related to WX violation in code cache. Switch state around, return from signal handler and retry operation.

(Edit: tested it, does not seem to work. I guess when the SIGBUS is triggered in the kernel thread WX state had already been processed somehow).

That makes sense if the WX state is part of the signal context saved and restored by the signal handler.

dl -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20240319/1836f7ec/attachment.htm>

mlbridge[bot] avatar Mar 19 '24 20:03 mlbridge[bot]

Not necessarily. It may well remove some transitions from paths that don't need it, but if you move the state change too low down the call chain you could end up transitioning much more often in code that does need it e.g. if a transitioning method is called in a loop.

Not if you do the switching lazily. The first iteration would switch to the needed state; subsequent iterations would not do anything since the state already matches. Unless you interleave writes and execs, but then you would need the state changes anyway.

Exactly. You do the transition when it's needed.

theRealAph avatar Mar 19 '24 20:03 theRealAph

Tests with AssertWXAtThreadSync are clean now. Thanks! /integrate

reinrich avatar Mar 21 '24 14:03 reinrich

Going to push as commit e41bc42deb22615c9b93ee639d04e9ed2bd57f64. Since your change was applied there have been 25 commits pushed to the master branch:

  • 43080173e88c8f53cd54c9096c79f3144007fd97: 8328631: Convert java/awt/InputMethods/InputMethodsTest/InputMethodsTest.java applet test to manual
  • 700d2b91defd421a2818f53830c24f70d11ba4f6: 8328401: Convert java/awt/Frame/InitialMaximizedTest/InitialMaximizedTest.html applet test to automated
  • bb3e84bd1fd8217fcb319de8a0716e44359e3423: 8328592: hprof tests fail with -XX:-CompactStrings
  • ac2f8e5af8c88cd13038b113f82bb7c17a38aa40: 8327994: Update code gen in CallGeneratorHelper
  • c434b79cff33e08e4518e92ddddae996dffefe29: 8327169: serviceability/dcmd/vm/SystemMapTest.java and SystemDumpMapTest.java may fail after JDK-8326586
  • 700679011e5e9191f5170209454d35cc82953362: 8328628: JDK-8328157 incorrectly sets -MT on all compilers in jdk.jpackage
  • 68170ae22233462e8925c75c4737be7f0ba9353d: 8328238: Convert few closed manual applet tests to main
  • 9f5ad43358a4e209b4cd1c91bcc86b997f371548: 8320675: PrinterJob/SecurityDialogTest.java hangs
  • 684678f9e83ed0a76541a31356894d170fd421db: 8328633: s390x: Improve vectorization of Match.sqrt() on floats
  • 93d1700f23d42cb35b6028c5d7d029c035711acf: 8328589: unify os::breakpoint among posix platforms
  • ... and 15 more: https://git.openjdk.org/jdk/compare/eebcc2181fe27f6aa10559233c7c58882a146f56...master

Your commit was automatically rebased without conflicts.

openjdk[bot] avatar Mar 21 '24 14:03 openjdk[bot]

@reinrich Pushed as commit e41bc42deb22615c9b93ee639d04e9ed2bd57f64.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk[bot] avatar Mar 21 '24 14:03 openjdk[bot]