openj9
openj9 copied to clipboard
CRIU supports Java debugger via the restore option file
CRIU supports Java debugger via the restore option file
Debugger support related code are guarded with isDebugOnRestoreEnabled();
For -Xint mode, do not disable JVMTI capabilities required for JDWP debugger before checkpoint, add a capability can_access_local_variables;
For JIT mode, add JVMTI capabilities required for JDWP debugger before checkpoint;
Hooked J9HOOK_VM_PREPARING_FOR_RESTORE event to create/load the agent library specified in restore option file;
Refactored the agent library creation from J9VMInitArgs;
Added some trace points.
Signed-off-by: Jason Feng [email protected]
Closes https://github.com/eclipse-openj9/openj9/issues/16959
FYI @tajila
The CRIU JDWP debugger via the restore option file requires both extension repo PR https://github.com/ibmruntimes/openj9-openjdk-jdk/pull/591 and its backports (in progress) and this PR. However, this PR alone won't break any existing builds such as Java 8 that won't be backported.
FYI @dsouzai
This PR has the following messages post-restore when no JDWP agent is specified in the restore option file.
JVMJITM044W Some or all compiled code in the code cache invalidated post restore.
JVMJITM043W AOT load and compilation disabled post restore.
which is expected to be resolved via a JIT change like the patch you provided.
Note: this JDWP debugger support via the option file is only available with -XX:+DebugOnRestore.
I opened https://github.com/eclipse-openj9/openj9/pull/20047, which is essentially the patch I had given you.
I was reading over https://github.com/eclipse-openj9/openj9/issues/18866, specifically the VM Coordination section, and wanted to check if Points 1 and 2 are addressed, namely:
- Decompile immediately on restore if a user enables debug post-restore
- If debug is not enabled post-restore, but normal HCR is possible, then the VM needs to ensure that stacks with FSD bodies get decompiled.
Regarding 1, although currently (i.e. with this PR and https://github.com/eclipse-openj9/openj9/pull/20047) we will invalidate all compiled code if debug is enabled post-restore, it is still possible for JIT'd code to execute, namely:
- JNI Thunks
- Failed proactive compilation
For a failed proactive compilation, we will fall back to using the FSD body which means that a thread will continue to execute that body. Therefore, the VM needs to decompile immediately on restore if debug is enabled. Once we have the ability to reuse the FSD code we generate pre-checkpoint this won't be necessary, but for now I believe we need this to make debug fully functional.
In the case of JNI Thunks, I'm working on a PR for that.
I think once we address all of this, debug on restore should be completely functional (if not optimal). FYI @TobiAjila @vijaysun-omr
For a failed proactive compilation, we will fall back to using the FSD body which means that a thread will continue to execute that body. Therefore, the VM needs to decompile immediately on restore if debug is enabled. Once we have the ability to reuse the FSD code we generate pre-checkpoint this won't be necessary, but for now I believe we need this to make debug fully functional.
Hm actually... I guess it's not actually a problem if it's running FSD code because that is already set up to deal with debug events. So I guess I just need to address the JNI Thunk issue; the rest is covered by the method body invalidation that already occurs.
I'll think about this some more, but now I don't believe we need to decompile immediately anymore heh.
In the case of JNI Thunks, I'm working on a PR for that.
Opened https://github.com/eclipse-openj9/openj9/pull/20108.
Rebased to resovle the merge conflict.
@JasonFengJ9 How exatly did you enable debug on restore? whe I tried -Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y in the options file it failed with:
JVMJITM043W AOT load and compilation disabled post restore.
post restore hook
JVMJ9VM007E Command-line option unrecognised: -Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y
Exception in thread "main" org.eclipse.openj9.criu.JVMRestoreException: The JVM could not enable all the restore options specified
at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:534)
at Debug.checkPointJVM(Debug.java:32)
at Debug.main(Debug.java:16)
Caused by: openj9.internal.criu.JVMRestoreException: The JVM could not enable all the restore options specified
at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVMImpl(Native Method)
at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVM(InternalCRIUSupport.java:997)
at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:530)
... 2 more
@tajila I was using -agentlib:jdwp=transport=dt_socket,server=y,suspend=y, will look into -Xrunjdwp option.
same issue
JVMJITM043W AOT load and compilation disabled post restore.
post restore hook
JVMJ9VM007E Command-line option unrecognised: -agentlib:jdwp=transport=dt_socket,server=y,suspend=y
Exception in thread "main" org.eclipse.openj9.criu.JVMRestoreException: The JVM could not enable all the restore options specified
at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:534)
at Debug.checkPointJVM(Debug.java:32)
at Debug.main(Debug.java:16)
Caused by: openj9.internal.criu.JVMRestoreException: The JVM could not enable all the restore options specified
at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVMImpl(Native Method)
at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVM(InternalCRIUSupport.java:997)
at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:530)
... 2 more
@JasonFengJ9 We will need to add a docs issue for this as well
-Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y
Added support of -Xrunjdwp.
@tajila this is ready for another look.
jenkins test sanity xlinux jdk17
jenkins test sanity alinux64 jdk21
jenkins test sanity win jdk8
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_x86-64_linux_Personal_testList_0/540/consoleFull
16:40:07 ---SUMMARY OF FAILED TESTS---
16:40:07 cma001
16:40:07 -----------------------------
16:40:07
16:40:07 -----------------------------------
16:40:07 cmdLineTester_jvmtitests_1_FAILED
Looking at this PR test failure.
Fixed the PR test failure due to an earlier break for undecorated JDWP agent processing while multiple agents were present.
In addition, disabled following failure conditions within criu_jitPostRestore.xml:
<!-- output type="failure" caseSensitive="yes" regex="no">Some or all compiled code in the code cache invalidated post restore.</output>
<output type="failure" caseSensitive="yes" regex="no">JIT compilation disabled post restore.</output -->
Verified manually that these failure conditions can re-enabled with
- https://github.com/eclipse-openj9/openj9/pull/20047
Also moved DisposeEnvironment(jvmti_env) into jvmtiHookVMRestoreStartAgent() which is the last part of the cleanup if there was no JDWP agent specified. It releases VM access and hence can't be invoked within criuDisableHooks() from J9HOOK_VM_PREPARING_FOR_RESTORE.
@tajila this is ready for another look.
jenkins test sanity win jdk8
jenkins test sanity alinux64 jdk21
jenkins test sanity xlinux64 jdk17
jenkins test sanity xlinux jdk17