openj9 icon indicating copy to clipboard operation
openj9 copied to clipboard

CRIU supports Java debugger via the restore option file

Open JasonFengJ9 opened this issue 1 year ago • 6 comments
trafficstars

CRIU supports Java debugger via the restore option file

Debugger support related code are guarded with isDebugOnRestoreEnabled(); For -Xint mode, do not disable JVMTI capabilities required for JDWP debugger before checkpoint, add a capability can_access_local_variables; For JIT mode, add JVMTI capabilities required for JDWP debugger before checkpoint; Hooked J9HOOK_VM_PREPARING_FOR_RESTORE event to create/load the agent library specified in restore option file; Refactored the agent library creation from J9VMInitArgs; Added some trace points.

Signed-off-by: Jason Feng [email protected]

Closes https://github.com/eclipse-openj9/openj9/issues/16959

JasonFengJ9 avatar Jun 24 '24 22:06 JasonFengJ9

FYI @tajila

JasonFengJ9 avatar Jun 25 '24 13:06 JasonFengJ9

The CRIU JDWP debugger via the restore option file requires both extension repo PR https://github.com/ibmruntimes/openj9-openjdk-jdk/pull/591 and its backports (in progress) and this PR. However, this PR alone won't break any existing builds such as Java 8 that won't be backported.

JasonFengJ9 avatar Aug 22 '24 18:08 JasonFengJ9

FYI @dsouzai This PR has the following messages post-restore when no JDWP agent is specified in the restore option file.

JVMJITM044W Some or all compiled code in the code cache invalidated post restore.
JVMJITM043W AOT load and compilation disabled post restore.

which is expected to be resolved via a JIT change like the patch you provided.

Note: this JDWP debugger support via the option file is only available with -XX:+DebugOnRestore.

JasonFengJ9 avatar Aug 22 '24 18:08 JasonFengJ9

I opened https://github.com/eclipse-openj9/openj9/pull/20047, which is essentially the patch I had given you.

dsouzai avatar Aug 22 '24 21:08 dsouzai

I was reading over https://github.com/eclipse-openj9/openj9/issues/18866, specifically the VM Coordination section, and wanted to check if Points 1 and 2 are addressed, namely:

  1. Decompile immediately on restore if a user enables debug post-restore
  2. If debug is not enabled post-restore, but normal HCR is possible, then the VM needs to ensure that stacks with FSD bodies get decompiled.

Regarding 1, although currently (i.e. with this PR and https://github.com/eclipse-openj9/openj9/pull/20047) we will invalidate all compiled code if debug is enabled post-restore, it is still possible for JIT'd code to execute, namely:

  • JNI Thunks
  • Failed proactive compilation

For a failed proactive compilation, we will fall back to using the FSD body which means that a thread will continue to execute that body. Therefore, the VM needs to decompile immediately on restore if debug is enabled. Once we have the ability to reuse the FSD code we generate pre-checkpoint this won't be necessary, but for now I believe we need this to make debug fully functional.

In the case of JNI Thunks, I'm working on a PR for that.

I think once we address all of this, debug on restore should be completely functional (if not optimal). FYI @TobiAjila @vijaysun-omr

dsouzai avatar Aug 23 '24 15:08 dsouzai

For a failed proactive compilation, we will fall back to using the FSD body which means that a thread will continue to execute that body. Therefore, the VM needs to decompile immediately on restore if debug is enabled. Once we have the ability to reuse the FSD code we generate pre-checkpoint this won't be necessary, but for now I believe we need this to make debug fully functional.

Hm actually... I guess it's not actually a problem if it's running FSD code because that is already set up to deal with debug events. So I guess I just need to address the JNI Thunk issue; the rest is covered by the method body invalidation that already occurs.

I'll think about this some more, but now I don't believe we need to decompile immediately anymore heh.

dsouzai avatar Aug 23 '24 15:08 dsouzai

In the case of JNI Thunks, I'm working on a PR for that.

Opened https://github.com/eclipse-openj9/openj9/pull/20108.

dsouzai avatar Sep 03 '24 20:09 dsouzai

Rebased to resovle the merge conflict.

JasonFengJ9 avatar Sep 04 '24 19:09 JasonFengJ9

@JasonFengJ9 How exatly did you enable debug on restore? whe I tried -Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y in the options file it failed with:

JVMJITM043W AOT load and compilation disabled post restore.
post restore hook
JVMJ9VM007E Command-line option unrecognised: -Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y
Exception in thread "main" org.eclipse.openj9.criu.JVMRestoreException: The JVM could not enable all the restore options specified
	at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:534)
	at Debug.checkPointJVM(Debug.java:32)
	at Debug.main(Debug.java:16)
Caused by: openj9.internal.criu.JVMRestoreException: The JVM could not enable all the restore options specified
	at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVMImpl(Native Method)
	at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVM(InternalCRIUSupport.java:997)
	at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:530)
	... 2 more

tajila avatar Sep 11 '24 15:09 tajila

@tajila I was using -agentlib:jdwp=transport=dt_socket,server=y,suspend=y, will look into -Xrunjdwp option.

JasonFengJ9 avatar Sep 11 '24 15:09 JasonFengJ9

same issue

JVMJITM043W AOT load and compilation disabled post restore.
post restore hook
JVMJ9VM007E Command-line option unrecognised: -agentlib:jdwp=transport=dt_socket,server=y,suspend=y
Exception in thread "main" org.eclipse.openj9.criu.JVMRestoreException: The JVM could not enable all the restore options specified
	at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:534)
	at Debug.checkPointJVM(Debug.java:32)
	at Debug.main(Debug.java:16)
Caused by: openj9.internal.criu.JVMRestoreException: The JVM could not enable all the restore options specified
	at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVMImpl(Native Method)
	at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVM(InternalCRIUSupport.java:997)
	at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:530)
	... 2 more

tajila avatar Sep 11 '24 15:09 tajila

@JasonFengJ9 We will need to add a docs issue for this as well

tajila avatar Sep 11 '24 21:09 tajila

-Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y

Added support of -Xrunjdwp.

@tajila this is ready for another look.

JasonFengJ9 avatar Sep 13 '24 18:09 JasonFengJ9

jenkins test sanity xlinux jdk17

tajila avatar Sep 13 '24 18:09 tajila

jenkins test sanity alinux64 jdk21

tajila avatar Sep 13 '24 18:09 tajila

jenkins test sanity win jdk8

tajila avatar Sep 13 '24 18:09 tajila

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_x86-64_linux_Personal_testList_0/540/consoleFull

16:40:07  ---SUMMARY OF FAILED TESTS---
16:40:07  cma001
16:40:07  -----------------------------
16:40:07  
16:40:07  -----------------------------------
16:40:07  cmdLineTester_jvmtitests_1_FAILED

Looking at this PR test failure.

JasonFengJ9 avatar Sep 16 '24 11:09 JasonFengJ9

Fixed the PR test failure due to an earlier break for undecorated JDWP agent processing while multiple agents were present.

In addition, disabled following failure conditions within criu_jitPostRestore.xml:

		<!-- output type="failure" caseSensitive="yes" regex="no">Some or all compiled code in the code cache invalidated post restore.</output>
		<output type="failure" caseSensitive="yes" regex="no">JIT compilation disabled post restore.</output -->

Verified manually that these failure conditions can re-enabled with

  • https://github.com/eclipse-openj9/openj9/pull/20047

Also moved DisposeEnvironment(jvmti_env) into jvmtiHookVMRestoreStartAgent() which is the last part of the cleanup if there was no JDWP agent specified. It releases VM access and hence can't be invoked within criuDisableHooks() from J9HOOK_VM_PREPARING_FOR_RESTORE.

@tajila this is ready for another look.

JasonFengJ9 avatar Sep 17 '24 22:09 JasonFengJ9

jenkins test sanity win jdk8

tajila avatar Sep 18 '24 11:09 tajila

jenkins test sanity alinux64 jdk21

tajila avatar Sep 18 '24 11:09 tajila

jenkins test sanity xlinux64 jdk17

tajila avatar Sep 18 '24 11:09 tajila

jenkins test sanity xlinux jdk17

tajila avatar Sep 18 '24 12:09 tajila