openj9
How can I avoid a hang on error during CRIU checkpoint?
The steps below recreate a hang on error during a CRIU checkpoint:
- Obtain an Ubuntu 22.04 machine
- Install CRIU on the machine
- Download a build with an openj9 implementation on the machine
- Create a file with `vi Demo.java` on the machine
- Copy the following code into the file on the machine
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.io.PrintStream;
import java.io.File;
import java.io.*;
import org.eclipse.openj9.criu.CRIUSupport;
public class Demo {
    public static void main(String args[]) throws Throwable {
        System.out.println("pre -checkpoint");
        checkPointJVM("cpData");
        System.out.println("post -checkpoint");
    }
    public static void checkPointJVM(String path) {
        if (CRIUSupport.isCRIUSupportEnabled()) {
            new CRIUSupport(Paths.get(path))
                .setLeaveRunning(false)
                .setShellJob(true)
                .setFileLocks(true)
                .checkpointJVM();
        } else {
            System.err.println("CRIU is not enabled\n" + CRIUSupport.getErrorMessage());
        }
    }
}
- Create a directory with `mkdir cpData` on the machine
- Compile the code with `javac Demo.java` on the machine
- Recreate the hang on error with `java -XX:+EnableCRIUSupport Demo` on the machine
The next step is to reproduce the hang on error during checkpoint and to find the root cause that needs to be addressed.
Dear @tajila and @babsingh, I would like to be assigned to this issue in order to start work on it. I look forward to your response.
I have not received a response, so @pshipton, could you please assign this issue to me so that I can address it?
@tajila ?
@tajila I hope you are doing well. My attempt to reproduce the hang was unsuccessful, as the terminal output below shows. Is the hang still expected, or should these steps complete without an error during the CRIU checkpoint?
singh264@linux:~$ ls
ant-lib cpData Demo.java mkdocker.sh openj9_build
singh264@linux:~$
singh264@linux:~$ cat Demo.java
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.io.PrintStream;
import java.io.File;
import java.io.*;
import org.eclipse.openj9.criu.CRIUSupport;
public class Demo {
    public static void main(String args[]) throws Throwable {
        System.out.println("pre -checkpoint");
        checkPointJVM("cpData");
        System.out.println("post -checkpoint");
    }
    public static void checkPointJVM(String path) {
        if (CRIUSupport.isCRIUSupportEnabled()) {
            CRIUSupport.getCRIUSupport()
                .setImageDir(Paths.get(path))
                .setLeaveRunning(false)
                .setShellJob(true)
                .setFileLocks(true)
                // remove this if running as a non-root user
                .setUnprivileged(true)
                .checkpointJVM();
        } else {
            System.err.println("CRIU is not enabled\n" + CRIUSupport.getErrorMessage());
        }
    }
}
singh264@linux:~$
singh264@linux:~$ javac -version
javac 21.0.8-internal
singh264@linux:~$
singh264@linux:~$ javac Demo.java
singh264@linux:~$
singh264@linux:~$ ls
ant-lib cpData Demo.class Demo.java mkdocker.sh openj9_build
singh264@linux:~$
singh264@linux:~$ java -version
openjdk version "21.0.8-internal" 2025-07-15
OpenJDK Runtime Environment (build 21.0.8-internal-adhoc.singh264.openj9-openjdk-jdk21)
Eclipse OpenJ9 VM (build master-5f6a02d948, JRE 21 Linux aarch64-64-Bit Compressed References 20250627_000000 (JIT enabled, AOT enabled)
OpenJ9 - 5f6a02d948
OMR - 41204d221
JCL - ad709377fba based on jdk-21.0.8+6)
singh264@linux:~$
singh264@linux:~$ java -XX:+EnableCRIUSupport Demo
pre -checkpoint
JVMJITM048W AOT load and compilation disabled pre-checkpoint and post-restore.
Exception in thread "main" org.eclipse.openj9.criu.SystemCheckpointException: Could not dump the JVM processes, err=-52
    at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:593)
    at Demo.checkPointJVM(Demo.java:27)
    at Demo.main(Demo.java:14)
Caused by: openj9.internal.criu.SystemCheckpointException: Could not dump the JVM processes, err=-52
    at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVMImpl(Native Method)
    at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVM(InternalCRIUSupport.java:1151)
    at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:587)
    ... 2 more
Likely, there is an issue with privileges. One thing you can try is to run with sudo and set .setUnprivileged(false). Otherwise, you can investigate the issue by setting the log level (setLogLevel) to 4 and looking at the logs.
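As a rough illustration of the "looking at the logs" step, the sketch below filters a CRIU log down to the lines marked as errors or warnings. It assumes the log is plain text and that CRIU tags such lines with `Error` or `Warn`. The class name `CriuLogScan` and the default path `cpData/criu.log` are hypothetical, so pass the actual log file as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

class CriuLogScan {
    // Keeps only the lines that contain an "Error" or "Warn" tag.
    static List<String> problemLines(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("Error") || line.contains("Warn"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location; the real log path depends on the CRIU configuration.
        Path log = Path.of(args.length > 0 ? args[0] : "cpData/criu.log");
        if (!Files.isReadable(log)) {
            System.err.println("Cannot read log file: " + log);
            return;
        }
        problemLines(Files.readAllLines(log)).forEach(System.out::println);
    }
}
```

Run after a failed checkpoint, this prints only the tagged lines, which is usually a quicker starting point than reading the full level-4 log.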
Logs would help diagnose the error seen during checkpoint and move towards a checkpoint without any errors. @tajila, would you mind confirming that this was your intention as well?
By default, the code runs in privileged mode, which expects the user to be root. Based on the terminal output below, no CRIU checkpoint error occurs even though I am a non-root user on my machine. Could you confirm that the expected behaviour is to detect this discrepancy and report an error message to the user?
singh264@linux:~$ ls
ant-lib cpData criuOutput Demo.java mkdocker.sh openj9_build
singh264@linux:~$
singh264@linux:~$ cat Demo.java
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.io.PrintStream;
import java.io.File;
import java.io.*;
import org.eclipse.openj9.criu.CRIUSupport;
public class Demo {
    public static void main(String args[]) throws Throwable {
        System.out.println("pre -checkpoint");
        checkPointJVM("cpData");
        System.out.println("post -checkpoint");
    }
    public static void checkPointJVM(String path) {
        if (CRIUSupport.isCRIUSupportEnabled()) {
            CRIUSupport.getCRIUSupport()
                .setImageDir(Paths.get(path))
                .setLeaveRunning(false)
                .setShellJob(true)
                .setFileLocks(true)
                .checkpointJVM();
        } else {
            System.err.println("CRIU is not enabled\n" + CRIUSupport.getErrorMessage());
        }
    }
}
singh264@linux:~$
singh264@linux:~$ echo $JAVA_HOME; $JAVA_HOME/bin/javac -version
/home/singh264/openj9_build/openj9-openjdk-jdk21/build/linux-aarch64-server-release/images/jdk
javac 21.0.8-internal
singh264@linux:~$
singh264@linux:~$ $JAVA_HOME/bin/javac Demo.java
singh264@linux:~$
singh264@linux:~$ ls
ant-lib cpData criuOutput Demo.class Demo.java mkdocker.sh openj9_build
singh264@linux:~$
singh264@linux:~$ sudo $JAVA_HOME/bin/java -XX:+EnableCRIUSupport Demo
pre -checkpoint
Killed
singh264@linux:~$
My apologies: the user was root because I ran the CRIU checkpoint code with sudo. This issue can therefore be closed, as it was created assuming the default configuration, in which the code expects the user to be root. I believe a good follow-up issue would be to avoid the error detailed below, which occurs during a CRIU checkpoint as a non-root user.
singh264@linux:~$ ls
ant-lib cpData Demo.java mkdocker.sh openj9_build
singh264@linux:~$
singh264@linux:~$ cat Demo.java
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.io.PrintStream;
import java.io.File;
import java.io.*;
import org.eclipse.openj9.criu.CRIUSupport;
public class Demo {
    public static void main(String args[]) throws Throwable {
        System.out.println("pre -checkpoint");
        checkPointJVM("cpData");
        System.out.println("post -checkpoint");
    }
    public static void checkPointJVM(String path) {
        if (CRIUSupport.isCRIUSupportEnabled()) {
            CRIUSupport.getCRIUSupport()
                .setImageDir(Paths.get(path))
                .setLeaveRunning(false)
                .setShellJob(true)
                .setFileLocks(true)
                // remove this if running as a root user
                .setUnprivileged(true)
                .checkpointJVM();
        } else {
            System.err.println("CRIU is not enabled\n" + CRIUSupport.getErrorMessage());
        }
    }
}
singh264@linux:~$
singh264@linux:~$ javac -version
javac 21.0.8-internal
singh264@linux:~$
singh264@linux:~$ javac Demo.java
singh264@linux:~$
singh264@linux:~$ ls
ant-lib cpData Demo.class Demo.java mkdocker.sh openj9_build
singh264@linux:~$
singh264@linux:~$ java -version
openjdk version "21.0.8-internal" 2025-07-15
OpenJDK Runtime Environment (build 21.0.8-internal-adhoc.singh264.openj9-openjdk-jdk21)
Eclipse OpenJ9 VM (build master-5f6a02d948, JRE 21 Linux aarch64-64-Bit Compressed References 20250627_000000 (JIT enabled, AOT enabled)
OpenJ9 - 5f6a02d948
OMR - 41204d221
JCL - ad709377fba based on jdk-21.0.8+6)
singh264@linux:~$
singh264@linux:~$ java -XX:+EnableCRIUSupport Demo
pre -checkpoint
JVMJITM048W AOT load and compilation disabled pre-checkpoint and post-restore.
Exception in thread "main" org.eclipse.openj9.criu.SystemCheckpointException: Could not dump the JVM processes, err=-52
    at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:593)
    at Demo.checkPointJVM(Demo.java:27)
    at Demo.main(Demo.java:14)
Caused by: openj9.internal.criu.SystemCheckpointException: Could not dump the JVM processes, err=-52
    at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVMImpl(Native Method)
    at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVM(InternalCRIUSupport.java:1151)
    at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:587)
    ... 2 more
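Because the two runs above differ only in who launched the JVM and whether `.setUnprivileged(true)` was set, one way to avoid a mismatch is to derive that flag from the effective UID of the process. The sketch below is a minimal illustration under stated assumptions: it is Linux-only and reads `/proc/self/status`, and the class name `PrivilegeCheck` is hypothetical. It does not call the CRIUSupport API; it only prints the value that could be passed to `setUnprivileged`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class PrivilegeCheck {
    // Extracts the effective UID from the lines of /proc/<pid>/status.
    // The "Uid:" line lists the real, effective, saved and filesystem UIDs in that order.
    static long effectiveUid(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("Uid:")) {
                String[] fields = line.trim().split("\\s+");
                return Long.parseLong(fields[2]);
            }
        }
        throw new IllegalStateException("No Uid line found");
    }

    public static void main(String[] args) throws IOException {
        Path status = Path.of("/proc/self/status");
        if (!Files.isReadable(status)) {
            System.err.println("Cannot read " + status + " (not a Linux system?)");
            return;
        }
        long uid = effectiveUid(Files.readAllLines(status));
        // Root (UID 0) matches the default privileged mode; any other user would need unprivileged mode.
        boolean unprivileged = uid != 0;
        System.out.println("effective uid=" + uid + ", suggested setUnprivileged(" + unprivileged + ")");
    }
}
```

The same check could also be used to report a clear error message when privileged mode is requested by a non-root user, instead of surfacing only `err=-52`.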
A non-root user's behaviour in unprivileged mode during a CRIU checkpoint should be the same as a root user's behaviour in privileged mode, which is no error. @tajila or @JasonFengJ9, could one of you please confirm that it is fine to create a new GitHub issue to fix the checkpoint error for a non-root user, which is detailed in my previous comment?
I would like to clarify that the observations above were made on a local VirtualBox arm64 Linux machine running on a Mac. Since this virtual machine was not set up properly to run CRIU tests, I am providing the output of my unsuccessful attempt to recreate the hang during a CRIU checkpoint on a GitHub Codespaces x86_64 Linux machine:
@singh264 ➜ /workspaces/codespaces-blank $ ls
Demo.java cpData criuOutput mkdocker.sh openj9-build
@singh264 ➜ /workspaces/codespaces-blank $
@singh264 ➜ /workspaces/codespaces-blank $ cat Demo.java
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.io.PrintStream;
import java.io.File;
import java.io.*;
import org.eclipse.openj9.criu.CRIUSupport;
public class Demo {
    public static void main(String args[]) throws Throwable {
        System.out.println("pre -checkpoint");
        checkPointJVM("cpData");
        System.out.println("post -checkpoint");
    }
    public static void checkPointJVM(String path) {
        if (CRIUSupport.isCRIUSupportEnabled()) {
            CRIUSupport.getCRIUSupport()
                .setImageDir(Paths.get(path))
                .setLeaveRunning(false)
                .setShellJob(true)
                .setFileLocks(true)
                .checkpointJVM();
        } else {
            System.err.println("CRIU is not enabled\n" + CRIUSupport.getErrorMessage());
        }
    }
}
@singh264 ➜ /workspaces/codespaces-blank $
@singh264 ➜ /workspaces/codespaces-blank $ ./openj9-build/openj9-openjdk-jdk21/build/linux-x86_64-server-release/images/jdk/bin/javac Demo.java
@singh264 ➜ /workspaces/codespaces-blank $
@singh264 ➜ /workspaces/codespaces-blank $ sudo ./openj9-build/openj9-openjdk-jdk21/build/linux-x86_64-server-release/images/jdk/bin/java -XX:+EnableCRIUSupport Demo
pre -checkpoint
Killed
@singh264 ➜ /workspaces/codespaces-blank $
@singh264 ➜ /workspaces/codespaces-blank $ sudo criu restore -D ./cpData -v2 --shell-job
JVMJITM048W AOT load and compilation disabled pre-checkpoint and post-restore.
post -checkpoint
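With `setLeaveRunning(false)`, CRIU is expected to kill the dumped process once the images are written, so the `Killed` line above appears consistent with a successful checkpoint rather than a hang. The successful restore supports that reading. A quick sanity check before restoring is to confirm that the image directory actually contains image files. The sketch below lists `*.img` files in a directory. The class name `ImageDirCheck` is hypothetical, and the assumption that a completed dump leaves `.img` files such as `inventory.img` is based on CRIU's usual image layout, not on this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ImageDirCheck {
    // Returns the sorted names of all files in dir that end with ".img".
    static List<String> imageFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".img"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the image directory used in this issue.
        Path dir = Path.of(args.length > 0 ? args[0] : "cpData");
        if (!Files.isDirectory(dir)) {
            System.err.println("Not a directory: " + dir);
            return;
        }
        List<String> images = imageFiles(dir);
        System.out.println(images.size() + " image file(s) in " + dir);
        images.forEach(System.out::println);
    }
}
```

An empty listing after a run that printed `Killed` would suggest the dump did not complete, in which case the CRIU log is the next place to look.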
@tajila, would you mind confirming that the above output, together with the scope defined in the description of this issue, is sufficient to close this issue?