opensearch-sdk-java
[PROPOSAL - IN PROC Support] Run a process within the same JVM
What/Why
What are you proposing?
NOTE: This proposal is not tied to the extension framework; it examines whether a process can be run within the same JVM.
With the current extension framework, extensions run out of process or on a node remote from the OpenSearch cluster. This proposal discusses several approaches for running a process within the same JVM. These approaches can help us with in-process support in the future.
What problems are you trying to solve?
- ProcessBuilder.start() creates a native process and returns an instance of a subclass of Process that can be used to control the process and obtain information about it.
ProcessBuilder pb = new ProcessBuilder(<Process>);
Process process = pb.start();
- Runtime.getRuntime().exec() returns a new Process object for managing the subprocess.
Process process = Runtime.getRuntime().exec("ls");
- Multithreading
new Thread() {
    @Override
    public void run() {
        <Process>
    }
}.start();
Results:
Using the Java Virtual Machine Process Status tool (jps), we found that only a single JVM is spun up, for the main method of the Java file. None of the three approaches above creates a new JVM; each appears to run in the same process as the JVM of the main method of ProcessBuilderExtensions.java.
When a simple Java file that prints Hello World! is running:
77587 jdk.jcmd/sun.tools.jps.Jps -mlv -Dapplication.home=/Library/Java/JavaVirtualMachines/jdk-14.0.2.jdk/Contents/Home -Xms8m -Djdk.module.main=jdk.jcmd
77541 jdk.compiler/com.sun.tools.javac.launcher.Main ProcessBuilderExtensions.java --add-modules=ALL-DEFAULT
31564 -Xms128m -Xmx750m -XX:ReservedCodeCacheSize=512m -XX:+IgnoreUnrecognizedVMOptions -XX:+UseG1GC -XX:SoftRefLRUPolicyMSPerMB=50 -XX:CICompilerCount=2 -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -ea -Dsun.io.useCanonCaches=false -Djdk.http.auth.tunneling.disabledSchemes="" -Djdk.attach.allowAttachSelf=true -Djdk.module.illegalAccess.silent=true -Dkotlinx.coroutines.debug=off -XX:ErrorFile=/Users/kazabdu/java_error_in_idea_%p.log -XX:HeapDumpPath=/Users/kazabdu/java_error_in_idea.hprof -Xmx2500m -Djb.vmOptionsFile=/Users/kazabdu/Library/Application Support/JetBrains/IntelliJIdea2021.3/idea.vmoptions -Dsplash=true -Didea.home.path=/Applications/IntelliJ IDEA.app/Contents -Didea.jre.check=true -Didea.executable=idea -Djava.system.class.loader=com.intellij.util.lang.PathClassLoader -Didea.paths.selector=IntelliJIdea2021.3 -Didea.vendor.name=JetBrains
When any of the three approaches above is running a new process within the same JVM:
77861 jdk.compiler/com.sun.tools.javac.launcher.Main ProcessBuilderExtensions.java --add-modules=ALL-DEFAULT
31564 -Xms128m -Xmx750m -XX:ReservedCodeCacheSize=512m -XX:+IgnoreUnrecognizedVMOptions -XX:+UseG1GC -XX:SoftRefLRUPolicyMSPerMB=50 -XX:CICompilerCount=2 -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -ea -Dsun.io.useCanonCaches=false -Djdk.http.auth.tunneling.disabledSchemes="" -Djdk.attach.allowAttachSelf=true -Djdk.module.illegalAccess.silent=true -Dkotlinx.coroutines.debug=off -XX:ErrorFile=/Users/kazabdu/java_error_in_idea_%p.log -XX:HeapDumpPath=/Users/kazabdu/java_error_in_idea.hprof -Xmx2500m -Djb.vmOptionsFile=/Users/kazabdu/Library/Application Support/JetBrains/IntelliJIdea2021.3/idea.vmoptions -Dsplash=true -Didea.home.path=/Applications/IntelliJ IDEA.app/Contents -Didea.jre.check=true -Didea.executable=idea -Djava.system.class.loader=com.intellij.util.lang.PathClassLoader -Didea.paths.selector=IntelliJIdea2021.3 -Didea.vendor.name=JetBrains
78044 jdk.jcmd/sun.tools.jps.Jps -mlv -Dapplication.home=/Library/Java/JavaVirtualMachines/jdk-14.0.2.jdk/Contents/Home -Xms8m -Djdk.module.main=jdk.jcmd
The results above show that only a single JVM process is running with each of the three approaches.
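Because jps lists only Java processes, a complementary check is to compare OS process IDs directly. The following is a minimal sketch, not part of the original experiment, assuming JDK 11 or later; the class name PidCheck and the use of `java -version` as the child command are illustrative choices. As noted above, ProcessBuilder creates a native process, so the child is expected to report its own PID, while a thread reports the PID of the JVM that started it.

```java
import java.io.IOException;
import java.nio.file.Path;

final class PidCheck {
    /** Starts a child with ProcessBuilder, waits for it, and returns its OS process ID. */
    static long startChildAndGetPid() throws IOException, InterruptedException {
        // Assumption: reuse the java binary of the running JVM as a portable child command.
        String javaBin = Path.of(System.getProperty("java.home"), "bin", "java").toString();
        Process child = new ProcessBuilder(javaBin, "-version")
                .redirectErrorStream(true)                        // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // output is not needed here
                .start();
        long pid = child.pid();
        child.waitFor();
        return pid;
    }

    /** Returns the process ID observed from inside a newly started thread. */
    static long pidSeenByThread() throws InterruptedException {
        long[] seen = new long[1];
        Thread t = new Thread(() -> seen[0] = ProcessHandle.current().pid());
        t.start();
        t.join(); // join makes the write to seen[0] visible to the caller
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        long parentPid = ProcessHandle.current().pid();
        System.out.println("parent JVM pid        = " + parentPid);
        System.out.println("pid seen by a thread  = " + pidSeenByThread());
        System.out.println("ProcessBuilder child  = " + startChildAndGetPid());
    }
}
```

Running jps alongside a long-lived Java child, instead of a short native command, would show whether a second JVM entry appears.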
Any remaining open questions?
The remaining question is whether we can use the approaches above to run an extension within the same JVM as the OpenSearch cluster.
+1 on using ProcessBuilder. Can you further investigate what happens to the streams? stdin: Process.getOutputStream(), stderr: Process.getErrorStream(), stdout: Process.getInputStream().
You may need to do something to send them to logs, or send them to the originating process (OpenSearch?) to handle the output.
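One way to send the streams to logs without the parent reading them is to let ProcessBuilder redirect them. This is a minimal sketch assuming JDK 11 or later; the class name StreamRedirectSketch, the temp-file log location, and the `java -version` child command are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamRedirectSketch {
    /** Runs a command with stdout and stderr appended to one log file; returns the exit code. */
    static int runLoggingTo(Path logFile, String... command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.appendTo(logFile.toFile()))
                .start();
        // Alternative: new ProcessBuilder(command).inheritIO() hands the child the
        // parent's own stdin/stdout/stderr, so the originating process's console sees the output.
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        String javaBin = Path.of(System.getProperty("java.home"), "bin", "java").toString();
        Path log = Files.createTempFile("extension-", ".log"); // hypothetical log location
        int exit = runLoggingTo(log, javaBin, "-version");
        System.out.println("exit code " + exit + ", log written to " + log);
        Files.readAllLines(log).forEach(System.out::println);
    }
}
```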
One other concern is how the separate processes will share/manage system resources.
+1 on using ProcessBuilder. Can you further investigate what happens to the streams?
This is how I have read from the stream for ProcessBuilder. The code is present here, with more Javadoc.
// Read the child's stdout line by line; try-with-resources closes the stream afterwards.
try (BufferedReader stdInput = new BufferedReader(
        new InputStreamReader(process.getInputStream()))) {
    String s;
    while ((s = stdInput.readLine()) != null) {
        System.out.println(s);
    }
}
One other concern is how the separate processes will share/manage system resources.
This is a bigger question which I think will be answered once we start looking into the in proc support for extensions.
This is how I have read from the stream for ProcessBuilder
That works for this trivial example. But there are some design questions we should ask (and as this is a proposal I'm asking them!)
- Do we let the processes themselves take control over their streams?
- Do we let the owning process (OpenSearch) read the stream?
- How do these streams interact with logging?
- Do we take advantage of this method of inter-process communication for any control of processes or even, at a bare minimum, a "heartbeat"?
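To make the second and third questions concrete, here is a minimal sketch of the owning process draining both streams on background threads and handing each line to a logger. It assumes JDK 11 or later and uses java.util.logging only for illustration; the class name, the logger name, and the `java -version` child command are hypothetical, and OpenSearch's own logging setup may call for something different.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.logging.Logger;

final class StreamToLoggerSketch {
    private static final Logger LOG = Logger.getLogger("extension-output"); // hypothetical name

    /** Starts a daemon thread that feeds each line of the stream to the given sink. */
    static Thread drain(InputStream in, Consumer<String> sink) {
        Thread t = new Thread(() -> {
            try (BufferedReader reader =
                         new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sink.accept(line);
                }
            } catch (IOException e) {
                sink.accept("stream ended with error: " + e.getMessage());
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Runs the command, forwarding stdout and stderr to separate sinks; returns the exit code. */
    static int run(Consumer<String> out, Consumer<String> err, String... command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).start();
        Thread outThread = drain(p.getInputStream(), out);
        Thread errThread = drain(p.getErrorStream(), err);
        int exit = p.waitFor();
        outThread.join(); // wait until all buffered output has been forwarded
        errThread.join();
        return exit;
    }

    public static void main(String[] args) throws Exception {
        String javaBin = Path.of(System.getProperty("java.home"), "bin", "java").toString();
        int exit = run(LOG::info, LOG::warning, javaBin, "-version");
        LOG.info("child exited with code " + exit);
    }
}
```

Draining each stream on its own thread keeps the child from blocking on a full pipe while the parent waits for it to exit.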
Note that when a process terminates, some Operating Systems (Windows in particular, also Solaris) don't close the streams on process termination, which can result in resource leakage if they aren't explicitly closed. I had to work around this here. I used Runtime.exec() rather than ProcessBuilder.start() but I think it's probably the same underlying code, so we'll need to make sure we handle the streams properly.
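A minimal sketch of closing all three streams explicitly, regardless of how the child exits, could look like the following. It assumes JDK 11 or later; the class name and the `java -version` child command are hypothetical, and this is not the workaround linked above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Path;

final class ProcessCleanupSketch {
    /** Runs the command, drains its output, and closes every stream before returning. */
    static int runAndClose(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true) // stderr is merged, so only stdout needs draining
                .start();
        // try-with-resources closes all three streams even if an exception is thrown.
        try (OutputStream stdin = p.getOutputStream();
             InputStream stdout = p.getInputStream();
             InputStream stderr = p.getErrorStream()) {
            stdin.close(); // the child sees end-of-file on its stdin
            stdout.transferTo(OutputStream.nullOutputStream()); // drain so the child cannot block
            return p.waitFor();
        } finally {
            p.destroy(); // has no effect if the child has already exited
        }
    }

    public static void main(String[] args) throws Exception {
        String javaBin = Path.of(System.getProperty("java.home"), "bin", "java").toString();
        System.out.println("exit code: " + runAndClose(javaBin, "-version"));
    }
}
```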
Another thing to consider is how the processes will share/coordinate resources (CPU/threads, memory, etc.).
If we were to launch an Extension as a Thread, we could pass it a Runnable with whatever coordinating bits we wanted to (Thread pool, etc.).
However, launching as a separate process loses that coordination ability. Since the processes are sharing the same JVM they're competing for the same heap space. Who wins the fight? How do we restrict the extension process(es) from exhausting resources? Are there easy ways to run extensions at a lower priority (like nice in *nix)?
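For the thread-based option, a minimal sketch of the kind of coordination described here is a bounded pool whose threads run at the lowest priority. The class name, pool size, and thread name are hypothetical, and thread priority is only a hint to the scheduler, so this does not cap heap or CPU use. For a separate OS process on *nix, prefixing the command passed to ProcessBuilder with nice would be the rough equivalent; that is not shown here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

final class ExtensionThreadSketch {
    /** Builds a fixed-size pool whose daemon threads run at minimum priority. */
    static ExecutorService lowPriorityPool(int threads) {
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, "extension-worker"); // hypothetical thread name
            t.setPriority(Thread.MIN_PRIORITY); // a scheduling hint, not a hard limit
            t.setDaemon(true);                  // do not keep the JVM alive for extensions
            return t;
        };
        // The fixed size bounds how many extension tasks can run at once.
        return Executors.newFixedThreadPool(threads, factory);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = lowPriorityPool(2);
        // Stand-in for an extension's work: report the priority it actually runs at.
        Callable<Integer> extensionTask = () -> Thread.currentThread().getPriority();
        Future<Integer> priority = pool.submit(extensionTask);
        System.out.println("extension task ran at priority " + priority.get());
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```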
So this just calls into a main, but has all the problems of JAR hells and such, or are there any other isolation benefits?
all the problems of JAR hells and such
I'm thinking there are workarounds for that using different classloaders.
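As a rough illustration of the classloader idea, the sketch below loads the same class through two separate URLClassLoader instances and shows that each gets its own Class object and its own static state. It assumes a JDK 11 or later (the demo compiles a tiny class at runtime, which needs the compiler); DemoExtension and the other names are hypothetical, and a real extension would more likely be loaded from its own JAR.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class ClassLoaderIsolationSketch {
    /** Compiles a tiny stand-in "extension" class into a temp directory and returns the directory. */
    static Path compileDemoExtension() throws Exception {
        Path dir = Files.createTempDirectory("ext-demo");
        Path src = dir.resolve("DemoExtension.java");
        Files.writeString(src, "public class DemoExtension { public static int counter = 0; }");
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null when no compiler is available
        if (javac == null) {
            throw new IllegalStateException("a JDK is required for this demo");
        }
        if (javac.run(null, null, null, "-d", dir.toString(), src.toString()) != 0) {
            throw new IllegalStateException("compilation failed");
        }
        return dir;
    }

    /** Loads the class through a fresh loader that cannot see the application class path. */
    static Class<?> loadIsolated(Path classesDir, String className) throws Exception {
        URL[] urls = { classesDir.toUri().toURL() };
        // Parent is the platform loader, so only JDK classes are shared with the host.
        // The loader is left open on purpose: the returned class keeps using it.
        URLClassLoader loader = new URLClassLoader(urls, ClassLoader.getPlatformClassLoader());
        return loader.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        Path dir = compileDemoExtension();
        Class<?> first = loadIsolated(dir, "DemoExtension");
        Class<?> second = loadIsolated(dir, "DemoExtension");
        first.getField("counter").setInt(null, 42); // change static state in one copy only
        System.out.println("same Class object? " + (first == second));
        System.out.println("counter in second copy: " + second.getField("counter").getInt(null));
    }
}
```

Separate loaders keep conflicting versions of the same class apart, but they do not isolate heap or CPU usage.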