
Investigate if Panama instead of JNI can help to speed up JPype

Open ge0ffrey opened this issue 3 years ago • 6 comments

According to this tweet from the ElasticSearch team, replacing JNI with Panama increases performance: https://twitter.com/delabassee/status/1476091183302680576 It might be interesting to research whether this can help speed up JPype too.

There's a getting started with Panama tutorial here: https://foojay.io/today/project-panama-for-newbies-part-1/

ge0ffrey avatar Dec 30 '21 13:12 ge0ffrey

I've got a project that can help with working with Panama. Similar to JNA, you create an interface and the library can either build the class you need dynamically at run time, or write the Java code to disk.

The GitHub page has some graphs showing speed-ups. If you are passing mainly primitives then the speed-up can be huge; if you're passing arrays or structs then the gains are more modest.

https://github.com/boulder-on/JPassport#readme
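
A rough sketch of the pattern (the interface and factory names here are simplified placeholders, not necessarily the exact API; the README has the real names):

// Illustrative only: declare an interface whose methods mirror the
// exported C functions, then ask the library to bind it.
public interface MathLib {
    // mirrors: double sum_doubles(const double* values, int count);
    double sum_doubles(double[] values, int count);
}

// At run time the implementation class is either generated dynamically
// or written out as Java source (factory name is a placeholder).
MathLib lib = NativeBinder.bind("mathlib", MathLib.class);
double total = lib.sum_doubles(new double[] {1.0, 2.0, 3.0}, 3);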

JExtract is another very helpful tool put out by the Panama team. Given a header file it will generate a Java file with most of what you need already done.

https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_jextract.md
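
As a rough illustration, for a hypothetical header mathlib.h declaring int add(int, double), using the generated binding looks roughly like this (the class and package names are assumptions; they come from the header and the options passed to jextract):

// Illustrative: jextract emits a class (assumed here to be
// org.example.mathlib_h) whose static methods already contain the
// marshalling boilerplate for each C function in the header.
int result = org.example.mathlib_h.add(2, 3.5);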

boulder-on avatar Jan 12 '22 21:01 boulder-on

I am not really sure that Panama would offer much benefit over JNI for JPype. The structure of JPype is to make calls from an external source (Python) to Java, not the other way around. Thus it is all very low-level calls. Unless Panama greatly speeds up the marshalling of these low-level calls, there is not much speed increase possible.

Taken from the Panama web page....

To this end, Project Panama will include most or all of these components:

  • native function calling from JVM
  • native data access from JVM or inside JVM heap
  • new data layouts in JVM heap
  • native metadata definition for JVM
  • header file API extraction tools (jextract)
  • native library management APIs
  • native-oriented interpreter and runtime “hooks”
  • class and method resolution “hooks”
  • native-oriented JIT optimizations
  • tooling or wrapper interposition for safety
  • exploratory work with difficult-to-integrate native libraries

Notice the direction of most of these optimizations. They are meant to make it easy and quick for Java to call native code.

Given the amount of work required to convert the JNI-oriented framework to another technology, the benefits would have to be very significant.

Thrameos avatar Jan 16 '22 18:01 Thrameos

So far I've only been using the foreign API to go from Java to a C DLL. I did a quick and dirty test on callbacks comparing the foreign API to JNI. My test was:

  • Call from Java to C passing a method pointer, int and double.
  • In C call the passed in method passing the int and double
  • In the Java callback add the two values and return the result.
  • In C return the result of the callback
  • Repeat 1 million times

This is admittedly a pretty artificial and trivial example, and I have no experience optimizing JNI. I found the callback using the foreign API to be about 3.75 times faster than the JNI call. If I used JNA to perform the same task, then the foreign API was about 28 times faster. One of the factors at play here is that by using the foreign API the JIT knows what's going on and can optimize. My JNI example only saw a small change in speed as the JVM warmed up; the foreign API had a much larger reduction in time.

These numbers mirror what I've seen calling from Java to C with primitive arguments only. This makes me think that the performance of arrays and more complex structures won't be as impressive, but will still beat JNI.

Another benefit is that the code to make the callback is pretty compact:

// Java method invoked from C
public static int primitive(int v, double d)
{
    return v + (int)d;
}

// Wrap the static method in an upcall stub that C can call through a
// function pointer (JDK 17 incubator API, jdk.incubator.foreign)
MethodHandle primitives = MethodHandles.lookup().findStatic(TestCallBack.class, "primitive",
        MethodType.methodType(int.class, int.class, double.class));
ResourceScope scope = ResourceScope.newConfinedScope();   // controls the stub's lifetime
MemoryAddress stub = CLinker.getInstance().upcallStub(primitives,
        FunctionDescriptor.of(C_INT, C_INT, C_DOUBLE), scope);

// testFL is the bound test library; the stub is passed as the function pointer
testFL.call_CB(stub, 1, 2.0);

Having said all of this, I'm sure that this would be a lot of work. As well, at this point the foreign API is still an incubator API and therefore subject to change. Implementing this change only for JVMs after 17 would at least require multi-release JARs and maybe other gymnastics.

boulder-on avatar Jan 19 '22 00:01 boulder-on

95% of our code is C++ calling Java, so we should compare only to that. The reverse (Java calling C++) is only in proxies and class construction.

If you wouldn't mind posting the JNI code that you tested for C++ calling Java and getting a result, I can see if there are missed optimizations.

I would expect it to look something like… (env, clazz, mid are all cached, so it should just be dealing with the jvalue marshalling, which is still remarkably slow)


int testCall(JNIEnv* env, jclass clazz, jmethodID mid, int v, double d)
{
    // env, clazz and mid are cached; only the jvalue marshalling remains
    jvalue args[2];
    args[0].i = v;
    args[1].d = d;
    return env->CallStaticIntMethodA(clazz, mid, args);
}

Of course, C++ to Java is only a small fraction of the total work, because much of what we do is dealing with the matching of Python types to Java, which means that JNI is likely not the limiting factor. But I am curious what the speed difference is when all values are cached.

The other issue is that because we are calling general-purpose methods we may end up going through the same level of marshalling. After all, we have to be able to call (int, object, object) and (object, double, object, int), so our entry point is going to end up with a jvalue[] or something similar rather than an optimized foreign interface like FunctionDescriptor.of(C_INT, C_INT, C_DOUBLE).
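
To make the contrast concrete, an illustrative sketch against the current incubator API:

// Specialized descriptor: the exact argument types are visible to the
// linker and the JIT, which is where the foreign API gets its speed.
FunctionDescriptor fixed = FunctionDescriptor.of(C_INT, C_INT, C_DOUBLE);

// Generalized entry point of the kind JPype needs (whether used for a
// downcall handle or an upcall stub): one signature taking a pointer
// to an argument block plus a count, i.e. a jvalue[] equivalent, so
// every argument still has to be unpacked by hand.
FunctionDescriptor generic = FunctionDescriptor.of(C_INT, C_POINTER, C_INT);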

Thrameos avatar Jan 19 '22 02:01 Thrameos

I updated the code to a) make it mainly about the callback, b) add a quick optimization for the method lookup.

JNI

public static int cb(int n, double d)
{
    return n + (int)d;
}

static native int nativeCallJavaMethod(TestJNI cb, int n, double d);

JNIEXPORT jint JNICALL Java_jpassport_test_callback_TestJNI_nativeCallJavaMethod
(JNIEnv *env, jclass cls, jobject impl_obj, jint i, jdouble d)
{
    jclass impl_cls;
    jmethodID impl_cb_mid;

    /* Look up the class and the static callback once per native call */
    impl_cls = (*env)->GetObjectClass(env, impl_obj);
    if (!impl_cls)
        return -2;

    impl_cb_mid = (*env)->GetStaticMethodID(env, impl_cls, "cb", "(ID)I");
    if (!impl_cb_mid)
        return -1;

    /* One million callbacks into Java per native call */
    int ret = 0;
    for (int n = 0; n < 1000000; n++)
        ret += (*env)->CallStaticIntMethod(env, impl_cls, impl_cb_mid, n, d);
    return ret;
}

I called the JNI method 1000 times, dropped the first 100 iterations, and got a mean of 119 ms per JNI call (which contained 1 million callbacks).
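
The timing harness was roughly the following (simplified sketch, illustrative names):

// Minimal timing sketch for the numbers above; TestJNI.nativeCallJavaMethod
// is the native method declared earlier, everything else is illustrative.
long[] samples = new long[1000];
TestJNI impl = new TestJNI();

for (int run = 0; run < 1000; run++) {
    long start = System.nanoTime();
    TestJNI.nativeCallJavaMethod(impl, run, 2.0);   // 1 million callbacks inside
    samples[run] = System.nanoTime() - start;
}

// Drop the first 100 runs as JIT warm-up, then average the rest
double meanMs = java.util.Arrays.stream(samples, 100, 1000)
        .average().orElse(0) / 1_000_000.0;
System.out.printf("mean per native call: %.1f ms%n", meanMs);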

Panama

I used the same static method; the C code was:

typedef int (*callbackFN) (int, double);
int call_CB(callbackFN fn, int v, double v2)
{
    int sum = 0;
    for (int n = 0; n < 1000000; n++)
        sum += fn(v, v2);
    return sum;
}
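
The Java side binds call_CB with the same incubator API, roughly like this (a sketch against the JDK 17 jdk.incubator.foreign API, which is still subject to change; the library name is a placeholder, and 'stub' is the upcall stub created earlier for the Java callback):

// Sketch: bind the native call_CB with a downcall handle
System.loadLibrary("passport_test");   // placeholder library name
MemoryAddress fn = SymbolLookup.loaderLookup().lookup("call_CB").orElseThrow();

MethodHandle callCB = CLinker.getInstance().downcallHandle(
        fn,
        MethodType.methodType(int.class, MemoryAddress.class, int.class, double.class),
        FunctionDescriptor.of(C_INT, C_POINTER, C_INT, C_DOUBLE));

// Pass the upcall stub as the function pointer argument
int sum = (int) callCB.invokeExact(stub, 1, 2.0);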

The mean here for the same conditions was 54 ms, so Panama was more like 2.2x faster. I'm not sure if these are closer now because I changed JNI to call a static Java method or because I now only call GetStaticMethodID once per million callbacks; I suspect the latter.

boulder-on avatar Jan 20 '22 01:01 boulder-on

That is about what I would expect. Knowing the argument types on the foreign interface is about a two times speed-up over the generalized marshalling that JNI does (at least for a trivial call). That is because JNI has to look at the method signature, see the I,D arguments, and then pull the correct fields out of the jvalue. But unfortunately, as we are calling general methods rather than specific ones (we must handle every case as a list of arguments), you would have to hit the general marshalling API in Panama. So I suspect that when we are hitting the same level of work the differences will be much smaller. After all, if there were a faster way to do general argument marshalling in JNI, it likely would have been implemented.
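
The same distinction shows up at the plain MethodHandle level; a sketch, where handle stands for any MethodHandle of type (int, double) -> int:

// Specialized path: argument types are statically known, primitives
// stay unboxed, and the JIT can inline through the handle.
int fast = (int) handle.invokeExact(7, 2.0);

// Generalized path: arguments arrive as a list of unknown shape, so
// every call boxes, type-checks and dispatches dynamically, which is
// much closer to the work JNI's jvalue marshalling already does.
Object slow = handle.invokeWithArguments(java.util.List.of(7, 2.0));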

Thrameos avatar Jan 20 '22 02:01 Thrameos