python-gatenlp

Explore alternatives to py4j or ways to keep using py4j

Open johann-petrak opened this issue 4 years ago • 3 comments

Apparently the library is not guaranteed to work with Java 11 *) and it is not clear whether it will be maintained in the future, see https://github.com/bartdag/py4j/issues/426

Ideally we would be able to just go on using py4j, but if this looks dubious, check out alternatives:

  • JPype: https://github.com/jpype-project/jpype
  • jep: https://github.com/ninia/jep
  • jpy: https://github.com/bcdev/jpy/
  • pyjnius: https://github.com/kivy/pyjnius
  • Javabridge: https://github.com/LeeKamentsky/python-javabridge
  • implement our own minimal bridge for just the things we want to do:
    • JSON-based exchange of data
    • just do RPC
    • use a similar socket protocol or just do it over HTTP REST (with some kind of authentication!). This would have the advantage that we could even run the GateSlave on a different machine
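
A minimal sketch of what the last option could look like, assuming JSON-encoded calls over HTTP with a shared bearer token. Everything named here (`TOKEN`, `METHODS`, `RpcHandler`, `call`, `demo`) is hypothetical and not part of gatenlp or py4j, and the Python server only stands in for the Java side so that the example is self-contained:

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

TOKEN = "change-me"  # hypothetical shared secret, for illustration only

# Hypothetical remote methods; in practice these would live on the Java side.
METHODS = {"echo": lambda text: text, "add": lambda a, b: a + b}


class RpcHandler(BaseHTTPRequestHandler):
    """Stand-in for the Java-side endpoint: one POST is one method call."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)  # always consume the request body
        if self.headers.get("Authorization") != f"Bearer {TOKEN}":
            self._reply(401, {"error": "unauthorized"})
            return
        request = json.loads(raw)
        method = METHODS.get(request.get("method"))
        if method is None:
            self._reply(404, {"error": "unknown method"})
            return
        self._reply(200, {"result": method(*request.get("params", []))})

    def _reply(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def call(host, port, method, *params, token=TOKEN):
    """Send one JSON-encoded call and return the decoded result."""
    body = json.dumps({"method": method, "params": list(params)}).encode("utf-8")
    headers = {"Content-Type": "application/json",
               "Authorization": f"Bearer {token}"}
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("POST", "/", body=body, headers=headers)
        response = conn.getresponse()
        payload = json.loads(response.read())
    finally:
        conn.close()
    if response.status != 200:
        raise RuntimeError(payload.get("error", "call failed"))
    return payload["result"]


def demo():
    """Start the stand-in server on a free local port and make two calls."""
    server = HTTPServer(("127.0.0.1", 0), RpcHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_port
        return call("127.0.0.1", port, "add", 2, 3), call("127.0.0.1", port, "echo", "hello")
    finally:
        server.shutdown()
        server.server_close()
```

Since every call is a plain HTTP request, the client only needs a host and port, which is what would allow the other side to run on a different machine. A real deployment would also need TLS and a properly managed secret rather than a constant in the source.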

*) however the things I tested so far seem to work with Java 11 and even Java 15

johann-petrak, Feb 07 '21 15:02

I would be happy to answer any questions about the JPype option. I should note that several of the options listed are in a similar position of either being unmaintained or inactive at the current time.

It is important to note the key differences between JPype and Py4j. JPype is a JNI shared-process bridge while Py4j is an IPC gateway. A JNI bridge must connect the JVM and Python as if they were one VM, and there is no way to restart the JVM as the two are completely linked at the process level. An IPC gateway is much more transactional, as multiple JVMs can be controlled or restarted as needed. But being less direct, it cannot freely exchange data without conversion.

It is rather disappointing to hear that Py4j is struggling. The API requirements of a gateway and JNI bridge are quite different so there really isn't any way to fill in the void by expanding JPype to cover a gateway bridge.

If you need transaction-based access to a JVM then I can't be of much assistance. When I require a gateway, I use zeromq and protocol buffers to expose a service, but that is not a general-purpose solution like Py4j.

Thrameos, Feb 07 '21 17:02

@Thrameos thank you for that clarification! We do not really know yet what the usage pattern for this in gatenlp will be and we will probably do some experimentation to see what the respective advantages and disadvantages of tight JNI coupling vs just an IPC bridge for gatenlp are. In any case, happy to see that JPype is around and being maintained!

johann-petrak, Feb 08 '21 08:02

Just for clarity, py4j is still going to continue as mature software. The current maintainer is just seeking additional help. So if you require transactional IPC-style access, it may be an option. Admittedly I do not understand his comments regarding Java 11 and can see why they seem scary.

As for usage patterns, all Java bridges, regardless of implementation, can create objects, call methods and access fields. The two benefits of a tightly coupled bridge like JPype show up if you need either large data transfers or extensive access to Java. Tightly coupled means memory can be shared directly, so if you want to take a huge array and shove it into numpy or matplotlib, that is no issue. Extensive access means there are many classes that need to be exposed to the user. If you find yourself wrapping 30 classes, then it may be easier to just expose the Java versions and add customizers to those classes to make them appear native. A typical example is Java code that uses a lot of factories to set up complex operations; rewriting all of this in Python can be costly. Instead, just exposing the factory and registering a few converters so that Python types can be passed to Java would achieve the same result.

If you are only going to call one or two methods and the data transfer is easily wrapped with JSON, then a gateway or server solution may be better. It avoids the JVM starting paradox: modules are not supposed to perform actions on import, but if something in the import needs to access the JVM, then you have to defer and have the user start it at a specific point, which can be difficult if you are using certain Python features like signatures. Tightly coupled also requires dealing with shutdown, where crashes or deadlocks are possible despite my efforts thus far.
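
To illustrate the deferral mentioned above, here is a minimal sketch of lazy start-up in plain Python. The names `LazyBackend`, `fake_start`, `starts` and `tokenize` are hypothetical, and `fake_start` only stands in for the expensive step; with JPype that step could be a call to `jpype.startJVM()` guarded by `jpype.isJVMStarted()`, which is an assumption about how one might wire it up rather than a recommended recipe:

```python
import threading

starts = []  # records each backend start so the behaviour can be observed


def fake_start():
    """Hypothetical stand-in for an expensive start-up such as launching a JVM."""
    starts.append("started")
    return {"ready": True}


class LazyBackend:
    """Runs the starter on first use instead of at import time."""

    def __init__(self, starter):
        self._starter = starter
        self._lock = threading.Lock()
        self._instance = None

    def get(self):
        # The lock ensures the starter runs only once, even with several threads.
        with self._lock:
            if self._instance is None:
                self._instance = self._starter()
            return self._instance


backend = LazyBackend(fake_start)  # importing the module starts nothing


def tokenize(text):
    """Hypothetical public function that needs the backend only when called."""
    if not backend.get()["ready"]:
        raise RuntimeError("backend failed to start")
    return text.split()
```

This keeps the import free of side effects, but it does not help with anything that must be resolved at import time, which is the hard part of the paradox described above.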

Regardless, you may end up using JPype as a tool to help debug your server code even if it is not in your final product, as its all-access view of Java can effectively turn Python into a Java debugger.

Thrameos, Feb 08 '21 15:02