jep
Reentering Java from different Python thread
Hey,
Jep seems to allow entering back into java from a separate python thread, but if an exception occurs it is not properly handled and the java stack trace is lost; we are seeing an invalid thread error message instead.
Reproducer:
@Test
public void testReenterException() throws JepException {
    try (SharedInterpreter interp = new SharedInterpreter()) {
        Object java = new Object() {
            public void fail() {
                throw new RuntimeException();
            }
        };
        interp.set("java", java);
        interp.exec(multiline(
                "from concurrent.futures import ThreadPoolExecutor",
                "def failOnOtherThread():",
                "    java.fail()",
                "with ThreadPoolExecutor(max_workers=1) as executor:",
                "    executor.submit(failOnOtherThread)"));
    }
}
Output:
"Error while processing a Java exception, invalid JepThread."
I hoped that reentering java from a separate python thread would be legal. In that case I would expect an exception wrapper on the python side.
My guess is that it is coming from here: code
and I could trace it back to this commit: original commit
If it is not legal to reenter java from a different thread, I would expect some guard against this in an earlier place.
I tried to work around the limitation that Jep can only be accessed from one thread (coming from the java side). In our scenario we might have some "main" python script which is hanging in an API call on the java side. At this time we still need to be able to process callbacks of a simulation process in the same interpreter, so that the blocking API can eventually unblock. My hope was that entering back into Java on another thread and waiting there would release the GIL and free the interpreter thread to process further requests. During my experiments I ran into the described problem, and now I'm questioning whether my approach is legal with regards to Jep's multi-threading model. Do you have any recommendation on how to keep the interpreter thread free while waiting in a blocking java call coming from python? We thought of an alternative model, where the blocking API continues to process further requests on the same thread, but this also doesn't feel like a nice solution, as all blocking APIs would need to implement this explicitly.
Thanks Janik
It is not currently supported to call back into java from another python thread. I don't think we have given it much thought. It is possible some pieces may work but it would require significant effort to test and get all of jep working on a python created thread.
One solution may be to perform the blocking call on a different java thread. The python thread would call into java; the java call would start another thread to do the blocking work. The python thread would then be free to process other tasks, but would need to occasionally poll a Future or Queue to see if the blocking call is complete.
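The polling pattern described above can be sketched in plain Python; here the executor thread just stands in for the hypothetical blocking java thread, and all names are illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_call():
    # Stand-in for the blocking operation that java would run on its own thread.
    time.sleep(0.2)
    return 'done'

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(blocking_call)   # hand the blocking work off

other_tasks_processed = 0
while not future.done():                  # poll instead of blocking
    other_tasks_processed += 1            # stand-in for handling other callbacks
    time.sleep(0.01)

result = future.result()
executor.shutdown()
print(result)                             # 'done'
```

The key point is that the thread holding the interpreter never blocks; it only checks `future.done()` between other tasks.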
More within the jep architecture, you can actually share information between multiple SharedInterpreters running on different Threads. Currently the only way to access this functionality is to import the same module in both interpreters. So if you package your python logic into a module, you could access it from many SharedInterpreters at once and it would not be a problem if one interpreter was blocking.
Hey,
thanks for the very quick response. :) Currently I don't see yet how your first suggestion would work, because we need to keep the stack on the python side for the blocking call, right? (Maybe it is possible to freeze and restore it?) Otherwise I can only imagine a solution with one stack, where the blocking call accepts a new exec, so the execution is on top of the current stack.
The SharedInterpreter idea sounds more promising; I had dropped it previously because the globals need to be accessible (I need to double check on this). Thinking about this again, I wonder why it is no problem to share the module's state but not the global dictionary. Is there a strong reason for this, or could you imagine some customizations which would also allow sharing the globals?
Otherwise, would it be possible to synchronize the globals in some way from the java side? Prior to the callback execution, clone the global state from the main interpreter into the shared interpreter, and when finished, sync it back into the main interpreter if it has been modified? (Sounds like a lot of overhead ;) although, since it is a shared interpreter, we could have a shallow copy of the dictionary.)
because we need to keep the stack on the python side for the blocking call, right?
I'm not sure I understand your use case. I assumed the blocking call in java was a pure java thing with no python interaction, in which case from the standpoint of jep it doesn't matter what thread it is on. If your blocking call requires interaction with python objects then it would need to remain on the same thread.
I wonder why it is no problem to share the module's state but not the global dictionary. Is there a strong reason for this, or could you imagine some customizations which would also allow sharing the globals?
There is no strong technical reason why globals or other things cannot be shared. I have considered many options for how we could arrange things to have different levels of sharing, but it is very time consuming to implement, document, and maintain new things and I simply have not had the time.
Otherwise, would it be possible to synchronize the globals in some way from the java side? Prior to the callback execution, clone the global state from the main interpreter into the shared interpreter, and when finished, sync it back into the main interpreter if it has been modified? (Sounds like a lot of overhead ;) although, since it is a shared interpreter, we could have a shallow copy of the dictionary.)
If you define a module that can be shared, you should be able to stash the globals in the module and load them from other interpreters before doing anything. I think you could even programmatically define a module for this purpose without actually needing python files in your path. I believe the python globals() function gives you access to the globals in jep, although I am not positive about that. The advantage of putting your code into a module is that you could use the module state as globals and wouldn't need to be copying it. You could also just store individual variables in the module and access them there instead of copying everything. I'm not sure what level of synchronization you would need for this. From the python perspective, two jep SharedInterpreters on different threads are equivalent to two python threads, so you would need similar synchronization structures if you might be simultaneously modifying things.
I'm not sure I understand your use case. I assumed the blocking call in java was a pure java thing with no python interaction, in which case from the standpoint of jep it doesn't matter what thread it is on. If your blocking call requires interaction with python objects then it would need to remain on the same thread.
Yes, the problem here is that our Python API (called from a user python script) might be blocking (e.g. wait until certain event), while other Python code must be executed.
Stashing the globals in a separate module sounds like an interesting idea. I quickly tried to figure out how to replace it with a custom dictionary but couldn't find a way to do this. I can easily exchange the globals() function and return a custom dict, but this does not have the effect of global variables being stored in this dictionary. I will investigate this a little further tomorrow. Today we don't have any particular synchronization mechanism in place, so I would say racing access in existing scripts would already be problematic today, although this is more of a corner case for our use case. Currently we are using Jython, which in itself is thread-safe, so my assumption is that multi-python-threaded access to the same resource with the GIL would behave kind of similarly.
Stashing the globals in a separate module sounds like an interesting idea. I quickly tried to figure out how to replace it with a custom dictionary but couldn't find a way to do this. I can easily exchange the globals() function and return a custom dict, but this does not have the effect of global variables being stored in this dictionary. I will investigate this a little further tomorrow.
I was thinking on the main thread you would do
mymod.savedglobals = globals()
And then on the other threads you could do
globals().update(mymod.savedglobals)
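This stash-and-pull pattern can be sketched in plain Python; two separate namespace dicts stand in for the two interpreters, and `mymod` is an illustrative name:

```python
import sys
from types import ModuleType

# Programmatically create a module that both "interpreters" can import.
mymod = ModuleType('mymod')
sys.modules['mymod'] = mymod

# "Main interpreter": stash its globals in the shared module.
main_ns = {'a': 1, 'mymod': mymod}
exec("mymod.savedglobals = globals()", main_ns)

# "Other interpreter": pull the saved globals into its own namespace.
other_ns = {'mymod': mymod}
exec("globals().update(mymod.savedglobals)", other_ns)

print(other_ns['a'])  # 1: the stashed variable is now visible here
```

Note this is a shallow copy of bindings at one point in time; later assignments in one namespace are not reflected in the other.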
Today we don't have any particular synchronization mechanism in place, so I would say racing access in existing scripts would already be problematic today, although this is more of a corner case for our use case. Currently we are using Jython, which in itself is thread-safe, so my assumption is that multi-python-threaded access to the same resource with the GIL would behave kind of similarly.
Yes the GIL should generally prevent really terrible things.
Hey bsteffensmeier,
this helped a lot, thanks :) I will give this a try in the bigger context, but the programmatically created shared module seems to work nicely already. A short snippet demonstrating it, in case anybody else needs something similar:
@Test
public void testSharedModule() throws JepException, InterruptedException {
    try (SharedInterpreter interp = new SharedInterpreter()) {
        interp.exec(multiline(
                "import sys",
                "from types import ModuleType",
                "shared_module = ModuleType('shared_module')",
                "shared_module.globals = {}",
                "sys.modules['shared_module'] = shared_module",
                "import shared_module",
                "shared_module.globals['foo'] = 'bar'"));
        AtomicBoolean failed = new AtomicBoolean(true);
        Thread otherThread = new Thread() {
            @Override
            public void run() {
                try (SharedInterpreter interp1 = new SharedInterpreter()) {
                    interp1.eval("import shared_module");
                    String actual = (String) interp1.getValue("shared_module.globals['foo']");
                    assertEquals("bar", actual);
                    failed.set(false);
                } catch (JepException e) {
                    e.printStackTrace();
                }
            }
        };
        otherThread.start();
        otherThread.join();
        assertFalse(failed.get());
    }
}
With regards to the original request: although I was expecting the reentrance into Java to fail on a separate thread if it violates the Jep threading model, I would still vote to keep this flow open if it is only an issue for special constellations. What do you think of still creating a proper python exception with the original java exception info (chained exception in python?)? The message can still be the same (illegal thread access), but it gives some hint about what went wrong on the other thread. Do you think this makes sense? Otherwise feel free to close the issue :)
Yes, I agree it makes sense to leave this open to investigate the exception handling. Usually when we printf it is in an area we suspect is unreachable. Since you have a clear example that hits it, we should do better.
I have pushed a fix at 9d957f85.
The problem is more complicated than I thought. Initially I planned to just check the thread before calling back into java and throw an exception at any attempt to use a java object in python. Unfortunately that change caused unit test failures for closing with running python threads and for importing python on python threads.
The closing issues are related to how the test is written and could likely be solved with a different test but the issue with importing on python threads is not as simple. The problem is that the java_import_hook is automatically available in all threads and the very first thing it does is call into java to determine if an import is a java package. I would like to support python threads but import hooks are not thread specific so if that hook is broken on other threads it becomes impossible to import any python on other threads and I am certain that would break some jep users.
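The thread-global nature of import hooks is easy to demonstrate in plain CPython: a meta-path finder installed on one thread is consulted for imports performed on any other thread, which mirrors why a broken java_import_hook would break imports on all python threads (names below are illustrative):

```python
import sys
import threading

seen_threads = set()

class RecordingFinder:
    # A meta-path finder that records which thread each import lookup runs on.
    def find_spec(self, fullname, path, target=None):
        seen_threads.add(threading.current_thread().name)
        return None  # decline; let the normal import machinery continue

sys.meta_path.insert(0, RecordingFinder())

def worker():
    try:
        # An uncached module name guarantees the meta-path is consulted.
        import not_a_real_module_for_demo
    except ImportError:
        pass

t = threading.Thread(target=worker, name='py-worker')
t.start()
t.join()
sys.meta_path.pop(0)  # clean up the hook

print('py-worker' in seen_threads)  # True: the hook installed once is global
```

sys.meta_path is interpreter-wide state, not per-thread, so there is no way to scope such a hook to the thread that installed it.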
Since our own code is able to call into java from a python created thread it makes this issue even more important because those calls could cause exceptions and those should be handled properly. I looked closer at what limitations we actually have on java calls in python threads and there are two main limitations:
- We cannot create a python wrapper for java objects in python (PyJObject) because we do not have access to the jepThread data structure that includes the caches and configuration for wrapping java objects.
- We cannot create a java wrapper for a python object (jep.python.PyObject) because we do not have access to a jep.Interpreter object, which is responsible for tracking java garbage collection and freeing python resources.
The problem with exception handling is that we normally wrap a java exception in a PyJObject as part of the Python exception we throw so that the entire object is available in python. I have added special handling for java exceptions that occur on python created threads so that a python exception is thrown which includes the java stack trace. I would still not recommend using java much from python created threads since you are likely to quickly run into issues with the wrapper objects but if you can make simple calls into java the exception handling should be better now.
Hey, thanks for your efforts and explanations. My eyes are slowly adapting to C code - that's really another world :D I now understand why my test case on a separate python thread worked - because I was only returning a simple string. As soon as I return a java object which converts to a PyJObject, it will fail because of the jep thread check - similarly, you are not able to wrap the java exception the standard way.
Are 1. and 2. (managing the wrappers on both sides) basically the limiting factors for jep supporting multi-threaded access to the same jep instance? Or are there further limitations, e.g. coming from JNI itself? In other words, if the proxy management and garbage collection were synchronized, would it be possible to access the same jep instance from multiple threads in a "safe" way? If I understand correctly, the GIL scopes are already synchronized.
I understand that SharedInterpreters are the way to go for multi-threaded access from the Java side as of today - I am testing this for our use case and hopefully it will suffice; so far I'm positive about it. :) It would still be good to learn about the feasibility of properly supporting multi-threaded java access from the python side in the future. Of course this opens the door to getting into trouble on the java side, as there we are really running concurrently, but the python side should effectively be single-threaded and the conversion could be part of that sequentialized flow.
A major obstacle to multithreaded access is the PyThreadState structure from the python c-api. Each jep interpreter is a wrapper around a PyThreadState. A PyThreadState object should only be created, used, and deleted from a single thread, and you can really only have one PyThreadState on a thread at a time.
For java threads, the reason you cannot use a jep interpreter on more than one thread is that it only has a single PyThreadState. In theory we could let an interpreter have multiple PyThreadStates; we could make a new PyThreadState whenever you access an interpreter from a new thread. The problem is cleaning up these states. If an Interpreter accumulated a dozen thread states, then when you call close we would need to get on a dozen threads and delete each PyThreadState. I don't see any way we could possibly do that.
I have thought about this before, and I think if we create a way for interpreters on different threads to share globals then developers should be able to accomplish anything they need while also defining a clear lifecycle for each interpreter to ensure it is closed. I envision adding a new method to the Interpreter interface that could be called on other threads to return a new Interpreter that is valid on that Thread but shares globals. I think that would provide a lot of power for people to build more advanced capabilities. Here is an example of what I think that might look like:
class Example1 {
    private static Interpreter original;

    public void multithreadedWork() throws Exception {
        try (Interpreter interp = new SharedInterpreter()) {
            original = interp;
            interp.exec("from library import functions");
            interp.exec("data1, data2 = functions.loadBigData(2)");
            Thread worker1 = new Thread(Example1::doWork1);
            Thread worker2 = new Thread(Example1::doWork2);
            worker1.start();
            worker2.start();
            worker1.join();
            worker2.join();
            System.out.println(interp.getValue("functions.findBestResult(result1, result2)"));
        }
    }

    private static void doWork1() {
        // useOnThisThread() is the proposed (not yet existing) method.
        try (Interpreter worker = original.useOnThisThread()) {
            worker.exec("result1 = functions.analyze(data1)");
        } catch (JepException e) {
            e.printStackTrace();
        }
    }

    private static void doWork2() {
        try (Interpreter worker = original.useOnThisThread()) {
            worker.exec("result2 = functions.analyze(data2)");
        } catch (JepException e) {
            e.printStackTrace();
        }
    }
}
For python created threads like you started with, it is actually a little easier. I don't know of any limitations on JNI and threading. Most of the problems and limitations around wrapping are self-imposed, from the way we have tied our wrapped objects so closely to individual interpreters. The PyThreadState for these threads will be created by python, and when the thread is closed python will be the last thing running and will delete it.
To me this looks like a very clean and easy to use solution from the java side :) I guess the globals would need to be backed by the same dictionary, because any synchronization as we discussed earlier could potentially run into timing problems where updates are "overwritten" by competing threads. E.g.:
Worker1 -> sync from main -> do work -> sync back (adding foo to globals during work)
Worker2 -> sync from main -> do work ............. -> sync back (adding bar to globals but accidentally deleting foo, because it did not exist when we synced initially)
Regarding the scoping: what if we don't join worker1 and the "parent" interpreter goes out of the try scope before the shared interpreter (useOnThisThread) has finished? Should the close block until all shared interpreters are closed?
For the second point, I don't have enough understanding of the codebase yet, but I see that pyjobject_init calls pyembed_get_jepthread, which tries to get the current jepthread, which does not exist for a python-side created thread. This one is used for caching the java methods. Would it be possible to generate the jepthread on the fly and attach its lifetime to the PyThreadState? And is the cache not shared amongst multiple threads because different jeps can have different class loaders, and/or because of threading complexity? Similarly the other way around when wrapping python objects: when constructing the PyObject instances, they are attached to the jep memory manager. So also here we would need a separate jep instance for the separate python thread?
To me this looks like a very clean and easy to use solution from the java side :) I guess the globals would need to be backed by the same dictionary, because any synchronization as we discussed earlier could potentially run into timing problems where updates are "overwritten" by competing threads. E.g.:
Worker1 -> sync from main -> do work -> sync back (adding foo to globals during work)
Worker2 -> sync from main -> do work ............. -> sync back (adding bar to globals but accidentally deleting foo, because it did not exist when we synced initially)
Regarding the scoping: what if we don't join worker1 and the "parent" interpreter goes out of the try scope before the shared interpreter (useOnThisThread) has finished? Should the close block until all shared interpreters are closed?
For SharedInterpreters I don't think we would need to have any dependencies. Whichever one is last holding the globals would DECREF to 0 and clean them up. I would like to have the same capabilities for SubInterpreters, but for that case we would need to clean up the actual interpreter as well as the globals dict, so I suspect we would have to ensure all other threads were cleaned up before closing the interpreter. Keep in mind I haven't actually tried this, so there may be things I haven't anticipated, but I think it is possible.
For the second point, I don't have enough understanding of the codebase yet, but I see that pyjobject_init calls pyembed_get_jepthread, which tries to get the current jepthread, which does not exist for a python-side created thread. This one is used for caching the java methods. Would it be possible to generate the jepthread on the fly and attach its lifetime to the PyThreadState?
I don't know of any way to attach the jepthread to the lifetime of the PyThreadState. If there were, that would be pretty powerful. I know there is some python API for handling thread locals, and if we could tie into that we might be able to get something working, but I haven't given it much investigation.
And is the cache not shared amongst multiple threads because different jeps can have different class loaders, and/or because of threading complexity?
The cache is not shared. It should be shared for any SharedInterpreters, and for new threads it should be able to reuse the same cache, but that is not how it is now.
Similarly the other way around when wrapping python objects: when constructing the PyObject instances, they are attached to the jep memory manager. So also here we would need a separate jep instance for the separate python thread?
With the way the java code is currently written you would need to create a jep instance. With the concept of SharedInterpreters and shared-globals interpreters I think this area needs more work. Ideally all SharedInterpreters would share a MemoryManager and all PyObjects could be used with any of those SharedInterpreters. SubInterpreters should have their own MemoryManager, but if it is possible to create shared interpreters on other threads based off the sub-interpreter, then ideally it would use the same memory manager. The code is not set up for this at all.
If you are trying to understand some of this, it might help to know some of the history of Jep. Until recently jep was only using sub-interpreters and most of the code still reflects that. I don't know of anyone who was using jep with multiple threads except to create isolated interpreters on different threads. Since there was a one-to-one correlation between a thread, a Jep instance, and a python interpreter, they are all managed together. The introduction of SharedInterpreters isn't particularly related to threading or even sharing, but mostly to the fact that some cpython modules such as numpy don't work in sub-interpreters. Shared interpreters change the dynamic so that a Jep instance is no longer tied to a cpython interpreter but to a PyThreadState instead. There is a lot of potential to continue to evolve shared interpreters in the ways we are talking about, but right now the code isn't really set up for it.
And is the cache not shared amongst multiple threads because different jeps can have different class loaders, and/or because of threading complexity?
The cache is not shared. It should be shared for any SharedInterpreters, and for new threads it should be able to reuse the same cache, but that is not how it is now.
On dev_4.0 I have moved the caches to our internal module (_jep) so they can be reused by any thread, including python created threads. This should improve performance of SharedInterpreters and also offers you way more options for getting java objects on python created threads. See 1e8e0cf7.
Hey bsteffensmeier,
we encountered some problems with our prototype for sharing the globals between multiple shared interpreters. I tried to implement an "onThisThread" API to create a temporary SharedInterpreter which pulls the globals from a shared module and pushes them back on close of that interpreter. This works to a certain extent, but hits some limitations, e.g. if methods are defined accessing globals inside a temporary interpreter, they will point to the "separate" globals of that interpreter. For example:
def changeA():
    global a
    print("a is: {}".format(a))
    a += 1

changeA()
If I execute that script in a SharedInterpreter which automatically synchronizes the globals, "a" will be picked up correctly from the shared globals, incremented, and correctly stored back in the shared globals on close. A new SharedInterpreter will also see the changeA method correctly, but if I invoke it, "a" of the closed interpreter's globals will be incremented, which of course is not reflected in the shared globals.
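The behavior described above follows from CPython function semantics: a function object permanently captures the globals of the namespace it was defined in (its func.__globals__), so calling it later from a different namespace still mutates the original dict. A plain-Python illustration (the dicts simulate the two interpreters' globals):

```python
# Simulate the temporary interpreter's globals with a plain dict.
temp_globals = {}
exec(
    "a = 0\n"
    "def changeA():\n"
    "    global a\n"
    "    a += 1\n",
    temp_globals,
)

changeA = temp_globals['changeA']

# Call it from a completely different namespace; it still writes
# into temp_globals, the dict it was defined against.
shared_globals = {'a': 100, 'changeA': changeA}
exec("changeA()", shared_globals)

print(shared_globals['a'])  # 100: untouched
print(temp_globals['a'])    # 1: the function mutated its defining globals
```

So copying a function object between globals dicts is never enough; the function keeps pointing at the dict it was compiled against.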
So I dug a bit deeper into it and came up with these two options:
a) Java / Python only:
Implement the interpreter interface inside a module and delegate all calls into that module, so that everything is executed inside the module's context. E.g.:
interpreter.exec(String.format("shared_module.exec2(\"\"\"%s\"\"\")", source));

// defined where the shared module is defined (once)
[...]
"def exec2(s):\n" +
"    exec(compile(s, 'stdin', 'exec'), globals())\n" +
"shared_module.exec2 = exec2\n" +
[...]
In a small prototype this works and solves the mentioned problem, but of course it might also have some pitfalls, e.g. correctly escaping quotes etc.
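Option (a) works because exec2 runs the source against the shared module's own dict, so any functions defined there permanently bind to it. A plain-Python sketch of the delegation (the setup outside the module is illustrative; the original defines exec2 inside the module, which is equivalent):

```python
import sys
from types import ModuleType

# Programmatically create the shared module.
shared_module = ModuleType('shared_module')
sys.modules['shared_module'] = shared_module

def exec2(s):
    # Run arbitrary source with the shared module's dict as globals,
    # so functions defined by the source bind to it permanently.
    exec(compile(s, '<stdin>', 'exec'), vars(shared_module))

shared_module.exec2 = exec2

# "Interpreter 1" defines state and a function through the delegate.
shared_module.exec2("a = 1\ndef changeA():\n    global a\n    a += 1\n")
# "Interpreter 2" calls the function the same way; both see the same globals.
shared_module.exec2("changeA()")
print(shared_module.a)  # 2
```

This sidesteps the func.__globals__ problem entirely, because there is only ever one globals dict: the module's.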
b) In C: The fact that we now have an interpreter "shell" only functioning as a delegate to dispatch everything into a module made me question why the modules can be shared but not the globals. So I looked into the C code again and found two things. First, globals seem to always be accessed inside a PyEval_AcquireThread/PyEval_ReleaseThread scope. So with some naive thinking, I only see a problem inside thread_close, which clears the thread globals (although I did not exactly see that in my experiment described above, where "a" is happily incrementing - but maybe it is only "a" surviving and not the globals dictionary). Additionally I found "pyembed_getvalue_on", which seems to execute getvalue on a specific module - also sounds kind of useful for what I want to achieve, but I didn't see any usage of it.
I would spend the time to create a prototype with a shared globals dictionary if you say this looks feasible. I'm imagining another flag in the SharedInterpreter constructor to drive this. But I first wanted to hear your opinion, whether you see any other pitfalls in terms of threading or would immediately say "forget it!".
In case of sharing, I would create a shared globals variable in pyembed_startup, similar to the concept of shared modules, and reuse it in thread_init.
Thanks.
[Update]
I just hacked it in and it seems to work, of course I'm still interested to hear if you have any objections :)
So id(globals()) is now constant, with a relatively minimal change in pyembed.c.
I don't think there is any technical reason globals should not be shared.
I don't understand how pyembed_getvalue_on would be helpful. You should just be able to specify the module name in a regular Interpreter.getValue(), like interp.getValue("someModule.variable").
Or you could even get the module as a PyObject and call methods on that to pull values from the module. I think pyembed_getvalue_on may have been used in older versions of PyObject, but the newer PyObject has the same functionality.
I would prefer if we didn't have just a boolean to specify shared globals; this creates another static variable we have to manage, and it is limited to always sharing the same set of globals. I would prefer we somehow specify which interpreters we want to share globals with. This could be done by passing an interpreter to the config or using a method on an existing interpreter to create the new interpreter. This would basically be the useOnThisThread() method I mentioned above, although that name is not great. Perhaps createSharedGlobalsInterpreter()? Anyway, this would be much more powerful because you could create separate pools of interpreters that share different sets of globals, which would come in handy for larger applications that may not want to share just one set of globals everywhere.
Yes, I also thought of grouping before, although already at the shared modules level .. so basically have multiple shared interpreter groups, each of them sharing separate main interpreters for the shared modules - I know this is not what shared interpreters were invented for, but I was thinking of how to create two distinct interpreters which can both be used from multiple threads.
Still, for the shared globals, grouping is also a strong argument and I like your idea much better than the prototype which I have now :) .. as you say, my current static shared globals are dangling there and nobody really owns them or cleans them up. Thanks for your sketch, I will try to think this a bit further. Right now, coming from the shared module approach, I don't even have a SharedInterpreter open for longer than executing a job. So for any job, which can be on unknown threads, I create a thread-local SharedInterpreter on the fly and release it once I'm done. With your approach I kind of need to keep a "reference SharedInterpreter" alive, which I also need to be able to close in a controlled way (correct thread etc.). So this is additional complexity on the java side (but maybe for a good reason ;)). Probably the grouping could also be done on the python side in some way, by pointing to the globals of separate shared modules or simply separate dictionaries in one shared module, which then can be controlled at interpreter creation time. I saw some module's globals access for sub-interpreters, so maybe something similar can also be done for sharing globals in shared interpreters. But also here the question is: who will clean this up, and when?
Just out of interest, although we are getting off topic here, is there a way to reset jep? I see the close of MainInterpreter which interrupts the dedicated thread, but I could not really spot if this triggers further cleanup also on the C side. I found the pyembed_shutdown which is hooked up with the jni unload, but I don't know when exactly this is called and if the few lines are really getting you back to 0. Background is that in our ui we have an interactive python console and so far we have a reset interpreter action, which gets you back into a clean state. Can you say whether this is possible today, or would be possible at all with an embedded C Python?
Just out of interest, although we are getting off topic here, is there a way to reset jep? I see the close of MainInterpreter which interrupts the dedicated thread, but I could not really spot if this triggers further cleanup also on the C side. I found the pyembed_shutdown which is hooked up with the jni unload, but I don't know when exactly this is called and if the few lines are really getting you back to 0. Background is that in our ui we have an interactive python console and so far we have a reset interpreter action, which gets you back into a clean state. Can you say whether this is possible today, or would be possible at all with an embedded C Python?
In theory pyembed_shutdown would be called by the JVM. I don't remember the specifics; perhaps when the class loader is garbage collected? But when I have experimented with this in the past, there is no way to trigger the JVM to unload the module, so you cannot reset Jep. It may be possible with a plain python interpreter, although I wouldn't be surprised if it was not. I am nearly positive that extension modules such as numpy, which track a bunch of static state in c code, would not be able to be reset.
I ran into another complication in terms of threading, now with PyObjects. :) The issue is that I have a PyObject which was obtained on another thread (to be specific, a PyCallable). I now have the need to dispatch the execution onto another thread (e.g. the main thread). In that context I only know the PyObject. What I would like to do now is to use that object on this other thread. I experimented with something similar to your suggested "onThisThread" interpreter API. I needed to hack a bit in java, as the APIs for getting the python object pointer are not visible, but this is just for a quick test. ;) The idea is some pyObject.transfer(Jep jep) API, which allows creating a reference to the object in another jep context. The test below seems to work (for the moment), but I figured out that the reference is dead afterwards. My theory is that the created TransferPyObject creates a new PyPointer without increasing the reference count, but decreases it when being closed? What do you think about some API like this in general? Of course it needs some more thought and probably more interaction with the C side (e.g. detect and throw when attempting to transfer a stale PyPointer which does not exist on the C side anymore).
public static class TransferPyObject extends jep.python.PyObject {
    protected TransferPyObject(Jep jep, long pyObject) throws JepException {
        super(jep, pyObject);
    }
}

private jep.python.PyObject transfer(Jep jep, jep.python.PyObject pyObject) throws JepException {
    try {
        Method getPyObject = jep.python.PyObject.class.getDeclaredMethod("getPyObject");
        getPyObject.setAccessible(true);
        long pointer = (Long) getPyObject.invoke(pyObject);
        return new TransferPyObject(jep, pointer);
    } catch (ClassCastException | SecurityException | NoSuchMethodException | IllegalAccessException
            | IllegalArgumentException | InvocationTargetException e) {
        throw new JepException("Cannot transfer callable pointer", e);
    }
}
@Test
public void testTransferObject() throws JepException, InterruptedException, ExecutionException {
    // using prototype with shared globals dictionary between multiple SharedInterpreters
    try (Jep jep = new SharedInterpreter()) {
        jep.eval(
                "def foo():\n" +
                "    return 1\n");
        PyObject fooMain = jep.getValue("foo", PyObject.class);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Integer> fromOtherThread = executor.submit(() -> {
            try (Jep jep2 = new SharedInterpreter()) {
                return transfer(jep2, fooMain).as(PyCallable.class).callAs(Integer.class);
            } catch (JepException e) {
                fail();
                return 0;
            }
        });
        assertEquals(1, (int) fromOtherThread.get());
        assertEquals(1, (int) fooMain.as(PyCallable.class).callAs(Integer.class));
    }
    try (Jep jep = new SharedInterpreter()) {
        // SIGSEGV here -> probably a Python reference counting issue?
        jep.getValue("foo", PyObject.class);
    }
}
Edit: I just tricked the transferred PyPointer with the debugger and avoided the dispose/decref; with that, the test no longer segfaults and passes.
PyPointer assumes incref was called on the reference before creation, so it decrefs on destruction. If you are cloning it, you would need to incref to avoid messing up the reference counting. There is currently no way to incref from Java.
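The ownership rule behind this can be illustrated in plain Python, independently of Jep, using only sys.getrefcount: every owner of a pointer must account for exactly one reference, so a clone that decrefs on close without a matching incref leaves the object over-released. A minimal sketch:

```python
import sys

def demo():
    obj = object()
    # getrefcount reports one extra reference for its own argument
    base = sys.getrefcount(obj)
    alias = obj                      # a second owner must hold its own reference
    assert sys.getrefcount(obj) == base + 1
    del alias                        # the owner "disposes": the count drops back
    assert sys.getrefcount(obj) == base
    return True

demo()
```

A cloned PyPointer is like `alias` without the binding: its dispose still performs the `del`-equivalent decref, so without a prior incref the count drops below the real number of owners and the object is freed too early.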
You are correct, the biggest obstacle with your transfer API is ensuring that the pointer is still valid. As long as the original Jep instance is still open it should be legal, but I haven't looked closely at that in a while, so I am not sure if there are other complexities.
I would like to change Jep so that the MemoryManager and associated PyPointers are associated with a PyInterpreterState* instead of a single Jep Interpreter/PyThreadState*. Essentially I would like all SharedInterpreters to share PyPointers and allow them to be passed amongst each other, but SubInterpreters should probably not allow such fluid sharing. Technically I don't think there is a reason SubInterpreters couldn't share too, but it somewhat violates the isolation SubInterpreters are supposed to have. This would be a fairly significant change to the PyPointer internals, but in my opinion it would provide a very easy-to-use API.
Yes, I also figured out the incref part, so as a prototype I hacked it on the Java side using ctypes... I know, it's getting shaky ;). This at least got me further with this specific problem in our setup. Having a shared MemoryManager sounds like a good solution and probably also opens the door for multi-threaded access to an interpreter, or at least simplifies a synchronization layer on top.
/**
 * Very hacky way to move a PyPointer into another jep instance.
 * TODO implementation in C or alternative (e.g. shared memory manager in jep)
 */
public class TransferPyObject extends PyObject {

    protected TransferPyObject(Jep jep, long pyObject) throws JepException {
        super(jep, pyObject);
    }

    public static PyObject transfer(Jep jep, PyObject pyObject) throws JepException {
        try {
            // TODO ensure PyPointer not disposed yet
            Method getPyObject = PyObject.class.getDeclaredMethod("getPyObject");
            getPyObject.setAccessible(true);
            long pointer = (Long) getPyObject.invoke(pyObject);
            // increase reference count in python
            jep.exec(
                    "if not hasattr(jep, '_incref'):\n" +
                    "    import ctypes\n" +
                    "    jep._incref = ctypes.pythonapi.Py_IncRef\n" +
                    "    jep._incref.argtypes = [ctypes.py_object]\n" +
                    "    jep._incref.restype = None\n");
            PyCallable incref = jep.getValue("jep._incref", PyCallable.class);
            incref.call(pyObject);
            return new TransferPyObject(jep, pointer);
        } catch (ClassCastException | SecurityException | NoSuchMethodException | IllegalAccessException
                | IllegalArgumentException | InvocationTargetException e) {
            throw new JepException("Cannot transfer PyPointer", e);
        }
    }
}