galaxytools
Deadlocks when testing ImageJ tools
A strange problem: when testing some ImageJ tools with planemo (https://github.com/bgruening/galaxytools/blob/master/tools/image_processing/imagej2/imagej2_create_image.xml for example), the test never ends.
When doing a ps, the ImageJ process is in T state (= stopped) and it won't resume.
Strangely, when I run tool_script.sh by hand from the workdir, it runs fine and gives the correct result.
I guess it could be some kind of buffering problem with multiple subprocess.Popen and ImageJ itself launching jython. But I couldn't find a way to fix this...
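To tell a stopped child apart from one that is merely blocked on I/O, a parent process can ask waitpid to report stops. The following is a minimal sketch, not Galaxy code: the function name is hypothetical, and SIGSTOP is used only as a stand-in for whatever signal actually stops ImageJ in this thread.

```python
import os
import signal
import subprocess
import sys


def stop_and_resume():
    """Stop a child, confirm it is in the stopped state, then resume it."""
    # Hypothetical stand-in for the ImageJ process: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )

    # Put the child into the stopped ("T") state.
    os.kill(child.pid, signal.SIGSTOP)

    # WUNTRACED makes waitpid also return for children that are stopped,
    # not only for children that have exited.
    _, status = os.waitpid(child.pid, os.WUNTRACED)
    was_stopped = os.WIFSTOPPED(status)
    stop_signal = os.WSTOPSIG(status) if was_stopped else None

    # A stopped process only continues after SIGCONT.
    os.kill(child.pid, signal.SIGCONT)
    child.terminate()
    child.wait()
    return was_stopped, stop_signal


if __name__ == "__main__":
    print(stop_and_resume())
```

The signal number returned by WSTOPSIG is the useful part when debugging: it distinguishes an explicit SIGSTOP from a job-control stop such as SIGTTIN or SIGTTOU.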
We hit this problem with Sylvain from @bgo-bioimagerie
@abretaud yeah, the Java-Python bridge has some problems here. It all magically works if you use a real job runner rather than the local one, e.g. in a container. ping @gregvonkuster
@abretaud This is the line of code in the local job runner that is blocking the Java call: https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/jobs/runners/local.py#L100. Commenting out the preexec_fn=os.setpgrp parameter will unblock the local job runner for these tools.
@abretaud is this working for you now? I have put together a new version of the container here: https://github.com/bgruening/docker-galaxy-imaging
I think I still have the problem, unless you mean you changed something for the local job runner?
I just tried replacing preexec_fn=os.setpgrp with preexec_fn=os.setsid and it seems to fix the deadlock problem. Judging by the docs, it should do the same job on the created process.
It feels a little scary though to touch this kind of code! But I can make a PR of course.
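For anyone comparing the two options, here is a minimal sketch of what each preexec_fn does to the child. It is not the Galaxy local runner code, and the helper name launch is hypothetical. os.setpgrp puts the child in a new process group inside the parent's session, while os.setsid creates a new session (and a new process group) with no controlling terminal. One plausible explanation for the deadlock, which is an assumption and not confirmed in this thread, is that a background process group touching the terminal gets stopped by SIGTTIN or SIGTTOU, which cannot happen once the child has no controlling terminal.

```python
import os
import subprocess
import sys


def launch(preexec_fn):
    """Run a short child and return its (pid, process group id, session id)."""
    code = "import os; print(os.getpid(), os.getpgrp(), os.getsid(0))"
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        preexec_fn=preexec_fn,  # called in the child between fork and exec
    )
    out, _ = proc.communicate()
    pid, pgid, sid = (int(field) for field in out.split())
    return pid, pgid, sid


def compare():
    parent_sid = os.getsid(0)

    # setpgrp: the child leads a new process group but stays in our session.
    pid, pgid, sid = launch(os.setpgrp)
    setpgrp_view = {"own_group": pgid == pid, "same_session": sid == parent_sid}

    # setsid: the child leads a new session and a new process group.
    pid, pgid, sid = launch(os.setsid)
    setsid_view = {"own_group": pgid == pid, "own_session": sid == pid}

    return setpgrp_view, setsid_view


if __name__ == "__main__":
    print(compare())
```

In both cases the child leads its own process group, so signalling the whole group with os.killpg still works. On Python 3.2 and later, passing start_new_session=True to subprocess.Popen has the same effect as preexec_fn=os.setsid.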
The new container version is cool, thanks! Did you make any progress @bgo-bioimagerie? (And will you be at GCC, by the way?)
@abretaud we decided not to PR this code, as no one will use the local runner in production. But it's hard to test :(
OK, I understand, it's perfectly reasonable to leave this code as it is.