juliacall: Julia GC triggered from a Python thread causes hangs
Affects: JuliaCall
Describe the bug
I am seeing sporadic hangs when Julia tasks are launched from parallel Python threads; short-running tasks are affected less often. I had no luck with the tricks from #539 or here. I believe the hangs are triggered by Julia's GC: disabling GC with Base.GC.enable(false) beforehand seems to prevent them, and triggering Julia GC manually from a Python thread causes a hang immediately.
This is all with PYTHON_JULIACALL_HANDLE_SIGNALS set to yes, in case that matters.
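For reference, the GC-disabling mitigation mentioned above looks roughly like this. It is only a sketch (jl.sqrt stands in for the real Julia work), and it merely postpones collection, so memory grows until GC is re-enabled on the main thread.

from concurrent.futures import ThreadPoolExecutor

from juliacall import Main as jl

def work(x):
    # Stand-in for the real Julia call made from a worker thread.
    return jl.sqrt(float(x))

jl.Base.GC.enable(False)                      # main thread: disable the Julia GC
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(8)))
jl.Base.GC.enable(True)                       # main thread: re-enable GC
jl.Base.GC.gc()                               # and collect the backlog

The minimal repro below triggers the hang immediately: a worker thread whose only job is to call Julia's GC.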
import threading

from juliacall import Main as jl

jl_gc = jl.Base.GC.gc
t = threading.Thread(target=jl_gc)
t.start()  # hangs every time (100% CPU)
Terminating Python gives me:
[89363] signal (15): Terminated: 15
in expression starting at none:0
jl_gc_wait_for_the_world at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-10/src/gc.c:241 [inlined]
ijl_gc_collect at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-10/src/gc.c:3515
gc at ./gcutils.jl:129 [inlined]
gc at ./gcutils.jl:129 [inlined]
pyjlany_call at /Users/username/.julia/packages/PythonCall/Nr75f/src/JlWrap/any.jl:45
_jl_invoke at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-10/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-10/src/gf.c:3077
_pyjl_callmethod at /Users/username/.julia/packages/PythonCall/Nr75f/src/JlWrap/base.jl:73
_pyjl_callmethod at /Users/username/.julia/packages/PythonCall/Nr75f/src/JlWrap/C.jl:63
jfptr__pyjl_callmethod_9032 at /Users/username/.julia/compiled/v1.10/PythonCall/WdXsa_v1O0R.dylib (unknown line)
_jl_invoke at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-10/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jlcapi__pyjl_callmethod_9116 at /Users/username/.julia/compiled/v1.10/PythonCall/WdXsa_v1O0R.dylib (unknown line)
method_vectorcall_VARARGS at /Users/username/miniforge3/envs/pyproj/bin/python3.12 (unknown line)
PyObject_Vectorcall at /Users/username/miniforge3/envs/pyproj/bin/python3.12 (unknown line)
_PyEval_EvalFrameDefault at /Users/username/miniforge3/envs/pyproj/bin/python3.12 (unknown line)
_PyObject_FastCallDictTstate at /Users/username/miniforge3/envs/pyproj/bin/python3.12 (unknown line)
slot_tp_call at /Users/username/miniforge3/envs/pyproj/bin/python3.12 (unknown line)
_PyObject_Call at /Users/username/miniforge3/envs/pyproj/bin/python3.12 (unknown line)
_PyEval_EvalFrameDefault at /Users/username/miniforge3/envs/pyproj/bin/python3.12 (unknown line)
method_vectorcall at /Users/username/miniforge3/envs/pyproj/bin/python3.12 (unknown line)
thread_run at /Users/username/miniforge3/envs/pyproj/bin/python3.12 (unknown line)
pythread_wrapper at /Users/username/miniforge3/envs/pyproj/bin/python3.12 (unknown line)
_pthread_start at /usr/lib/system/libsystem_pthread.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
Allocations: 3689147 (Pool: 3684864; Big: 4283); GC: 6
Your system
- macOS, Apple Silicon, Julia 1.10.5
- Linux, x86_64, Julia 1.10.6
- PythonCall/juliacall 0.9.23
While it may be suboptimal for many workloads, I worked around this by having the main Python thread do all Julia calls. After a thread calls .map on the thread pool executor, the main thread waits for Julia function closures from the worker threads, executes them, and sends the results back. Since I make extensive use of scheduling-sensitive threading on the Julia side, allowing only one call into Julia from Python at a time was fine for cache behavior anyway. I'm calling Julia for performance, so I've grouped as much work into a single call as I can; two extra thread context switches per call is a small price compared to how long the Julia routines take to execute. A rough sketch of the pattern follows.
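The names call_julia, julia_jobs, and worker in this sketch are illustrative, not part of juliacall, and jl.sqrt stands in for the real Julia work.

import queue
from concurrent.futures import Future, ThreadPoolExecutor

from juliacall import Main as jl

julia_jobs = queue.Queue()               # queue of (callable, Future) pairs

def call_julia(fn, *args):
    # Called from worker threads: ask the main thread to run fn(*args) and wait.
    fut = Future()
    julia_jobs.put((lambda: fn(*args), fut))
    return fut.result()                  # blocks the worker, not the main thread

def worker(n):
    # Workers never call Julia directly; everything goes through call_julia.
    return call_julia(jl.sqrt, float(n))

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(worker, n) for n in (2, 3, 5, 7)]
    # Main thread: the only thread that ever enters Julia.
    while not all(f.done() for f in futures):
        try:
            job, fut = julia_jobs.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            fut.set_result(job())
        except Exception as exc:
            fut.set_exception(exc)

print([f.result() for f in futures])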
It would be great if the GC could be made to play nicely here. Maybe a safe abstraction could be provided that lets multiple Python threads call into Julia while still guaranteeing that the GC can actually run at some point before memory becomes a problem. My workaround is more straightforward than micromanaging Julia's GC, but it's hardly optimal either.
Yep I've seen what I think is the same thing before.
AFAIU it's not really supported to call into Julia from a thread that wasn't launched by Julia (e.g. one created by Python). In experiments, what actually happens is that you CAN call into Julia OK, but when it yields, execution goes to some other task and never returns to that thread, so the thread hangs.
There's a function to get Julia to "adopt" an external thread but when I tried it it caused segfaults or something.
So yeah for now just multi thread on the Julia side.
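For what it's worth, here is a minimal sketch of what multithreading on the Julia side can look like from juliacall: the Python main thread makes a single call, and Julia spreads the work across its own threads. It assumes Julia was started with more than one thread (e.g. via PYTHON_JULIACALL_THREADS); parallel_sums and its per-chunk work are made up for illustration.

from juliacall import Main as jl

# Define a Julia function that parallelises internally with Threads.@threads.
jl.seval("""
function parallel_sums(n)
    out = Vector{Float64}(undef, n)
    Threads.@threads for i in 1:n
        out[i] = sum(sqrt, 1:1_000_000)   # placeholder per-chunk work
    end
    return out
end
""")

print("Julia threads:", jl.Threads.nthreads())
results = jl.parallel_sums(8)   # a single entry into Julia, from the main thread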
Sounds like that might be why the jl_yield() workaround didn't work for me. It might become more of an issue down the road once GIL removal is no longer experimental, but for now it's a minor annoyance. Outside of this, JuliaCall feels rock solid; it was the deciding factor in choosing Julia over C++ and Fortran for the Python SaMD project I'm working on. If I get some spare time, I'll take a look at experimenting with jl_adopt_thread, among other things.