PythonCall.jl
PythonCall.jl copied to clipboard
segmentation fault with multi-threading
I have this MWE, where I get segmentation faults (frequently, but not deterministically), when trying to run some script that uses multi-threading on the Julia side.
I have used before launching ipython3
:
export JULIA_NUM_THREADS=4
(my computer has 4 cores - 8 threads).
The MWE is:
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from juliacall import Main as jl
In [2]: import numpy as np
In [3]: jl.seval("""
...: function test(x)
...: partial = zeros(Threads.nthreads())
...: Threads.@threads for i in 1:Threads.nthreads()
...: for j in i:Threads.nthreads():length(x)
...: partial[i] += x[j]
...: end
...: end
...: return sum(partial)
...: end
...: """)
Out[3]: test (generic function with 1 method)
In [4]: x = np.random.random((10_000,))
In [5]: %timeit jl.test(x)
72.2 µs ± 35.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: %timeit jl.test(x)
Segmentation fault (core dumped)
Here I have emulated the error using the %timeit
macro from ipython
, but my actual error I get after some runs of a function of my package:
In [1]: from juliacall import Main as jl
In [2]: jl.seval("using CellListMap")
In [3]: import numpy as np
In [5]: x = np.random.random((10_000,3))
In [6]: nb = jl.neighborlist(x.transpose(), 0.05)
In [7]: nb = jl.neighborlist(x.transpose(), 0.05)
In [8]: nb = jl.neighborlist(x.transpose(), 0.05)
In [9]: nb = jl.neighborlist(x.transpose(), 0.05)
In [10]: nb = jl.neighborlist(x.transpose(), 0.05)
In [11]: nb = jl.neighborlist(x.transpose(), 0.05)
In [12]: nb = jl.neighborlist(x.transpose(), 0.05)
In [13]: nb = jl.neighborlist(x.transpose(), 0.05)
In [14]: nb = jl.neighborlist(x.transpose(), 0.05)
In [15]: nb = jl.neighborlist(x.transpose(), 0.05)
In [16]: nb = jl.neighborlist(x.transpose(), 0.05)
In [17]: nb = jl.neighborlist(x.transpose(), 0.05)
In [18]: nb = jl.neighborlist(x.transpose(), 0.05)
In [19]: nb = jl.neighborlist(x.transpose(), 0.05)
Segmentation fault (core dumped)
%timeit
runs the function multiple times, there seems to be some memory corruption, or memory overflow, causing the error.
Anyway, even if you have only some hint on how to debug this, I will be very thankful.
(even in the simplest example above, the segfaults only occur with multi-threading).
This is probably the same as #201 and #202.
The Python interpreter must only be used on thread 1. If you use Julia's multithreading, then even if it doesn't explicitly use Python, at some point the GC may free a Python object. If this GC runs on a thread other than 1, Python will crash.
The solution is to guard multithreaded Julia code with calls to PythonCall.GC.disable()
and PythonCall.GC.enable()
, so the Julia code in your MWE becomes:
function test(x)
partial = zeros(Threads.nthreads())
PythonCall.GC.disable()
try
Threads.@threads for i in 1:Threads.nthreads()
for j in i:Threads.nthreads():length(x)
partial[i] += x[j]
end
end
finally
PythonCall.GC.enable()
end
return sum(partial)
end
This ensures that no Python objects are freed during the multithreaded portion: if Julia's GC collects any Python objects, they will be cached and freed at PythonCall.GC.enable()
.
Not sure if I did something wrong, but I still get the segfaults:
In [1]: from juliacall import Main as jl
In [2]: jl.seval("""
...: function test(x)
...: partial = zeros(Threads.nthreads())
...: PythonCall.GC.disable()
...: try
...: Threads.@threads for i in 1:Threads.nthreads()
...: for j in i:Threads.nthreads():length(x)
...: partial[i] += x[j]
...: end
...: end
...: finally
...: PythonCall.GC.enable()
...: end
...: return sum(partial)
...: end
...: """)
Out[2]: test (generic function with 1 method)
In [3]: import numpy as np
In [4]: x = np.random.random((10_000,));
In [6]: %timeit jl.test(x)
60.4 µs ± 30.7 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit jl.test(x)
Segmentation fault (core dumped)
(the same for my package)
In that case I don't know what's going on. I can reproduce the error but don't get a core dump. Can you poke around the core dump with gdb and see what's going on? Maybe get a backtrace?
If I do jl.GC.enable(False)
before %timeit
then the problem stops.
The PythonCall GC code is itself not thread safe, but I assumed that that wasn't an issue since Julia's GC can only run from one thread at a time, so I assumed that finalizers only ran in one thread at a time. Maybe that's not the case. I think with a little more digging I can figure this out.
I'm flummoxed!
Here's a slightly smaller MWE:
In [1]: import os
In [2]: os.environ['PYTHON_JULIACALL_THREADS'] = '4'
In [3]: from juliacall import Main as jl
In [4]: jl.seval("""
...: function test()
...: partial = zeros(Threads.nthreads())
...: x = zeros(10_000)
...: Threads.@threads for i in 1:Threads.nthreads()
...: for j in i:Threads.nthreads():length(x)
...: partial[i] += x[j]
...: end
...: end
...: return sum(partial)
...: end
...: """)
Out[4]: test (generic function with 1 method)
In [5]: %timeit jl.test()
The slowest run took 10.87 times longer than the fastest. This could mean that an intermediate result is being cached.
150 µs ± 103 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: %timeit jl.test()
Segmentation fault
Things tried:
- Deleting all the finalizers/GC stuff from PythonCall, so when a Python object is done with, it is never freed. Segfault still occurs.
- Running with 1 thread (either by setting the number of threads to 1 or removing
@threads
) fixes it. - Calling
jl.GC.enable(False)
before%timeit ...
fixes it. - Doing
%timeit jl.seval("test()")
instead fixes it. Weird!
So it looks like some combination of garbage collection and threading are the issue. But I've turned off the finalizers in PythonCall so I don't know what else could be the issue.
If I am using this approach to disable/enable GC I am running out of memory at some point. Would it help if I make a minimal example that reproduces this behaviour?
It's not surprising that disabling GC entirely means you run out of memory. You could maybe do a GC.collect()
periodically (outside of any threaded code).
Hmm, but I am enabling GC again after the threaded call. I cant find a GC.collect
function. I tried GC.gc
(outside threaded calls) but that results in a segfault. Thank you for your help!:)
As said on discourse: https://discourse.julialang.org/t/floop-threading-and-juliacall-produce-segmentation-fault/100717/11?u=vchuravy
Julia intentionally causes segmentation faults as part of the GC safepoint mechanism. This MPI.jl issue is somewhat related https://github.com/JuliaParallel/MPI.jl/issues/725
PythonCall starts Julia with --handle-signals=no
https://github.com/cjdoris/PythonCall.jl/blob/e374e2503b7b843c457d30596a2428ebfc26bb0b/pysrc/juliacall/init.py#L132
Which then causes benign segmentation faults to become spurios ones.
To be clear, you're suggesting to change that option to "yes" (or just remove it)? According to git blame, it was introduced last year in https://github.com/cjdoris/PythonCall.jl/commit/bef6afa3e4146736eda0b9009eeac95cae39bf04 without much explanation (just "better compatibility"), it's unclear what was the motivation for that change.
Yes, changing it from no to yes will at least no longer cause faults due to GC (and it looks like most of the recorded issues here worked around it by running with GC off.
This issue was opened shortly after that change aswell.
I see that the offending commit is part of the branch https://github.com/cjdoris/PythonCall.jl/commits/gc referenced in issues #201 and #202 which was meant to deal with other segmentation faults in the GC in multithreading programs, but it sounds like this specific change just worsened the situation. Is that an accurate reconstruction of the events @cjdoris?
Hangs could come about from one thread wanting to stop the world and another executing Python code? The code would need to manually transition to GC safe before calling into Python.
@vchuravy thanks for the tip here! Just wanted to post that this works perfectly for me, after setting CONFIG['opt_handle_signals'] = 'yes'
, with the caveat that we still need to wrap threaded code in PythonCall.GC.disable() / PythonCall.GC.enable()
. Rather than segfaulting, it now hangs at this line if PythonCall.GC is not disabled in some of our threaded code.
This is significantly better than the workaround i posted here, as we don't have to worry about huge memory allocation issues in the threaded code or temporarily calling Base.GC.gc
on the main thread. Let's wait to hear what the maintainer thinks, at the very least if this could be set by an env var and default to no to retain backwards compatibility, we could drop a lot of hack code and get a big performance benefit
Thank you for looking into this. I had no idea there was such a thing as a benign segfault! Given it is so important for Julia's GC I guess we should turn Julia's signal handling back on.
I'm not sure what that terrible "better compatibility" comment was about. I suspect that it's because Python also wants to handle signals, and letting them both do it interferes with each other. For example, I think allowing Julia's signal handler prevented you from doing keyboard interrupts (Ctrl-C) in Python code.
For example, I think allowing Julia's signal handler prevented you from doing keyboard interrupts (Ctrl-C) in Python code.
@cjdoris in my experience, allowing Julia's signal handler allows keyboard interrupts (Ctrl-C) to work in Python code. Using branch from #333 :
PYTHON_JULIACALL_HANDLE_SIGNALS=no python
Python 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:07:22) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from juliacall import Main as jl
>>> jl.seval("sleep(10)")
^C>>>
^C
above corresponds to my keyboard interrupt. You can see it just continues on.
PYTHON_JULIACALL_HANDLE_SIGNALS=yes python
Python 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:07:22) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from juliacall import Main as jl
>>> jl.seval("sleep(10)")
^C
[73719] signal (2): Interrupt: 2
in expression starting at none:1
kevent at /usr/lib/system/libsystem_kernel.dylib (unknown line)
uv__io_poll at /Users/brian/.julia/juliaup/julia-1.9.2+0.x64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
uv_run at /Users/brian/.julia/juliaup/julia-1.9.2+0.x64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
ijl_task_get_next at /Users/brian/.julia/juliaup/julia-1.9.2+0.x64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
poptask at ./task.jl:974
wait at ./task.jl:983
task_done_hook at ./task.jl:672
jfptr_task_done_hook_33190.clone_1 at /Users/brian/.julia/juliaup/julia-1.9.2+0.x64.apple.darwin14/lib/julia/sys.dylib (unknown line)
ijl_apply_generic at /Users/brian/.julia/juliaup/julia-1.9.2+0.x64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
jl_finish_task at /Users/brian/.julia/juliaup/julia-1.9.2+0.x64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
start_task at /Users/brian/.julia/juliaup/julia-1.9.2+0.x64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
unknown function (ip: 0x0)
Allocations: 1974217 (Pool: 1972735; Big: 1482); GC: 3
This issue has been marked as stale because it has been open for 30 days with no activity. If the issue is still relevant then please leave a comment, or else it will be closed in 7 days.
I think it is still relevant. The workaround of disabling GC in the internal Julia code can cause memory build up, making threading unusable in some cases.
I don't know if this can be solved by modifications of this package alone, that's another question.
So there are a couple of interactions here:
- Julia uses signals internally for multiple purposes, starting Julia with signal handlers off and multi-threading on is an unsupported configuration (I am working on a change that will throw an error if this is detecteed)
- When python code is executing and another thread wants to run GC we may appear to hang. In order to avoid this before calling into python the thread must be marked as GC safe and unmarked when returning to Julia.
If you want the python signal handlers to work as well then someone would need to implement signal chaining which might be a bit fiddly but shouldn't be to complicated. I briefly chatted with @vtjnash about this a while ago.
I think this was largely fixed (well, worked-around) by setting the env var PYTHON_JULIACALL_HANDLE_SIGNALS=yes
? This installs the Julia signal handler, so that GC works as expected. The downside is that you don't get the Python signal handler, which breaks some things like Ctrl-C being caught and turned into KeyboardInterrupt
.
@vchuravy signal chaining would be cool - presumably this requires some level of co-operation by either the core Julia or Python implementation?
@cjdoris could PYTHON_JULIACALL_HANDLE_SIGNALS=yes
be used as default? Tons of Julia code is multithreaded under-the-hood so this sort of thing will inevitably be triggered at some point. The user current experience is to get a bus error
without further explanation which is not great.
Alternatively is there any way one could guarantee that all objects passed to a Julia call are copied and no longer accessible to Python? Sort of like how Rust guarantees thread safety. I think allowing mutation by default is going to be dangerous for my application so would like a way to force it to copy objects when passed one way or the other. (I'm not even sure jl.deepcopy()
on arguments would work here, but maybe it would?)
@vchuravy what is the ETA of signal chaining and who should I poke to help get this working?
I don't know that anyone is working on signal chaining.
The reason why PYTHON_JULIACALL_HANDLE_SIGNALS=no
is the default is because setting it to yes
messes with ordinary Python behaviour. For example, Ctrl-C no longer raises a KeyboardInterrupt but just crashes Python immediately.
However what would probably be a good idea is to raise a warning if juliacall is being started with multiple threads and signal handling turned off, suggesting the user may want to enable julia signal handling.
Maybe my solution in https://github.com/JuliaPy/PythonCall.jl/issues/201#issuecomment-2164043075 could help you?