Segfault when calling python function repeatedly
my first issue go easy
Affects: PythonCall
Describe: I'm seeing segfaults after running for a while. I've tried to create an MWE; it usually segfaults on the first go, but sometimes I have to re-run the for-loop a few times. It feels like a GC race condition like the ones I've seen in other issues, but I have no real proof other than that the more stuff I have going on, the faster it segfaults.
using PythonCall
f1 = @pyexec """
def pyfunc(params):
    return 0
""" => pyfunc
params = Dict{String,Float64}(
    "A" => 1,
    "B" => 2,
    "C" => 3,
)
for i in 1:10_000_000
    f1(params)
end
[965155] signal 11 (1): Segmentation fault
in expression starting at /home/olssoj2/git/julia_local/scratch_hy_mwe.jl:18
unknown function (ip: 0x7f30d83b03ae)
unknown function (ip: 0x7f30d82b1187)
Py_DecRef at /home/olssoj2/.julia/packages/PythonCall/L4cjh/src/C/pointers.jl:303 [inlined]
pydel! at /home/olssoj2/.julia/packages/PythonCall/L4cjh/src/Core/Py.jl:114
#pycall#21 at /home/olssoj2/.julia/packages/PythonCall/L4cjh/src/Core/builtins.jl:244
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
do_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/builtins.c:831
#_#11 at /home/olssoj2/.julia/packages/PythonCall/L4cjh/src/Core/Py.jl:357
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
do_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/builtins.c:831
Py at /home/olssoj2/.julia/packages/PythonCall/L4cjh/src/Core/Py.jl:357
top-level scope at /home/olssoj2/git/julia_local/scratch_hy_mwe.jl:19
jl_toplevel_eval_flex at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 160 × Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, icelake-server)
Threads: 1 default, 0 interactive, 1 GC (on 160 virtual cores)
Environment:
  JULIA_NUM_THREADS =
  JULIA_VSCODE_REPL = 1
  JULIA_EDITOR = code
  LD_LIBRARY_PATH = /opt/rh/gcc-toolset-14/root/usr/lib64:/opt/rh/gcc-toolset-14/root/usr/lib
(@v1.11) pkg> status PythonCall
Status ~/.julia/environments/v1.11/Project.toml
[6099a3de] PythonCall v0.9.25
If I trace the call stack:
...
elseif !isempty(args)
    args_ = pytuple_fromiter(args)
    ans = pycallargs(f, args_)
    pydel!(args_)  # <------ this is builtins.jl:244
    ans
else
...
which calls pydel! below:
    ptr = getptr(x)
    if ptr != C.PyNULL
        C.Py_DecRef(ptr)  # <---- this is Py.jl:114, which leads to the segfault
        setptr!(x, C.PyNULL)
    end
    push!(PYNULL_CACHE, x)
    return
end
pydel! has comments with the word "DANGER!" and is described as including an optimization to reuse Py objects.
- I'll try to reason through this ref-counting but yeah it is not trivial
- Does anyone know if there is a band-aid, maybe which disables optimizations, but would ensure validity
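For anyone following along, the reuse optimization in pydel! (null the pointer, then push the wrapper onto PYNULL_CACHE) can be pictured with a toy pool in plain Julia. This is only a sketch: Handle, POOL, acquire, and release! are hypothetical names, the acquire side is an assumption about how such a cache would typically be drained, and none of it is PythonCall's actual code. It shows why a wrapper must never be touched after it has been released.

```julia
# Toy model only: Handle, POOL, acquire, and release! are hypothetical names,
# not PythonCall's API.
mutable struct Handle
    ptr::Int  # stands in for a PyObject pointer; 0 plays the role of NULL
end

const POOL = Handle[]

# Reuse a nulled wrapper from the pool if one is available (assumed behaviour).
acquire(ptr::Int) = isempty(POOL) ? Handle(ptr) : (h = pop!(POOL); h.ptr = ptr; h)

# Null the wrapper and return it to the pool, mirroring the two steps
# visible at the end of pydel! above.
function release!(h::Handle)
    h.ptr = 0
    push!(POOL, h)
    return nothing
end

a = acquire(111)
release!(a)          # `a` must not be used after this point
b = acquire(222)     # the pool hands back the very same wrapper
aliased = a === b    # the stale name `a` now refers to b's object
```

In this toy model, any code that still holds `a` after the release silently operates on whatever object the wrapper was recycled for, which is one way a single stray reference could end up releasing the wrong object.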
if I run in gdb it shows
Thread 1 "julia" received signal SIGSEGV, Segmentation fault.
0x00007fff9f0411f8 in subtype_dealloc () from /usr/lib64/libpython3.11.so.1.0
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-225.el8.x86_64 libffi-3.1-24.el8.x86_64 mpdecimal-2.5.1-3.el8.x86_64 python3.11-libs-3.11.11-1.el8_10.x86_64
(gdb) bt
#0 0x00007fff9f0411f8 in subtype_dealloc () from /usr/lib64/libpython3.11.so.1.0
#1 0x00007fff9efea45a in tupledealloc () from /usr/lib64/libpython3.11.so.1.0
#2 0x00007ffff5f0f034 in ?? ()
#3 0x00007fffa5afb9a0 in jl_system_image_data () from /home/wsl2user/.julia/compiled/v1.11/PythonCall/WdXsa_pPI1g.so
#4 0x00007fffffffca70 in ?? ()
#5 0x00007fffa64eb130 in ?? () from /home/wsl2user/.julia/compiled/v1.11/PythonCall/WdXsa_pPI1g.so
#6 0x00007fffea9fc080 in ?? ()
#7 0x00007fffffffca60 in ?? ()
#8 0x00007ffff6060a8b in jl_f_finalizer (F=
does that mean the python object was decref'd too many times?
A Solution
# rp1 is the Python callable from my real code (presumably the role f1 plays in the MWE above)
pd = pydict(params)
for i in 1:10_000_000
    rp1(pd)
end
The first version above wraps the Julia dict in Python as a juliacall.DictValue, while this version converts the Julia type to a Python dict first. ~~fwiw I can't get this latter one to crash~~ This leads to a memory leak.
Ok, new approach. We were making a bazillion pytuples and DictValues every time we called the function. Instead we can just cache these. Maybe this is the short-term (and possibly long-term) solution -- update raw array values but don't make a lot of intermediate structures.
pd = pydict(params)
pt = pytuple((pd,))
for i in 1:10_000_000
    PythonCall.Core.pycallargs(rp1, pt)
end
Thanks for the PR. I believe the issue was just fixed by #618 which is now merged. Your reproducer crashes for me before the PR but doesn't after. Can you verify the issue is fixed for you please? You can install the dev version of PythonCall like
pkg> add PythonCall#main
bravo.
do you have a lot more going on than your earlier less-getptr branch? the first thing i did was try it (96bd22 or 2c2c95a "double-check the cache", not sure which). but this today seems solid af. if you're at juliacon i'll buy all the beers you want
I added a few Base.GC.@preserve blocks around places where we were still using getptr in an unsafe way.
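As a rough illustration of what GC.@preserve buys you, here is a minimal sketch in plain Julia, independent of PythonCall. The function unsafe_sum is a hypothetical helper made up for this example; the point is only that a raw pointer taken from an object is safe to use while that object is preserved.

```julia
# Hypothetical helper, not part of PythonCall: sums a buffer through a raw pointer.
function unsafe_sum(v::Vector{Float64})
    total = 0.0
    # GC.@preserve keeps `v` rooted for the duration of the block, so the
    # pointer stays valid even if nothing else uses `v` afterwards.
    GC.@preserve v begin
        p = pointer(v)
        for i in 1:length(v)
            total += unsafe_load(p, i)
        end
    end
    return total
end

result = unsafe_sum([1.0, 2.0, 3.0])
```

Without the preserve block, the compiler is free to treat `v` as dead once the pointer has been extracted, and the garbage collector (including any finalizer attached to the object) may then run while the pointer is still in use.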
Not at JuliaCon I'm afraid, wrong continent!
Glad it's working for you.