use getptr a lot less
Defines an unsafe_convert rule for Py to C.PyPtr so we can pass Py directly to C functions.
Also defines incref(::Py) so we don't have to operate on pointers.
Generalises our faked C functions in extras.jl to also allow ::Py arguments in place of ::PyPtr.
Fixes #617 - and more generally removes a lot of places where the Julia GC could have freed a Py while we still have a pointer to the Python object, causing a read-after-free error.
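Roughly, the shape of these changes is as follows (a minimal sketch assuming PythonCall-style names like Py, C.PyPtr, getptr and C.Py_IncRef; the exact definitions in the PR may differ):

# ccall can now take a Py wherever C.PyPtr is declared; ccall roots the Py for
# the duration of the call, so the GC cannot free it while C still holds the pointer.
Base.unsafe_convert(::Type{C.PyPtr}, x::Py) = getptr(x)

# incref on a Py directly, so callers never have to juggle raw pointers themselves.
function incref(x::Py)
    GC.@preserve x C.Py_IncRef(getptr(x))
    return x
end

# Hypothetical call site: the declared argument type stays C.PyPtr, but we pass a Py.
# (fptr stands in for however the C function pointer is obtained.)
pylength(o::Py) = ccall(fptr, C.Py_ssize_t, (C.PyPtr,), o)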
Fixes some local segfaults I was getting. Nice!
What is left before merging? Let me know if I can help. Eager to see if this fixes some PySR segfaults people have been reporting.
What is left before merging?
Fix the segfault on Mac 😞 - I've not got a Mac to hand right now so debugging will be painful...
@cjdoris you should try https://github.com/mxschmitt/action-tmate. It's pretty useful for when you don't have the OS that is failing
I don't see the segfault btw. It just looks like a notebook error?
_________________________ pytest/test_nb.ipynb::Cell 0 _________________________
Notebook cell execution failed
Cell 0: Timeout of 2000 seconds exceeded while executing cell. Failed to interrupt kernel in 5 seconds, so failing without traceback.
Input:
# NBVAL_IGNORE_OUTPUT
import numpy as np
from juliacall import Main as jl
Yeah, a more recent CI run had the same issue. I suspect some bug is causing undefined behaviour, which might cause segfaults or silently kill the kernel or ...
Oh goodie it's non-deterministic. The Mac test just passed but Linux failed! But Mac did fail with Python 3.13.2 (was previously testing on latest Python 3.13.3). At least I might be able to reproduce this locally on Linux...
I have completely reverted this branch (except for some trivial details in the CI workflow definition) and still get the errors, so these should be fixed first. The error seems to only occur on Mac on Python 3.13.
Argh, Heisenbug! I added tmate to be able to observe it and it went away.
Does anyone have a Mac they can debug this locally with? I just want to run the tests under gdb or something.
I have a mac but don't see the issue myself - just tested. Perhaps it was an upstream dependency which is now fixed?
Ah sorry, let me try Python 3.13, I was using 3.12.
Actually it does look like the macos test is failing? So you could ssh in now: https://github.com/JuliaPy/PythonCall.jl/actions/runs/15477354381/job/43576415671?pr=618
I can reproduce the segfault: I see it on 3.13.0-3.13.2, but not on 3.13.3. So perhaps it was a Python bug that is now fixed?
OK the current failures are due to https://github.com/JuliaLang/julia/issues/58171 so I guess we have another reproducer for that.
I've run several long stress tests on Linux with Julia 1.11.5 and Python 3.13.3/3.13.4, which used to consistently crash before this PR. With the PR, I just get a consistent crash when finishing the stress test in one of the tests, and only on Ubuntu 24.04, not on Ubuntu 25.04. I also tried with Julia 1.12beta4, but there I get a crash when running the core of my Julia code (I need to investigate further):
setName at /cache/build/builder-amdci5-3/julialang/julia-release-1-dot-12/src/codegen.cpp:170 [inlined]
emit_pointerref at /cache/build/builder-amdci5-3/julialang/julia-release-1-dot-12/src/intrinsics.cpp:802 [inlined]
emit_intrinsic at /cache/build/builder-amdci5-3/julialang/julia-release-1-dot-12/src/intrinsics.cpp:1328
emit_call at /cache/build/builder-amdci5-3/julialang/julia-release-1-dot-12/src/codegen.cpp:5436
emit_expr at /cache/build/builder-amdci5-3/julialang/julia-release-1-dot-12/src/codegen.cpp:6379
emit_ssaval_assign at /cache/build/builder-amdci5-3/julialang/julia-release-1-dot-12/src/codegen.cpp:5866
emit_stmtpos at /cache/build/builder-amdci5-3/julialang/julia-release-1-dot-12/src/codegen.cpp:6168 [inlined]
emit_function at /cache/build/builder-amdci5-3/julialang/julia-release-1-dot-12/src/codegen.cpp:9348
jl_emit_code at /cache/build/builder-amdci5-3/julialang/julia-release-1-dot-12/src/codegen.cpp:9727
jl_emit_codeinst at /cache/build/builder-amdci5-3/julialang/julia-release-1-dot-12/src/codegen.cpp:9798
jl_emit_codeinst_to_jit_impl at /cache/build/builder-amdci5-3/julialang/julia-release-1-dot-12/src/jitlayers.cpp:761
jl_add_codeinst_to_jit at /cache/build/builder-amdci5-3/julialang/julia-release-1-dot-12/src/gf.c:2898
add_codeinsts_to_jit! at ./../usr/share/julia/Compiler/src/typeinfer.jl:1394
@vchuravy do you think the jl_object_id__cold might be coming from something fixable on the PythonCall.jl side, or is it definitely something in the Julia source itself?
@cjdoris do you think we could merge this? Even without fixing https://github.com/JuliaLang/julia/issues/58171 I think it's still useful to have these fixes.
In fact I do wonder if that issue isn't a Julia bug (since we've never seen it anywhere besides the PythonCall.jl instance). Maybe it's due to how Python multithreading changed in 3.13 or something...
do you think the jl_object_id__cold might be coming from something fixable on the PythonCall.jl side, or is it definitely something in the Julia source itself?
I would say there's a high likelihood that it is a misunderstanding between Python, PythonCall.jl and Julia's memory semantics. So it's not at all clear that it is in the Julia source itself.
OK I undid the debugging changes and committed the original PR with a few extra GC.@preserves around. Tests passing now but that might just be random chance, we'll see.
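The pattern is roughly this (an illustrative helper, not an exact call site from the diff): whenever we do take a raw pointer out of a Py, keep the Py rooted for the whole unsafe region.

function use_raw_pointer(x::Py)
    ptr = getptr(x)               # raw C.PyPtr borrowed from x
    GC.@preserve x begin          # root x so the GC cannot collect it (and its
        C.PyObject_Length(ptr)    # finalizer decref the Python object) while ptr is in use
    end
end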
In fact I do wonder if that issue isn't a Julia bug (since we've never seen it anywhere besides the PythonCall.jl instance). Maybe it's due to how Python multithreading changed in 3.13 or something...
What changed?
I assumed there were some internal changes to enable the GIL-free version: https://peps.python.org/pep-0703/. But I don't actually know if the regular GIL version had changes to enable this.