PythonCall.jl

Intermittent CI failures: Fatal Python error: _Py_CheckRecursiveCall: Unrecoverable stack overflow

Status: Open. cjdoris opened this issue 2 months ago • 16 comments

We have started to see intermittent failures in CI like this one: https://github.com/JuliaPy/PythonCall.jl/actions/runs/18520695490/job/52779794536

Observed on:

  • Julia 1.9 and 1.12
  • Mac and Linux
  • Python 3.14

This started suspiciously soon after Python 3.14 was released, so I suspect we have another ABI issue.

Here is the full error:

Fatal Python error: _Py_CheckRecursiveCall: Unrecoverable stack overflow (used -367886 kB) while calling a Python object
Python runtime state: initialized

Thread 0x0000000304bf7000 (most recent call first):
  <no Python frame>

Current thread 0x0000000200207200 (most recent call first):
  <no Python frame>

[5793] signal (6): Abort trap: 6
in expression starting at /Users/runner/work/PythonCall.jl/PythonCall.jl/test/GIL.jl:14
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
pthread_kill at /usr/lib/system/libsystem_pthread.dylib (unknown line)
abort at /usr/lib/system/libsystem_c.dylib (unknown line)
fatal_error_exit at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
fatal_error at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
_Py_FatalErrorFunc at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
_Py_CheckRecursiveCall at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
method_vectorcall_NOARGS at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
PyObject_VectorcallMethod at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
flush_std_files at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
fatal_error at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
_Py_FatalErrorFunc at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
_Py_CheckRecursiveCall at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
cfunction_vectorcall_FASTCALL_KEYWORDS at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
_PyObject_CallFunctionVa at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
PyObject_CallFunction at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
PyImport_Import at /private/var/folders/q0/wmf37v850txck86cpnvwm_zw0000gn/T/jl_k5lUcu/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib (unknown line)
PyImport_Import at /Users/runner/work/PythonCall.jl/PythonCall.jl/src/C/pointers.jl:300 [inlined]
macro expansion at /Users/runner/work/PythonCall.jl/PythonCall.jl/src/Core/Py.jl:118 [inlined]
pyimport at /Users/runner/work/PythonCall.jl/PythonCall.jl/src/Core/builtins.jl:1458
#3 at /Users/runner/work/PythonCall.jl/PythonCall.jl/test/GIL.jl:8 [inlined]
lock at /Users/runner/work/PythonCall.jl/PythonCall.jl/src/GIL/GIL.jl:39
macro expansion at /Users/runner/work/PythonCall.jl/PythonCall.jl/test/GIL.jl:7 [inlined]
#5257#threadsfor_fun#2 at ./threadingconstructs.jl:206
#5257#threadsfor_fun at ./threadingconstructs.jl:173 [inlined]
#1 at ./threadingconstructs.jl:145
unknown function (ip: 0x12092235f)
ijl_apply_generic at /Users/runner/hostedtoolcache/julia/1.9.4/x64/lib/julia/libjulia-internal.1.9.dylib (unknown line)
start_task at /Users/runner/hostedtoolcache/julia/1.9.4/x64/lib/julia/libjulia-internal.1.9.dylib (unknown line)
Allocations: 35353796 (Pool: 35322249; Big: 31547); GC: 53
Package PythonCall errored during testing (received signal: 6)

cjdoris, Oct 15 '25

I have a similar stack overflow error locally:

julia> import Pkg; Pkg.activate(temp = true); Pkg.add("PythonCall")
  Activating new project at `/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_IMwpEU`
   Resolving package versions...
    Updating `/private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_IMwpEU/Project.toml`
  [6099a3de] + PythonCall v0.9.28
    Updating `/private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_IMwpEU/Manifest.toml`
  [992eb4ea] + CondaPkg v0.2.33
  [9a962f9c] + DataAPI v1.16.0
  [e2d170a0] + DataValueInterfaces v1.0.0
  [82899510] + IteratorInterfaceExtensions v1.0.0
  [692b3bcd] + JLLWrappers v1.7.1
  [0f8b85d8] + JSON3 v1.14.3
  [1914dd2f] + MacroTools v0.5.16
  [0b3b1443] + MicroMamba v0.1.14
  [bac558e1] + OrderedCollections v1.8.1
  [69de0a69] + Parsers v2.8.3
  [fa939f87] + Pidfile v1.3.0
⌅ [aea7be01] + PrecompileTools v1.2.1
  [21216c6a] + Preferences v1.5.0
  [6099a3de] + PythonCall v0.9.28
  [6c6a2e73] + Scratch v1.3.0
  [856f2bd8] + StructTypes v1.11.0
  [3783bdb8] + TableTraits v1.0.1
  [bd369af6] + Tables v1.12.1
  [e17b2a0c] + UnsafePointers v1.0.0
⌅ [f8abcde7] + micromamba_jll v1.5.12+0
  [4d7b5844] + pixi_jll v0.41.3+0
  [0dad84c5] + ArgTools v1.1.1
  [56f22d72] + Artifacts
  [2a0f44e3] + Base64
  [ade2ca70] + Dates
  [f43a241f] + Downloads v1.6.0
  [7b1f6079] + FileWatching
  [b77e0a4c] + InteractiveUtils
  [4af54fe1] + LazyArtifacts
  [b27032c2] + LibCURL v0.6.4
  [76f85450] + LibGit2
  [8f399da3] + Libdl
  [56ddb016] + Logging
  [d6f4376e] + Markdown
  [a63ad114] + Mmap
  [ca575930] + NetworkOptions v1.2.0
  [44cfe95a] + Pkg v1.10.0
  [de0858da] + Printf
  [3fa0cd96] + REPL
  [9a3f8284] + Random
  [ea8e919c] + SHA v0.7.0
  [9e88b42a] + Serialization
  [6462fe0b] + Sockets
  [fa267f1f] + TOML v1.0.3
  [a4e569a6] + Tar v1.10.0
  [8dfed614] + Test
  [cf7118a7] + UUIDs
  [4ec0a83e] + Unicode
  [deac9b47] + LibCURL_jll v8.4.0+0
  [e37daf67] + LibGit2_jll v1.6.4+0
  [29816b5a] + LibSSH2_jll v1.11.0+1
  [c8ffd9c3] + MbedTLS_jll v2.28.2+1
  [14a3606d] + MozillaCACerts_jll v2023.1.10
  [83775a58] + Zlib_jll v1.2.13+1
  [8e850ede] + nghttp2_jll v1.52.0+1
  [3f19e933] + p7zip_jll v17.4.0+2
        Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated -m`

julia> using PythonCall

julia> pyint(5)
Fatal Python error: _Py_CheckRecursiveCall: Unrecoverable stack overflow (used 1624858 kB) while getting the repr of an obj
Python runtime state: initialized

Current thread 0x00000001f012c800 (most recent call first):
  <no Python frame>

[25488] signal (6): Abort trap: 6
in expression starting at none:0
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 7517176 (Pool: 7509483; Big: 7693); GC: 13
zsh: abort      julia

I am using macOS Tahoe 26.0.1.

Originally reported in https://github.com/jverzani/SymPyPythonCall.jl/issues/56

ranocha, Oct 23 '25

Thank you. Is that the entire error message? It's not much of a backtrace, but I think that is typical on Mac. Could you please share the output of CondaPkg.status(), PythonCall.C.CTX and versioninfo()?

cjdoris, Oct 23 '25

Oh, someone recently mentioned finding that running anything multithreaded in PythonCall would crash if the code being run on multiple threads had not been compiled yet. I can't remember the details, but maybe that's it. On Linux, these failures occur in test/GIL.jl on the first call to threaded_sleep(), which is when it gets compiled. So maybe compiling it first while running on only one thread would do the trick, or something along those lines.

cjdoris, Oct 23 '25

Yes, that's the entire stacktrace. Here are the other details:

julia> import Pkg; Pkg.activate(temp = true); Pkg.add("PythonCall") # output as before

julia> using CondaPkg

julia> CondaPkg.status()
CondaPkg Status /private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_ikxX8Y/CondaPkg.toml (empty)
Environment
  /private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_ikxX8Y/.CondaPkg/.pixi/envs/default

julia> PythonCall.C.CTX
IOContext{Base.TTY}:
  is_embedded = false
  is_initialized = true
  is_preinitialized = false
  lib_ptr = Ptr{Nothing} @0x0000000074b59c70
  exe_path = "/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_ikxX8Y/.CondaPkg/.pixi/envs/default/bin/python"
  lib_path = "/private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_ikxX8Y/.CondaPkg/.pixi/envs/default/lib/libpython3.14.dylib"
  dlopen_flags = 0x00000046
  pyprogname = "/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_ikxX8Y/.CondaPkg/.pixi/envs/default/bin/python"
  pyprogname_w = Int32[47, 118, 97, 114, 47, 102, 111, 108, 100, 101  …  105, 110, 47, 112, 121, 116, 104, 111, 110, 0]
  pyhome = "/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_ikxX8Y/.CondaPkg/.pixi/envs/default:/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_ikxX8Y/.CondaPkg/.pixi/envs/default"
  pyhome_w = Int32[47, 118, 97, 114, 47, 102, 111, 108, 100, 101  …  115, 47, 100, 101, 102, 97, 117, 108, 116, 0]
  which = :CondaPkg
  version = v"3.14.0"

julia> versioninfo()
Julia Version 1.10.10
Commit 95f30e51f41 (2025-06-27 09:51 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 16 × Apple M4 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_PKG_PRESERVE_TIERED_INSTALLED = true

ranocha, Oct 24 '25

Thanks. Can you try doing CondaPkg.add("python", version="3.13.*") before anything else to see if the error is only with Python 3.14?

cjdoris, Oct 24 '25

Great intuition! Indeed, everything seems to be fine with Python 3.13:

julia> import Pkg; Pkg.activate(temp = true); Pkg.add(["CondaPkg", "PythonCall"])
  Activating new project at `/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF`
   Resolving package versions...
    Updating `/private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/Project.toml`
  [992eb4ea] + CondaPkg v0.2.33
  [6099a3de] + PythonCall v0.9.28
    Updating `/private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/Manifest.toml`
  [992eb4ea] + CondaPkg v0.2.33
  [9a962f9c] + DataAPI v1.16.0
  [e2d170a0] + DataValueInterfaces v1.0.0
  [82899510] + IteratorInterfaceExtensions v1.0.0
  [692b3bcd] + JLLWrappers v1.7.1
  [0f8b85d8] + JSON3 v1.14.3
  [1914dd2f] + MacroTools v0.5.16
  [0b3b1443] + MicroMamba v0.1.14
  [bac558e1] + OrderedCollections v1.8.1
  [69de0a69] + Parsers v2.8.3
  [fa939f87] + Pidfile v1.3.0
⌅ [aea7be01] + PrecompileTools v1.2.1
  [21216c6a] + Preferences v1.5.0
  [6099a3de] + PythonCall v0.9.28
  [6c6a2e73] + Scratch v1.3.0
  [856f2bd8] + StructTypes v1.11.0
  [3783bdb8] + TableTraits v1.0.1
  [bd369af6] + Tables v1.12.1
  [e17b2a0c] + UnsafePointers v1.0.0
⌅ [f8abcde7] + micromamba_jll v1.5.12+0
  [4d7b5844] + pixi_jll v0.41.3+0
  [0dad84c5] + ArgTools v1.1.1
  [56f22d72] + Artifacts
  [2a0f44e3] + Base64
  [ade2ca70] + Dates
  [f43a241f] + Downloads v1.6.0
  [7b1f6079] + FileWatching
  [b77e0a4c] + InteractiveUtils
  [4af54fe1] + LazyArtifacts
  [b27032c2] + LibCURL v0.6.4
  [76f85450] + LibGit2
  [8f399da3] + Libdl
  [56ddb016] + Logging
  [d6f4376e] + Markdown
  [a63ad114] + Mmap
  [ca575930] + NetworkOptions v1.2.0
  [44cfe95a] + Pkg v1.10.0
  [de0858da] + Printf
  [3fa0cd96] + REPL
  [9a3f8284] + Random
  [ea8e919c] + SHA v0.7.0
  [9e88b42a] + Serialization
  [6462fe0b] + Sockets
  [fa267f1f] + TOML v1.0.3
  [a4e569a6] + Tar v1.10.0
  [8dfed614] + Test
  [cf7118a7] + UUIDs
  [4ec0a83e] + Unicode
  [deac9b47] + LibCURL_jll v8.4.0+0
  [e37daf67] + LibGit2_jll v1.6.4+0
  [29816b5a] + LibSSH2_jll v1.11.0+1
  [c8ffd9c3] + MbedTLS_jll v2.28.2+1
  [14a3606d] + MozillaCACerts_jll v2023.1.10
  [83775a58] + Zlib_jll v1.2.13+1
  [8e850ede] + nghttp2_jll v1.52.0+1
  [3f19e933] + p7zip_jll v17.4.0+2
        Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated -m`

julia> using CondaPkg; CondaPkg.add("python", version="3.13.*")
    CondaPkg Found dependencies: /private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/CondaPkg.toml
    CondaPkg Found dependencies: ~/.julia/packages/PythonCall/mkWc2/CondaPkg.toml
    CondaPkg Found dependencies: ~/.julia/packages/CondaPkg/0UqYV/CondaPkg.toml
    CondaPkg Resolving changes
             + python
    CondaPkg Initialising pixi
             │ /Users/hendrik/.julia/artifacts/d2fecc2a9fa3eac2108d3e4d9d155e6ff5dfd0b2/bin/pixi
             │ init
             │ --format pixi
             └ /var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/.CondaPkg
✔ Created /private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/.CondaPkg/pixi.toml
    CondaPkg Wrote /var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/.CondaPkg/pixi.toml
             │ 
             │ [dependencies.python]
             │ channel = "conda-forge"
             │ build = "*cp*"
             │ version = "3.13.*, >=3.9,<4"
             │ [project]
             │ name = ".CondaPkg"
             │ platforms = ["osx-arm64"]
             │ channels = ["conda-forge"]
             │ channel-priority = "strict"
             └ description = "automatically generated by CondaPkg.jl"
    CondaPkg Installing packages
             │ /Users/hendrik/.julia/artifacts/d2fecc2a9fa3eac2108d3e4d9d155e6ff5dfd0b2/bin/pixi
             │ install
             └ --manifest-path /var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/.CondaPkg/pixi.toml
✔ The default environment has been installed.

julia> CondaPkg.status()
CondaPkg Status /private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/CondaPkg.toml
Environment
  /private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/.CondaPkg/.pixi/envs/default
Packages
  python v3.13.9 (3.13.*)

julia> using PythonCall

julia> PythonCall.C.CTX
IOContext{Base.TTY}:
  is_embedded = false
  is_initialized = true
  is_preinitialized = false
  lib_ptr = Ptr{Nothing} @0x0000000074b59c80
  exe_path = "/private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/.CondaPkg/.pixi/envs/default/bin/python"
  lib_path = "/private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/.CondaPkg/.pixi/envs/default/lib/libpython3.13.dylib"
  dlopen_flags = 0x00000046
  pyprogname = "/private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/.CondaPkg/.pixi/envs/default/bin/python"
  pyprogname_w = Int32[47, 112, 114, 105, 118, 97, 116, 101, 47, 118  …  105, 110, 47, 112, 121, 116, 104, 111, 110, 0]
  pyhome = "/private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/.CondaPkg/.pixi/envs/default:/private/var/folders/wq/qt33k03x1772wddsplgs03s40000gn/T/jl_v0YmFF/.CondaPkg/.pixi/envs/default"
  pyhome_w = Int32[47, 112, 114, 105, 118, 97, 116, 101, 47, 118  …  115, 47, 100, 101, 102, 97, 117, 108, 116, 0]
  which = :CondaPkg
  version = v"3.13.9"

julia> pyint(3)
Python: 3

ranocha, Oct 24 '25

Great, thanks. And just to check: with Python 3.14, does the error you showed happen every time?

cjdoris, Oct 24 '25

Yes, it happens every time I run code like the examples above.

ranocha, Oct 24 '25

My guess is that we're accidentally relying on some CPython internals, outside the Stable ABI, that have changed in Python 3.14.

I think most of extras.jl can now be removed: the corresponding functions are now part of the Stable ABI, so we can put them in pointers.jl instead. This might be the source of the problem.

cjdoris, Oct 25 '25

Ah, it appears to be related to a change in how the recursion limit check is performed in Python 3.14. The following post seems highly relevant: https://discuss.python.org/t/fatal-python-error-py-checkrecursivecall-unrecoverable-stack-overflow-used-406047-kb-while-calling-a-python-object/104183/2

As in that post, perhaps Julia tasks, which run on their own stacks, are fooling the recursion check. I'm not sure how to work around this, though.

cjdoris, Oct 25 '25
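To make the stack-switching explanation more concrete, here is a toy Julia model of a stack-pointer-based overflow check. It assumes, as the linked discussion describes, that Python 3.14 compares the current C stack pointer against limits derived from the OS thread's stack, which grows downwards. Everything in it (`StackBounds`, `soft_limit`, `used_kb`, `is_overflow`, the 64 kB margin and all addresses) is hypothetical and illustrative, not CPython's actual code or constants.

```julia
# Toy model (hypothetical, NOT CPython's implementation) of an overflow check
# based on the stack pointer. The OS thread's stack is assumed to grow
# downwards from `top` to `top - size`.
struct StackBounds
    top::UInt    # highest address of the thread's stack
    size::UInt   # stack size in bytes
end

# Lowest address allowed before reporting an overflow, keeping a safety margin.
soft_limit(b::StackBounds, margin::UInt) = b.top - b.size + margin

# Stack "used" in kB, measured from the recorded top of the thread's stack.
used_kb(b::StackBounds, sp::UInt) = (Int(b.top) - Int(sp)) ÷ 1024

# Report an overflow once the stack pointer drops below the soft limit.
is_overflow(b::StackBounds, sp::UInt; margin::UInt = UInt(64 * 1024)) =
    sp < soft_limit(b, margin)

# An 8 MiB thread stack at a made-up address.
thread_stack = StackBounds(UInt(0x7000_0000), UInt(8 * 1024 * 1024))

# An ordinary call 1 MiB deep on the thread's own stack: no overflow.
sp_shallow = thread_stack.top - UInt(1024 * 1024)

# A Julia task running on a separately allocated stack at a lower address:
# that stack may be nearly empty, yet the check sees an enormous "used" value.
sp_task = UInt(0x1000_0000)

println(used_kb(thread_stack, sp_shallow), " kB, overflow = ", is_overflow(thread_stack, sp_shallow))
println(used_kb(thread_stack, sp_task), " kB, overflow = ", is_overflow(thread_stack, sp_task))
```

Moving the stack pointer above `thread_stack.top` makes `used_kb` negative, similar in form to the `used -367886 kB` in the CI log; the toy does not try to reproduce exactly when CPython decides to abort.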

Perhaps we need some Py_EnterRecursiveCall and Py_LeaveRecursiveCall somewhere.

cjdoris, Oct 25 '25

See https://discuss.python.org/t/python-3-14-0-is-incompatible-with-stack-switching-systems-what-do-we-do/104880 for an explanation of what's going on.

cjdoris, Nov 14 '25

Until the picture becomes clearer, I've restricted CondaPkg.toml to Python <3.14.

cjdoris, Nov 14 '25

Good news! I think this will most likely be fixed by https://github.com/python/cpython/pull/141944 which should be in the next patch release of Python 3.14.

cjdoris, Nov 28 '25

> Good news! I think this will most likely be fixed by python/cpython#141944 which should be in the next patch release of Python 3.14.

Unfortunately, Python 3.14.1 comes with this issue :-( https://github.com/python/cpython/issues/142214

dpinol, Dec 03 '25

Does this still occur with Python 3.14.2?

dpinol, Dec 11 '25

Let's see! #733

cjdoris, Dec 15 '25

Woohoo! We're back in business.

cjdoris, Dec 15 '25
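For downstream packages that cannot simply pin Python below 3.14, a minimal Julia sketch of a version guard based only on what this thread reports: 3.14.0 and 3.14.1 crash, and 3.14.2 appears to work (per #733). The helpers `is_affected_python` and `check_python_version` are hypothetical, not part of PythonCall.jl, and the range may need adjusting if other releases turn out to be affected.

```julia
# Hypothetical helpers, not part of PythonCall.jl. The affected range reflects
# this thread only: 3.14.0 and 3.14.1 crash, 3.14.2 reportedly works.
const AFFECTED_RANGE = (v"3.14.0", v"3.14.2")  # half-open: [lower, upper)

is_affected_python(v::VersionNumber) =
    AFFECTED_RANGE[1] <= v < AFFECTED_RANGE[2]

# Warn (rather than throw) so that users can still proceed at their own risk.
function check_python_version(v::VersionNumber)
    if is_affected_python(v)
        @warn "Python $v may abort with `_Py_CheckRecursiveCall: Unrecoverable stack overflow`; consider Python 3.13 or 3.14.2+."
        return false
    end
    return true
end
```

In a live session, the running version is printed as the `version` field of `PythonCall.C.CTX` (see the outputs above), so a call such as `check_python_version(PythonCall.C.CTX.version)` should work; treat that field access as an assumption about PythonCall internals.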