Python 3.14/3.15a build aborting due to OOM during `test_functools` / `test_json`
Bug report
Bug description:
In the context of EasyBuild, I'm trying to build Python 3.14.2 for a new toolchain based on GCC 15.2.0 (see this PR https://github.com/easybuilders/easybuild-easyconfigs/pull/25006). Building Python worked fine on a few machines I've tested, all x86-64 based, with various distributions. However, builds consistently failed on two particular systems with similar hardware.
Looking into the build failures, I saw the following output:
0:00:20 load avg: 1.00 [18/43] test_functools
make: *** [Makefile:1020: profile-run-stamp] Killed
Checking the build command, test_functools hangs indefinitely until seemingly killed by the OOM killer. Looking closer with GDB, I get this stack trace, which repeats the same pattern all the way down:
#79 0x000040000161fa1c in save_reduce (st=st@entry=0x40000153db10, self=self@entry=0x40002cb0c720, args=<optimized out>, obj=obj@entry=0x40002c6fbfa0) at ./Modules/_pickle.c:4273
#80 0x0000400001619154 in save (st=0x40000153db10, self=0x40002cb0c720, obj=0x40002c6fbfa0, pers_save=<optimized out>) at ./Modules/_pickle.c:4555
#81 0x0000400001616898 in store_tuple_elements (state=0x40000153db10, self=0x40002cb0c720, t=0x4000732a9240, len=1) at ./Modules/_pickle.c:2792
#82 0x000040000161a914 in save_tuple (state=0x40000153db10, self=0x40002cb0c720, obj=0x4000732a9240) at ./Modules/_pickle.c:2872
#83 save (st=st@entry=0x40000153db10, self=self@entry=0x40002cb0c720, obj=0x4000732a9240, pers_save=pers_save@entry=0) at ./Modules/_pickle.c:4434
#84 0x000040000161fa1c in save_reduce (st=st@entry=0x40000153db10, self=self@entry=0x40002cb0c720, args=<optimized out>, obj=obj@entry=0x40002c6fbfa0) at ./Modules/_pickle.c:4273
#85 0x0000400001619154 in save (st=0x40000153db10, self=0x40002cb0c720, obj=0x40002c6fbfa0, pers_save=<optimized out>) at ./Modules/_pickle.c:4555
#86 0x0000400001616898 in store_tuple_elements (state=0x40000153db10, self=0x40002cb0c720, t=0x4000732a9200, len=1) at ./Modules/_pickle.c:2792
#87 0x000040000161a914 in save_tuple (state=0x40000153db10, self=0x40002cb0c720, obj=0x4000732a9200) at ./Modules/_pickle.c:2872
#88 save (st=st@entry=0x40000153db10, self=self@entry=0x40002cb0c720, obj=0x4000732a9200, pers_save=pers_save@entry=0) at ./Modules/_pickle.c:4434
Checking with dmesg, the process was indeed killed by the OOM killer. With more than 400 GB of system memory, I would assume that is sufficient to build Python:
[18354.456973] [3169487] 9049 3169487 1340 616 458752 0 0 python3
[18354.465830] [3192788] 9049 3192788 216 0 458752 0 0 make
[18354.474416] [3206584] 9049 3206584 8964670 8952086 72220672 0 0 python
[18354.483181] [3207839] 0 3207839 145 0 393216 0 0 mmccrmonitor
[18354.492479] [3207840] 0 3207840 579 198 393216 0 0 psid
[18354.501065] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=nvidia-dcgm.service,mems_allowed=0-1,global_oom,task_memcg=/slurm/user-reuter1/job-14341053/step-0/tasks,task=python,pid=3206584,uid=9049
[18354.520291] Out of memory: Killed process 3206584 (python) total-vm:573738880kB, anon-rss:572924288kB, file-rss:0kB, shmem-rss:9216kB, UID:9049 pgtables:70528kB oom_score_adj:0
I've tried reducing the number used in test_recursive_pickle, which yields a (probably expected) test failure:
https://github.com/python/cpython/blob/df793163d5821791d4e7caf88885a2c11a107986/Lib/test/test_functools.py#L449
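For context, the linked test pickles a self-referential functools.partial. Below is a minimal sketch of the same pattern (my own reduction, not the test's exact code); with working stack limits this raises RecursionError almost instantly, while on the affected systems it recurses until the OOM killer steps in:

```python
import functools
import pickle

# Build a partial whose wrapped callable is the partial itself, so the
# pickler recurses through save_reduce/save_tuple without ever bottoming out.
f = functools.partial(print)
f.__setstate__((f, (), {}, {}))

pickle.dumps(f)  # expected: RecursionError
```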
The build then fails later in test_json for the same reason:
#18 0x0000400001dbd5ec in encoder_listencode_obj (s=s@entry=0x40002cbabdc0, writer=writer@entry=0x40002c4422f0, obj=0x40002c4f47d0, indent_level=indent_level@entry=0, indent_cache=indent_cache@entry=0x0) at ./Modules/_json.c:1549
#19 0x0000400001dbda68 in encoder_listencode_list (s=0x40002cbabdc0, writer=0x40002c4422f0, seq=0x40013ab14680, indent_level=0, indent_cache=0x0) at ./Modules/_json.c:1805
#20 encoder_listencode_obj (s=s@entry=0x40002cbabdc0, writer=writer@entry=0x40002c4422f0, obj=obj@entry=0x40013ab14680, indent_level=indent_level@entry=0, indent_cache=indent_cache@entry=0x0) at ./Modules/_json.c:1519
#21 0x0000400001dbd70c in encoder_listencode_obj (s=s@entry=0x40002cbabdc0, writer=writer@entry=0x40002c4422f0, obj=0x40002c4f47d0, indent_level=indent_level@entry=0, indent_cache=indent_cache@entry=0x0) at ./Modules/_json.c:1560
#22 0x0000400001dbda68 in encoder_listencode_list (s=0x40002cbabdc0, writer=0x40002c4422f0, seq=0x40013ab14640, indent_level=0, indent_cache=0x0) at ./Modules/_json.c:1805
#23 encoder_listencode_obj (s=s@entry=0x40002cbabdc0, writer=writer@entry=0x40002c4422f0, obj=obj@entry=0x40013ab14640, indent_level=indent_level@entry=0, indent_cache=indent_cache@entry=0x0) at ./Modules/_json.c:1519
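The trace alternating between encoder_listencode_obj and encoder_listencode_list suggests deeply nested lists. A hedged sketch of a way to drive the same code path (my own construction, not necessarily the exact input test_json uses):

```python
import json

# Each nesting level adds one native recursion step in the C encoder
# (encoder_listencode_list -> encoder_listencode_obj).
obj = None
for _ in range(100_000):
    obj = [obj]

json.dumps(obj)  # expected: RecursionError, not unbounded memory growth
```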
It's worth noting that builds on other platforms (Arch Linux, Fedora, Ubuntu 24.04), all x86-64, worked fine.
For testing, I then tried GCC 14.3.0, which yielded the same results. With Python 3.13.5 the build passed; Python 3.14.1, 3.14.2, and 3.15.0a3 all failed with the issues mentioned above. Python's dependencies differed between the two GCC versions, but the result was the same.
The following configure flags were used for the builds:
./configure --prefix=/tmp/software/Python/3.14.2-GCCcore-14.3.0 --build=aarch64-unknown-linux-gnu --host=aarch64-unknown-linux-gnu --enable-shared --with-lto --enable-optimizations --with-ensurepip=upgrade
I've also tried using an external Expat, with no noticeable difference.
Hardware information:
- Linux Rocky Linux 9.6, AArch64, ARM UNKNOWN (neoverse_v2), 1 x NVIDIA NVIDIA GH200 480GB, NVIDIA driver 580.95.05, Python 3.9.21
- Linux RHEL 9.6, AArch64, ARM UNKNOWN (neoverse_v2), 1 x NVIDIA NVIDIA GH200 480GB, NVIDIA driver 570.133.20, Python 3.9.21
Unfortunately, I'm a bit stuck here. Looking through existing issues, I've found https://github.com/python/cpython/issues/113655, though that issue relates to stack overflows on Windows. I've also found PRs like https://github.com/python/cpython/pull/124264, but Python 3.14 removed those values altogether (https://github.com/python/cpython/pull/133080). I'm not sure whether the issues I'm seeing are related.
Happy to provide more information, if needed.
CPython versions tested on:
3.15, 3.14, 3.13
Operating systems tested on:
Linux
I think the failure is caused by the PGO configuration:
0:00:20 load avg: 1.00 [18/43] test_functools
We execute a limited subset of tests to gather profiling data, and this could be the reason; your toolchain may be having issues with that. Have you tried testing without optimizations enabled? (That is, first try without --with-lto and without --enable-optimizations, then try --enable-optimizations alone, without LTO.)
@picnixz no, I don't think it's PGO's fault. The problem is more likely that our stack detection algorithms are not working on AArch64 Rocky Linux. The fact that the OP sees infinite recursion in tests that don't terminate is a sign that Python-to-Python calls are heap-allocating more frames and bypassing both the C stack and recursion protectors.
@Thyre please try checking which ifdef branch is compiled on your system. If it's the last one, that probably means it's broken, as __builtin_frame_address should be used on GNU/Linux.
https://github.com/python/cpython/blob/main/Include/internal/pycore_pystate.h#L322
You can test this, for example, by adding Py_FatalError("no stack protection") to the last ifdef's else clause; a printf works too.
Thanks for the quick follow-up 😄
I've tried adding Py_FatalError("no stack protection") to the last ifdef's else clause, but it didn't trigger. Next, I checked with this modification:
```c
static inline uintptr_t
_Py_get_machine_stack_pointer(void) {
#if _Py__has_builtin(__builtin_frame_address) || defined(__GNUC__)
    Py_FatalError("uses __builtin_frame_address");
    return (uintptr_t)__builtin_frame_address(0);
#elif defined(_MSC_VER)
    return (uintptr_t)_AddressOfReturnAddress();
#else
    char here;
    /* Avoid compiler warning about returning stack address */
    return return_pointer_as_int(&here);
#endif
}
```
which then failed with the debug message:
lto-wrapper: warning: using serial compilation of 8 LTRANS jobs
lto-wrapper: note: see the ‘-flto’ option documentation for more information
./Programs/_freeze_module getpath ./Modules/getpath.py Python/frozen_modules/getpath.h
./Programs/_freeze_module importlib._bootstrap ./Lib/importlib/_bootstrap.py Python/frozen_modules/importlib._bootstrap.h
./Programs/_freeze_module importlib._bootstrap_external ./Lib/importlib/_bootstrap_external.py Python/frozen_modules/importlib._bootstrap_external.h
./Programs/_freeze_module zipimport ./Lib/zipimport.py Python/frozen_modules/zipimport.h
Fatal Python error: _Py_get_machine_stack_pointer: uses __builtin_frame_address
Python runtime state: preinitialized
Current thread 0x0000400029e567e0 [_freeze_module] (most recent call first):
<no Python frame>
make[2]: *** [Makefile:1928: Python/frozen_modules/importlib._bootstrap.h] Aborted (core dumped)
make[2]: *** Waiting for unfinished jobs....
Fatal Python error: _Py_get_machine_stack_pointer: uses __builtin_frame_address
Python runtime state: preinitialized
Current thread 0x00004000334b67e0 [_freeze_module] (most recent call first):
<no Python frame>
Fatal Python error: _Py_get_machine_stack_pointer: uses __builtin_frame_address
Python runtime state: preinitialized
Current thread 0x00004000398267e0 [_freeze_module] (most recent call first):
<no Python frame>
Fatal Python error: _Py_get_machine_stack_pointer: uses __builtin_frame_address
Python runtime state: preinitialized
Current thread 0x0000400034d967e0 [_freeze_module] (most recent call first):
<no Python frame>
make[2]: *** [Makefile:1923: Python/frozen_modules/getpath.h] Aborted (core dumped)
make[2]: *** [Makefile:1931: Python/frozen_modules/importlib._bootstrap_external.h] Aborted (core dumped)
make[2]: *** [Makefile:1934: Python/frozen_modules/zipimport.h] Aborted (core dumped)
make[2]: Leaving directory '/dev/shm/reuter1/easybuild/build/Python/3.14.2/GCCcore-15.2.0/Python-3.14.2'
So __builtin_frame_address is indeed the branch being used.
To make sure this isn't caused by EasyBuild's flags in some way, I also tried a build with just the system's GCC, i.e. gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5). That ended in the exact same result.
A test build without --enable-optimizations worked. That's not surprising though, as the tests are not executed then; running the same test case afterwards causes the same failure. So PGO is not involved here. Removing --with-lto has no impact either.
A verbose run unfortunately doesn't give too much additional information:
[reuter1@jrc0900 Python-3.14.2]$ LD_LIBRARY_PATH=$(pwd):$LD_LIBRARY_PATH ./python Lib/test/test_functools.py --verbose
[...]
test_pickle (__main__.TestPartialC.test_pickle) ... ok
test_placeholders (__main__.TestPartialC.test_placeholders) ... ok
test_placeholders_kw_restriction (__main__.TestPartialC.test_placeholders_kw_restriction) ... ok
test_placeholders_optimization (__main__.TestPartialC.test_placeholders_optimization) ... ok
test_placeholders_refcount_smoke (__main__.TestPartialC.test_placeholders_refcount_smoke) ... ok
test_placeholders_trailing_raise (__main__.TestPartialC.test_placeholders_trailing_raise) ... ok
test_positional (__main__.TestPartialC.test_positional) ... ok
test_protection_of_callers_dict_argument (__main__.TestPartialC.test_protection_of_callers_dict_argument) ... ok
test_recursive_pickle (__main__.TestPartialC.test_recursive_pickle) ...
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
I think this might be related to the ulimit stack size. On both systems, the stack size is set to unlimited by default. Limiting it, e.g. via ulimit -s 8192, makes the tests pass almost immediately.
I need to check this on the other systems, once I have access to them again.
A test build with ulimit -s 16384 worked just fine with EasyBuild as well (see https://github.com/easybuilders/easybuild-easyconfigs/pull/25006#issuecomment-3713950388).
A second test build on x86-64 (Ubuntu 24.04, Intel Core i7-1260P, 32 GB RAM) also successfully (?) aborted with ulimit -s unlimited set. So I'm pretty sure this is the culprit.
LD_PRELOAD=/opt/EasyBuild/apps/build/Python/3.14.2/GCCcore-15.2.0/Python-3.14.2/libpython3.14.so ./python -m test --pgo --timeout=
Using random seed: 3222210463
0:00:00 load avg: 1.32 Run 43 tests sequentially in a single process
0:00:00 load avg: 1.32 [ 1/43] test_array
0:00:01 load avg: 1.32 [ 1/43] test_array passed
0:00:01 load avg: 1.32 [ 2/43] test_base64
0:00:01 load avg: 1.30 [ 2/43] test_base64 passed
[...]
0:00:20 load avg: 1.16 [18/43] test_functools
Killed
make: *** [Makefile:1016: profile-run-stamp] Error 137
I'm not sure whether Python should support ulimit -s unlimited, so I'm closing this issue for now. Thanks for tracking down the root cause!
My goodness - thanks for the hint with ulimit! I was letting the test run forever and it just wouldn't finish. It turns out I was simply running it on a machine with lots of memory and wasn't patient enough to let it run out and hit the error reported here.
@Fidget-Spinner While I don't know whether ulimit -s unlimited is worth supporting or not, I think it would be very valuable to check for this condition and at least emit a warning (or even an outright error) when it is detected. Otherwise, people will likely have a hard time figuring out why it's not working 🤔
Hmm, I think it would indeed be valuable to skip those tests if we detect ulimit -s unlimited. If anyone is willing to submit a PR, I can review and merge it.
Just to understand correctly (sorry, I'm not familiar with the CPython code base at all), there seem to be two options to handle this:
- Determine in `configure` whether `ulimit -s unlimited` is set, `AC_SUBST` the result, and use that to decide whether the tests should be skipped.
- Add a decorator in `Lib/test/support/__init__.py` that does the same check and decides at runtime whether to skip those tests.
I'd say the second option is better, as someone could change ulimit -s after configure has run. Additionally, reading a decorator is probably easier than reading configure.ac (at least in my experience). Would it be okay to run ulimit -s, e.g. via subprocess.run, to determine the value during the tests? (Rough sketch below.)
We'd need to find all tests that need this decorator, but that can be done by adding the decorator where necessary and re-running make with PGO enabled with ulimit -s unlimited set.
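A rough sketch of the subprocess-based variant I have in mind (my own code, nothing from the CPython tree; ulimit is a shell builtin, so it has to go through a shell):

```python
import subprocess

def stack_limit_is_unlimited():
    # "ulimit -s" prints either "unlimited" or the soft limit in KiB.
    out = subprocess.run(
        ["sh", "-c", "ulimit -s"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return out == "unlimited"
```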
Yes I was thinking of option 2. as well. Great work investigating the codebase!
I'll try to take a look.
Can't we use resource.getrlimit to query this limit instead of a subprocess run?
Right, that should work. I think when I checked it before, it returned (-1, -1), which can easily be checked.
I'll try that out.
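For reference, a minimal sketch of what that could look like (the decorator name and its placement in Lib/test/support are assumptions on my part, not an existing API):

```python
import resource
import unittest

def skip_if_unlimited_stack(test):
    """Skip a test when the stack size is unlimited (ulimit -s unlimited)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    # getrlimit() reports "unlimited" as resource.RLIM_INFINITY, which is
    # the -1 seen in the (-1, -1) result mentioned above.
    return unittest.skipIf(
        soft == resource.RLIM_INFINITY,
        "requires a finite stack size limit",
    )(test)
```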
I'm unsure if we need to backport this to 3.13.
> I'm unsure if we need to backport this to 3.13.
We've built Python 3.13 on the systems where I've seen the issue described here and didn't run into any problems during the build, so I don't think a backport is necessary.
Thanks for checking. I suspect it might be linked to how we deal with stack size growth in 3.14.