Add 3.14t wheel builds by skipping PyLIEF install
Adds 3.14t to the wheel building matrix. Because PyLIEF doesn't yet ship cp314t wheels (https://github.com/lief-project/LIEF/issues/1255) and doesn't list any build requirements in its pyproject.toml, it's necessary to set up the PyLIEF build manually.
Unfortunately this adds a decent amount of overhead to the builds, particularly for the Mac builds, where the PyLIEF build happens twice during the build and install steps. We could probably optimize that but I'm curious if the maintainers are opposed to this approach and would prefer I work on getting PyLIEF wheels ready instead.
I ran the GitHub Actions tests on my fork; see the runs associated with https://github.com/ngoldbaum/llvmlite/commit/43c266b820a4ea6c44201762a3f54da61c3f4017. As far as I can see, all the cp314t wheel builds complete successfully and the tests pass for all the builds that are tested.
@ngoldbaum that's awesome! So cool to see that it will work! I'll confer with the other maintainers during our weekly meeting tomorrow and get back to you with suggestions for acceptability and the best course of action. Thank you for helping to improve llvmlite.
Hmm, spoke too soon, it looks like the Windows tests crashed. I'll see if I can reproduce the crash on my Windows machine.
Frustratingly, if I install the wheel produced by the Windows build job locally on my Windows 10 development machine, all the tests pass and I don't see a segfault. See https://gist.github.com/ngoldbaum/52d08ca2d91a196c0a0528abd6f0b709.
Line 35 of test_binding.py does import lief, so I guess there's something broken about the second PyLIEF build in the testing stage. I could fix that, but it also looks like the first PyLIEF build works correctly.
So, probably the thing to do is to build PyLIEF once, upload the wheel alongside the llvmlite wheel, and then use the PyLIEF wheel from the build stage in the testing stage. That will cut down on the build time too.
When the tests complete, there's also a ton of output from PyLIEF's use of nanobind complaining about leaked instances and types. I already reported that upstream and have been meaning to spend some time figuring out why it only seems to happen on 3.14t.
Hello @ngoldbaum, firstly, thank you very much for the PR. I appreciate you taking the time to understand the GHA build system nuances to make this work.
Regarding PyLIEF usage: it's an optional test dependency and should not be present in the build environment or build steps.
In the test environment, it's required to check the build against expected dynamic imports. Since PyLIEF with 3.14t is not stable yet, we can skip having it in the 3.14t test environment for now. This would resolve the Windows segfault you're encountering with the 3.14t build.
I removed PyLIEF from build and test for 3.14t on my fork using this PR branch and see that the workflow passes:
https://github.com/swap357/llvmlite/pull/126/files#diff-78b95d30060f3483a1c8b504f4fd34d18d545529c48516a9fcb27b99c6e03d90
I'd suggest removing the PyLIEF installation from the build env and conditionally skipping it in the 3.14t test env. Once upstream support is available for PyLIEF with 3.14t, we can add it back to the test env.
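To be concrete, here's a rough sketch (the class and test names are made up for illustration, not the actual llvmlite test code) of how the test side can guard the optional lief import, so the 3.14t test environment can simply leave PyLIEF uninstalled:

# Hypothetical sketch, not the actual llvmlite test suite.
import importlib.util
import unittest

# True only if PyLIEF is importable in the current environment.
HAVE_LIEF = importlib.util.find_spec("lief") is not None

class TestDynamicImports(unittest.TestCase):
    @unittest.skipUnless(HAVE_LIEF, "PyLIEF is not installed")
    def test_expected_imports(self):
        import lief  # safe: the skip above guarantees it is importable
        self.assertTrue(hasattr(lief, "parse"))

With a guard like this, the 3.14t environment just skips the LIEF-based checks instead of failing at import time.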
Wow! Thanks for the context. That massively simplifies things.
While looking this over, I noticed that the validate steps are all "broken" on 3.14t - they install a Python 3.12 environment:
https://github.com/swap357/llvmlite/actions/runs/19111128292/job/54608653224?pr=126#step:3:521
That said, it may not actually matter that the validation step runs on Python 3.12: since the test uses LIEF to parse binaries, it may not be sensitive to the host or target Python version.
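For what it's worth, that kind of check should boil down to something like the following (a hypothetical sketch with a placeholder path, not the actual validate script), which only exercises LIEF's binary parsers and doesn't care which Python version is running it:

# Hypothetical sketch of a LIEF-based import check; the path is a placeholder.
import lief

# Parse the compiled extension shipped in the wheel.
binary = lief.parse("llvmlite/binding/libllvmlite.so")

# On ELF this lists DT_NEEDED entries; PE and Mach-O expose similar information.
print(list(binary.libraries))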
The validate step is due for refactoring. The step does two things: validate imports with PyLIEF, and twine check the built wheel.
The validate-imports check there is redundant: it was added before we had the imports test in test_binding, so it needs to be removed, but I can do that in a separate PR.
For this PR, keeping validate as-is would be okay. As you noted, it uses Python 3.12 and won't have any problems.
I rebased and squashed everything. I think this is ready now.
Thanks for the pointer about it being a test-only dependency; I'm not sure where I got it into my head that it was a build dependency too.
This looks great! Thanks again.
@swap357 @ngoldbaum great to see this move forward! I was going to mention that the discussion during the developer meeting was fruitful; however, we would not have accepted a GHA update that builds PyLIEF wheels on the fly, so the current resolution seems really good!
Since free-threading support is technically scheduled for 0.47, I've placed it in that milestone. My current plan is to get 0.46 (which is for general 3.14 support) out the door and then look at getting this PR merged to main. We will need to test and build on 3.14t, and this is the most promising (and only?) PR to accomplish this so far.
Lastly, are we all in agreement that this PR shows that llvmlite is very, very likely to work just fine ™️ in the free-threading/no-GIL context?
Are there other pain spots where it would be useful for me to look? Happy to do so: unblocking Numba on 3.14t is high on our team's priority list right now.
Lastly, are we all in agreement that this PR shows that llvmlite is very, very likely to work just fine ™️ in the free-threading/no-GIL context?
If someone is using llvmlite in a single-threaded context or an effectively single-threaded context (e.g. not sharing state inside llvmlite between threads), then I think everything is OK. I do see a few issues when I do some multithreaded testing (see below), but I can also trigger all the issues I find on the GIL-enabled build as well, so they're not new.
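To illustrate what I mean by effectively single-threaded, here's a minimal sketch (my own example, not from the test suite) where each thread builds its own IR module and nothing inside llvmlite is shared:

# Each thread gets its own Module/IRBuilder; no llvmlite state is shared.
import threading
from llvmlite import ir

def build_module(results, idx):
    mod = ir.Module(name=f"mod{idx}")
    fnty = ir.FunctionType(ir.IntType(32), ())
    fn = ir.Function(mod, fnty, name="answer")
    builder = ir.IRBuilder(fn.append_basic_block())
    builder.ret(ir.Constant(ir.IntType(32), 42))
    results[idx] = str(mod)

results = [None] * 4
threads = [threading.Thread(target=build_module, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert all("answer" in text for text in results)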
When I try to run the llvmlite test suite using unittest-ft on both the GIL-enabled and free-threaded builds I see some test failures. Note that to actually run unittest-ft with llvmlite, the test suite needs a load_tests implementation and another small fix to test_ir.py:
diff --git a/llvmlite/tests/__init__.py b/llvmlite/tests/__init__.py
index 7f2b3b0..105b47e 100644
--- a/llvmlite/tests/__init__.py
+++ b/llvmlite/tests/__init__.py
@@ -1,4 +1,6 @@
+import os
import sys
+import warnings
import unittest
from unittest import TestCase
@@ -27,6 +29,14 @@ def discover_tests(startdir):
return suite
+def load_tests(loader, standard_tests, pattern):
+ # top level directory cached on loader instance
+ this_dir = os.path.dirname(__file__)
+ package_tests = loader.discover(start_dir=this_dir, pattern="test_*.py")
+ standard_tests.addTests(package_tests)
+ return standard_tests
+
+
def run_tests(suite=None, xmloutput=None, verbosity=1):
"""
args
diff --git a/llvmlite/tests/test_ir.py b/llvmlite/tests/test_ir.py
index d51f643..30cf26d 100644
--- a/llvmlite/tests/test_ir.py
+++ b/llvmlite/tests/test_ir.py
@@ -9,7 +9,8 @@ import re
import textwrap
import unittest
-from . import TestCase
+from unittest import TestCase
+
from llvmlite import ir
from llvmlite import binding as llvm
from llvmlite import ir_layer_typed_pointers_enabled
@@ -548,49 +549,6 @@ class TestIR(TestBase):
unittest-ft runs every test in the test suite in a thread pool. It's not unusual for test suites not to be designed to run like this and to rely on global state, and that may be the case here. Here are some example failures I see running unittest-ft in stress-test mode (passing -s) and with randomized test order (passing -r):
==================================================================================
FAIL: test_fanout_3_limited (test_refprune.TestFanout.test_fanout_3_limited) (x1)
----------------------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/goldbaum/.pyenv/versions/3.13.7/lib/python3.13/site-packages/llvmlite/tests/test_refprune.py", line 434, in test_fanout_3_limited
self.assertEqual(stats.fanout, 0)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
AssertionError: 3 != 0
==================================================================================
FAIL: test_fanout_1 (test_refprune.TestFanout.test_fanout_1) (x1)
----------------------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/goldbaum/.pyenv/versions/3.13.7/lib/python3.13/site-packages/llvmlite/tests/test_refprune.py", line 388, in test_fanout_1
self.assertEqual(stats.fanout, 3)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
AssertionError: 6 != 3
==================================================================================
FAIL: test_fanout_raise_2 (test_refprune.TestFanoutRaise.test_fanout_raise_2) (x1)
----------------------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/goldbaum/.pyenv/versions/3.13.7/lib/python3.13/site-packages/llvmlite/tests/test_refprune.py", line 479, in test_fanout_raise_2
self.assertEqual(stats.fanout_raise, 0)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 2 != 0
==================================================================================
FAIL: test_per_diamond_4 (test_refprune.TestDiamond.test_per_diamond_4) (x1)
----------------------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/goldbaum/.pyenv/versions/3.13.7/lib/python3.13/site-packages/llvmlite/tests/test_refprune.py", line 342, in test_per_diamond_4
self.assertEqual(stats.diamond, 2)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
AssertionError: 4 != 2
==================================================================================
FAIL: test_per_diamond_3 (test_refprune.TestDiamond.test_per_diamond_3) (x1)
----------------------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/goldbaum/.pyenv/versions/3.13.7/lib/python3.13/site-packages/llvmlite/tests/test_refprune.py", line 322, in test_per_diamond_3
self.assertEqual(stats.diamond, 0)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
AssertionError: 2 != 0
----------------------------------------------------------------------------------
Because this is a multithreaded run of code that exhibits a race condition, all of these failures are nondeterministic. I don't immediately see where the race is happening. Given that it happens on both the GIL-enabled build and the free-threaded build, I don't think it needs to block adding support for the free-threaded build, so there's no urgency. It's probably worth investigating and understanding, though.
I also see an explicitly multithreaded test in test_binding.py, which passes on the free-threaded build. There's only the one test and adding more tests of supported multithreaded workflows can't hurt. It'd also probably help to add some docs on what is and isn't supported in a multithreaded environment.
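As a concrete (hypothetical) example of the kind of additional multithreaded test I have in mind, something along these lines, where several threads concurrently parse and verify independent modules:

# Hypothetical additional test, not currently in the suite.
import threading
import unittest
import llvmlite.binding as llvm

class TestConcurrentParse(unittest.TestCase):
    def test_parallel_parse_assembly(self):
        llvm.initialize()
        llvm.initialize_native_target()
        llvm.initialize_native_asmprinter()
        ir_text = "define i32 @foo() {\n  ret i32 7\n}\n"
        errors = []

        def worker():
            try:
                # Each thread parses and verifies its own independent module.
                mod = llvm.parse_assembly(ir_text)
                mod.verify()
            except Exception as exc:
                errors.append(exc)

        threads = [threading.Thread(target=worker) for _ in range(8)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        self.assertEqual(errors, [])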
Another thing that might be worth doing is setting up a Thread Sanitizer build and then running the existing multithreaded test and unittest-ft to check for data races. That said, I think that might require a TSan build of LLVM itself, which might be a tall order.
Thanks for bringing unittest-ft to our attention. This is really interesting; I wasn't aware of it. The threading issues need more investigation, and I'll bring this to the maintainers' attention.