cymem icon indicating copy to clipboard operation
cymem copied to clipboard

ENH: Free-Threading (No-GIL) Support for Python 3.13+

Open HaoZeke opened this issue 7 months ago • 4 comments

From a quick look it seems like the main issue will be synchronizing access to the Pool state. So perhaps:

  • A per-instance std::mutex (from libcpp.mutex) within the cymem.Pool class.
  • alloc etc. will be protected by the C++ mutex using lock_guard and with nogil: before acquiring the mutex (so we don't deadlock with the GIL).
  • Pool.realloc might need a different approach / refactor, since we can't recursively have lock acquisition.
  • Py_INCREF/Py_DECREF is imported by unused, might be needed for own_pyref and __dealloc__.

The standard TODOs for adding free-threading support are:

  • [ ] Audit Python bindings and declare them free-threading compatible (xref https://py-free-threading.github.io/porting/#updating-extension-modules).
  • [ ] Run the test suite with pytest-run-parallel to find potential issues, and fix them.
  • [ ] Run the test suite under ThreadSanitizer. If possible, depends on how many dependencies there are and if they run under TSan.
  • [ ] Add cp313t-* to CI to build free-threading wheels.

For more details, please see the suggested plan of attack in the py-free-threading guide.

Note that this is the first time I've looked at this repo, so I might be missing known issues or code that needs closer inspection. Any suggestions here will be very useful.

Let me know if I ought to hold off for any reason. One thing which is slightly concerning is that there are few in-project tests.

HaoZeke avatar May 16 '25 16:05 HaoZeke

@honnibal do you know if there are any plans for expanding tests or maybe if you know of a good example case for free threaded tests?

HaoZeke avatar May 16 '25 18:05 HaoZeke

a good example case for free threaded tests?

I'd take the sparse matrix example from the README, put it in a package in a .pyx file, then import it from a test .py file and do some basic operations with it in a regular unit test. And then run pytest-run-parallel over it. That should be easy to do locally - in the test suite it'd require some more boilerplate for building the .pyx file as part of the tests, which I'd avoid for now (numpy and scipy have this machinery for testing Cython API, it wouldn't be hard to port that over, but I wouldn't do this unless a maintainer asks for this contribution).

Here is how I would approach this:

  1. A first PR with the basic plumbing and a docs update for free-threading:
    • Update setup.py to add the freethreading_compatible directive for Cython>=3.1.0
    • Add 3.13 and 3.13t interpreters to tests.yml
    • Update the cibuildwheel config to build free-threaded wheels
    • Add a note in the README that says something like "cymem has support for free-threaded CPython. Currently Pool isn't thread-safe when using from multiple threads at once, please avoid sharing a single Pool instance between threads".
  2. A second PR after running pytest-run-parallel and TSan locally on the test case I suggested above and ensuring those tools say everything is thread-safe - which will likely require the locking you suggested. The README note can then be updated to say "fully thread-safe".

I'm not a maintainer, but hopefully this will unblock you. It'd be great to see this being addressed; even (1) alone will be helpful to make it easier to work on higher-level packages that depend on cymem.

rgommers avatar May 20 '25 13:05 rgommers

Thanks for looking at this! It's interesting.

Could you give some context to the motivation for this?

I've been assuming that mostly this library is used within Explosion's ecosystem. Are you using the library outside of that?

honnibal avatar May 28 '25 08:05 honnibal

Hi @honnibal, of course. Our team at Quansight Labs has been working on free-threading support for a little over a year now, contributing to the work in CPython and many OSS projects - working from the build tools and lowest levels of the dependency graph for the scientific and ML/AI stacks. Here is a recent blog post about that: https://labs.quansight.org/blog/free-threaded-one-year-recap. As we go, we determine what packages to contribute to next based on a few factors, like how widely used it is, how much maintenance bandwidth it has, and if it's on a list of dependencies for some of the first real-world workflows we're trying to enable (mostly deep learning focused).

We started to look at cymem because:

  • spacy, thinc and cymem are all fairly highly on this tracker of PyPI packages with compiled code and their free-threading support status: https://hugovk.github.io/free-threaded-wheels/.
  • spacy and thinc are used in one of the real-world workflows
  • with Cython 3.1.0 just released with free-threading support, this effort is now a fair bit easier - working with Cython nightlies in CI has been a bit cumbersome, and temporary workarounds are not very appealing to maintainers.

spacy and thinc are of course a much heavier lift, and they have other dependencies. It looks like the non Explosion deps are ready (cython, numpy and pydantic being the key ones). If you have any thoughts about free-threading support for your whole stack, I'd love to hear them.

rgommers avatar May 28 '25 09:05 rgommers

Hey folks! I picked this up this week to try and bring it over the finish line.

After diving a bit deeper into how cymem keeps track of the memory it allocates, I don't think we even need a per-object mutex. Basically, the fact that mimalloc is thread safe makes it so that self.addresses is probably never going to touch the same keys. And because dictionaries have an internal mutex themselves, concurrent mutations of different keys of the dictionary is always okay.

Having said that, it's not at all guaranteed that self.size will be accurate. Since that's not used for __dealloc__ that's not a big problem, but we can make that into an atomic if it is important.

I'll go ahead and open a PR if people are okay about it trying to document the situation and start building free-threaded wheels. What do you think?

lysnikolaou avatar Jul 31 '25 13:07 lysnikolaou

gh-53 was merged with free-threading support and wheels for 3.14t, and that PR will be included in the 2.0.12 release which is coming soon. So I'll go ahead and close this issue as completed. Thanks all!

rgommers avatar Nov 13 '25 10:11 rgommers