pyo3
Tracking Issue: Sub-Interpreter Support
Tracks the development and state of supporting sub-interpreters in PyO3.
This issue only tracks progress; for discussing everything else, feel free to join over here: https://github.com/Aequitosh/pyo3/discussions/1
Summary
As of 13.09.2023
PyO3 currently doesn't support sub-interpreters, which will lead to an ImportError being raised if a module using PyO3 is initialized more than once per interpreter process. As stated in https://github.com/PyO3/pyo3/pull/2523, this is necessary in order to prevent soundness holes (as in, prevent things that use PyO3 from randomly breaking, having nasty undefined behaviour, etc.).
While this prevents soundness holes, it can cause modules / applications that use a sub-interpreter model to "break" in certain situations. For examples, see https://github.com/pyca/cryptography/issues/9016 and https://github.com/bazaah/aur-ceph/issues/20.
Implementing sub-interpreter support isn't straightforward and requires quite a substantial redesign of PyO3's API. This issue shall track this redesign and provide as much relevant information as possible for all that wish to contribute.
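To make the `ImportError` described above more concrete, here is a minimal, std-only sketch of the kind of guard discussed in https://github.com/PyO3/pyo3/pull/2523: the first interpreter to initialize the module records its ID in a `static`, and any other interpreter is refused. It is only loosely modeled on that check; `INIT_INTERPRETER`, `init_module`, and the plain `i64` interpreter ID are hypothetical stand-ins, not PyO3's actual code.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Sentinel meaning "no interpreter has initialized this module yet".
const UNINITIALIZED: i64 = -1;

/// Hypothetical process-wide record of which interpreter first initialized
/// the module. Because it is a `static`, it is shared by *every* interpreter
/// in the process, which is exactly what makes sub-interpreters a problem.
static INIT_INTERPRETER: AtomicI64 = AtomicI64::new(UNINITIALIZED);

/// Hypothetical module initializer; `interpreter_id` stands in for the ID
/// CPython would report for the calling interpreter.
fn init_module(interpreter_id: i64) -> Result<(), String> {
    match INIT_INTERPRETER.compare_exchange(
        UNINITIALIZED,
        interpreter_id,
        Ordering::SeqCst,
        Ordering::SeqCst,
    ) {
        // First initialization anywhere in the process: accepted.
        Ok(_) => Ok(()),
        // Re-initialization from the same interpreter: also accepted.
        Err(first) if first == interpreter_id => Ok(()),
        // A different (sub-)interpreter: refused, mirroring the ImportError.
        Err(_) => Err("ImportError: module does not support sub-interpreters".to_string()),
    }
}

fn main() {
    assert!(init_module(0).is_ok()); // main interpreter
    assert!(init_module(0).is_ok()); // same interpreter again
    assert!(init_module(1).is_err()); // a sub-interpreter is rejected
    println!("guard behaves as expected");
}
```

The guard keeps things sound by refusing the second interpreter outright; lifting that restriction means the data it protects has to stop being process-wide.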
Goals
Adapted from https://github.com/PyO3/pyo3/issues/576#issuecomment-1713975913, as of 13.09.2023.
Mid-Term
- [ ] Rework synchronization primitives to not rely on the GIL. See https://github.com/PyO3/pyo3/pull/2885
- [ ] Develop transition plan so that existing users can migrate their code without enormous amounts of work
- [ ] Remove `static` data from PyO3's implementation, either move things to `PyModule_GetState` (preferred) or `PyInterpreterState_GetDict` (alternative)
- [ ] Allow extension authors to use `unsafe` in order to opt in to sub-interpreter support - it is their responsibility to guarantee to not store `Py<T>` in any static data.
- [ ] Document all conditions that extension authors' modules need to satisfy so that they may be used within sub-interpreters
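One idiomatic way to express the `unsafe` opt-in from the list above is an `unsafe` marker trait: implementing it is the extension author's promise that no `Py<T>` ends up in static data. This is purely a hypothetical sketch; `SubinterpreterSafe`, `FakePy<T>`, and `register_module` are not PyO3 APIs, and `FakePy<T>` merely stands in for `Py<T>`.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for PyO3's `Py<T>`: a reference owned by one interpreter.
#[allow(dead_code)]
struct FakePy<T> {
    interpreter_id: i64,
    _marker: PhantomData<T>,
}

/// Hypothetical opt-in marker.
///
/// # Safety
/// Implementors promise that the module never stores a `FakePy<T>`
/// (i.e. a `Py<T>`) in `static` data, so nothing leaks between sub-interpreters.
unsafe trait SubinterpreterSafe {
    const NAME: &'static str;
}

/// Hypothetical entry point: only modules that opted in may be loaded
/// into more than one interpreter.
fn register_module<M: SubinterpreterSafe>(interpreter_id: i64) -> String {
    format!("{} loaded into interpreter {}", M::NAME, interpreter_id)
}

struct MyModule;

// SAFETY: `MyModule` keeps all Python objects in per-module state, never in
// statics. The compiler cannot verify this; the extension author vouches for it.
unsafe impl SubinterpreterSafe for MyModule {
    const NAME: &'static str = "my_module";
}

fn main() {
    println!("{}", register_module::<MyModule>(0));
    println!("{}", register_module::<MyModule>(1));
}
```

The point of the `unsafe impl` is that the obligation is visible at the one place the author makes the promise, which is also what an audit would look for.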
Long-Term
Possibly remove the need for extension authors to audit their own code once we're confident enough.
Tasks
TBA - might add them here (or some other place) once more concrete pieces of work have been identified.
Relevant Issues & Interesting Reads
Listing relevant things here. Some things might already be linked above, but it's nevertheless nice to have everything in one place.
- Initial discussion regarding sub-interpreter support: #576
- PR regarding nogil Python support, contains lots of additional information: https://github.com/PyO3/pyo3/pull/2885
- Discussion regarding making Python's C-API more friendly for Rust; linking to comment what would need to happen in PyO3 internally: https://github.com/PyO3/pyo3/discussions/2346#discussioncomment-2911159
- `cryptography` issue regarding sub-interpreters in PyO3: https://github.com/pyca/cryptography/issues/9016
- `aur-ceph` maintainer's issue regarding Ceph's sub-interpreter model, and why Ceph Dashboard breaks: https://github.com/bazaah/aur-ceph/issues/20
- Idea by @GoldsteinE - using a `ghostcell`-ish pattern: https://github.com/PyO3/pyo3/issues/576#issuecomment-1713999916
Note that I will update this issue whenever updates, new information, etc. appear, in order to keep everything relatively tidy.
@Aequitosh, just in case you haven't seen it: David has redirected the "Multiple GIL Acquire" issue here and closed the other one so we don't have two threads on the same subject, so we'll continue the discussion here ;)
Hi there!
For those following this issue, I've got a short update: I'm slowly able to pick up on all this again, now that there are fewer things going on in my private life.
Currently, I'm working on properly drafting up and implementing a prototype of an idea that's been living rent-free in my head the past few weeks - I figured it's finally time I brought it to life in the form of code. More details will follow as soon as I'm more confident with the idea - that is, once I've actually implemented it in prototypical form and seen it in action.
See this more as a sign that this issue is still alive; I'm still very eager to work on this, even though I wasn't able to for a while.
So, I have been working on and off on this. The more I begin to understand how CPython's insides work, the more I realize how complicated this actually is.
Nevertheless, I've got a rough plan for removing static data from PyO3. I think this is a good first "milestone" (or whatever you'd like to call it) for this issue - I will elaborate on this further below.
Per-Module State
From my experiments so far, it's probably best to move static data into the per-module memory area (which can be accessed via `PyModule_GetState`), as was initially preferred.
The absolutely fantastic thing about per-module state is that, according to the CPython docs, it's an arbitrarily-sized block of memory allocated on the Python interpreter's heap that is sub-interpreter safe to access. This makes it the ideal place to store more than just static data - I will elaborate on this below.
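The difference between `static` data and per-module state can be illustrated without any Python at all. In this std-only sketch, each `ModuleInstance` plays the role of one module object created in one (sub-)interpreter, and its `state` field plays the role of the memory block returned by `PyModule_GetState`; the `static` counter, by contrast, is shared by every instance in the process. The types are hypothetical illustrations, not PyO3 or CPython APIs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Process-wide data: every interpreter that loads the module sees the same value.
static GLOBAL_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical per-module state (what would live behind `PyModule_GetState`).
#[derive(Default)]
struct ModuleState {
    calls: usize,
}

/// Hypothetical module object: one per (sub-)interpreter that imports it.
#[derive(Default)]
struct ModuleInstance {
    state: ModuleState,
}

impl ModuleInstance {
    /// A module-level function that records each call in both places.
    fn call(&mut self) {
        GLOBAL_CALLS.fetch_add(1, Ordering::SeqCst);
        self.state.calls += 1;
    }
}

fn main() {
    let mut in_main = ModuleInstance::default(); // imported by the main interpreter
    let mut in_sub = ModuleInstance::default(); // imported by a sub-interpreter

    in_main.call();
    in_main.call();
    in_sub.call();

    // Per-module state stays isolated per interpreter...
    println!("main: {}, sub: {}", in_main.state.calls, in_sub.state.calls); // main: 2, sub: 1
    // ...while the static silently mixes both interpreters together.
    println!("global: {}", GLOBAL_CALLS.load(Ordering::SeqCst)); // global: 3
}
```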
Relocating currently static data to this per-module memory region will require a new mechanism to actually put stuff there during the initialization of a module. To give a more concrete example, instead of statically allocating docstrings, we could perhaps allocate them in a separate container and then move / clone / etc. them onto the per-module memory block.
To provide an analogy, this mechanism (or API) would work similarly to something like `lazy_static` or `OnceLock`, just quite a bit more elaborate. This "pseudo-static docstring container" would be mutable during the module initialization phase and made immutable once put onto the Python heap.
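A minimal sketch of such a "pseudo-static" container, assuming a two-stage design: a builder that is mutable while the module initializes, and a read-only value once it has been handed over to the module. `DocstringsBuilder`, `Docstrings`, and `freeze` are hypothetical names; a `Box` stands in for the allocation on the Python heap, and unlike `OnceLock`, nothing here is global.

```rust
use std::collections::HashMap;

/// Mutable during module initialization.
#[derive(Default)]
struct DocstringsBuilder {
    entries: HashMap<&'static str, String>,
}

/// Immutable once frozen: it has no `&mut self` methods.
struct Docstrings {
    entries: HashMap<&'static str, String>,
}

impl DocstringsBuilder {
    fn add(&mut self, item: &'static str, doc: impl Into<String>) -> &mut Self {
        self.entries.insert(item, doc.into());
        self
    }

    /// Consumes the builder; afterwards the data can only be read.
    /// The `Box` stands in for memory owned by the module object.
    fn freeze(self) -> Box<Docstrings> {
        Box::new(Docstrings { entries: self.entries })
    }
}

impl Docstrings {
    fn get(&self, item: &str) -> Option<&str> {
        self.entries.get(item).map(String::as_str)
    }
}

fn main() {
    // Initialization phase: docstrings are collected mutably.
    let mut builder = DocstringsBuilder::default();
    builder
        .add("add", "Adds two numbers.")
        .add("sub", "Subtracts two numbers.");

    // Module is ready: the container becomes read-only.
    let docs = builder.freeze();
    assert_eq!(docs.get("add"), Some("Adds two numbers."));
    assert_eq!(docs.get("mul"), None);
}
```

Making `freeze` consume the builder lets the type system enforce the "mutable only during initialization" rule instead of a runtime flag.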
But obviously this goes beyond just storing docstrings - and in my opinion, also beyond just storing static data.
A Place For More Than statics
I reckon that implementing this hypothetical mechanism described above will require quite a lot of changes to PyO3's internals; at least that's what it looks like to me right now.
Nevertheless, I think it can be leveraged for more than just static data - for example, depending on how we'll actually make the current synchronization primitives independent from the GIL, we could definitely store other per-module state there, including e.g. a per-module lock that emulates the GIL (really just an example!).
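Purely to illustrate the "per-module lock" example above (and only as an illustration), here is a sketch in which the lock is a field of the per-module state rather than a process-wide `static`, so module instances living in different sub-interpreters each have their own lock. `ModuleState`, `with_module_lock`, and the use of `std::sync::Mutex` are assumptions for this sketch, not a proposed design.

```rust
use std::sync::Mutex;

/// Hypothetical per-module state that carries its own lock.
struct ModuleState {
    /// Guards this module instance's mutable data, GIL-style, but only for
    /// this module instance (i.e. this interpreter), not the whole process.
    data: Mutex<Vec<String>>,
}

impl ModuleState {
    fn new() -> Self {
        Self { data: Mutex::new(Vec::new()) }
    }

    /// Runs `f` while holding this module instance's lock.
    fn with_module_lock<R>(&self, f: impl FnOnce(&mut Vec<String>) -> R) -> R {
        let mut guard = self.data.lock().expect("module lock poisoned");
        f(&mut *guard)
    }
}

fn main() {
    let state_a = ModuleState::new(); // module instance in interpreter A
    let state_b = ModuleState::new(); // module instance in interpreter B

    // Each thread only takes the lock of its own module instance.
    std::thread::scope(|s| {
        s.spawn(|| state_a.with_module_lock(|d| d.push("from A".into())));
        s.spawn(|| state_b.with_module_lock(|d| d.push("from B".into())));
    });

    let len_a = state_a.with_module_lock(|d| d.len());
    let len_b = state_b.with_module_lock(|d| d.len());
    println!("A: {len_a}, B: {len_b}"); // A: 1, B: 1
}
```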
What can (and what should) be in the per-module state is still up for discussion (some of it perhaps beyond the current scope of this issue), but I think it's safe to say that we should start with relocating static data there - and that's where I'm currently at.
Next Steps
Because working on the synchronization primitives first (the prototype I had mentioned in my prior post) was perhaps too big a chunk to start with, this is what I'll be working on in the next couple of weeks:
- Some kind of `struct` living on the Python heap representing per-module state, where currently `static` data will be moved to
- An internal API regarding per-module state
  - Only for per-module `static` data for now, but once the flow's been worked out, I don't see why this couldn't also be used for other purposes
  - Perhaps something that can be made `pub` (or have a `pub` layer, rather) once the details have been fleshed out
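The plan above can be sketched roughly as follows, assuming the state struct is reached through the module object, mutated only through a crate-internal (`pub(crate)`) accessor, and exposed through a thin, read-only `pub` layer. Every name here (`PerModuleState`, `ModuleObject`, `state_mut`, `docstring`) is hypothetical; in real code the state would be the memory block returned by `PyModule_GetState` rather than a Rust field.

```rust
mod module_state {
    /// Hypothetical struct holding what is currently `static` data
    /// (e.g. docstrings), one instance per module object.
    #[derive(Default)]
    pub(crate) struct PerModuleState {
        pub(crate) docstrings: Vec<(&'static str, String)>,
    }

    /// Hypothetical stand-in for a module object.
    #[derive(Default)]
    pub struct ModuleObject {
        state: PerModuleState,
    }

    impl ModuleObject {
        /// Internal API: only the crate itself may mutate the state
        /// (e.g. during module initialization).
        pub(crate) fn state_mut(&mut self) -> &mut PerModuleState {
            &mut self.state
        }

        /// Possible future `pub` layer: a read-only view for extension authors.
        pub fn docstring(&self, item: &str) -> Option<&str> {
            self.state
                .docstrings
                .iter()
                .find(|(name, _)| *name == item)
                .map(|(_, doc)| doc.as_str())
        }
    }
}

use module_state::ModuleObject;

fn main() {
    let mut module = ModuleObject::default();
    // Crate-internal initialization writes into the per-module state...
    module
        .state_mut()
        .docstrings
        .push(("add", "Adds two numbers.".to_string()));
    // ...while users only see the public, read-only layer.
    println!("{:?}", module.docstring("add")); // Some("Adds two numbers.")
}
```

Keeping the mutable accessor `pub(crate)` means the internal flow can change freely while it is being worked out; only the read-only layer would ever need a stability promise.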
Also, I'll probably open a developer diary or something over at the discussions of my fork in order to keep this thread rather clean - if you'd like to comment on this, feel free to open a discussion there too.
I was initially intending to report back once I had something more concrete, but I feel it's better to share some bits and pieces here and there - maybe it encourages somebody to share their ideas or comments as well.
Example of Per-Module State
There is a fantastic example that just so happened to be added in the meantime, which demonstrates this; moreover, it shows how to make a sub-interpreter-safe module using PyO3's FFI bindings. Leaving this here as it shows more or less what I mean.
Agreed that per-module state is a necessary first step, which can have general value beyond sub-interpreters. I've actually been playing around with the first step towards supporting that, which is changing PyO3 to do something compatible with PEP 489. Ideally I can push this soon!
That's actually fantastic - I've almost got multi-phase initialization working at the moment; I still have to change a bunch of the proc macro stuff so I can actually attach functions, classes, etc. to my module. Otherwise it loads just fine (though I get a double-free when the garbage collector picks it up, whoops).
Let me know if I can lend a hand or anything! I haven't pushed my stuff yet, but might soon. I'll ping you over at my fork once I do (if that's alright).
I think part of this would be to have an optional `state: &State` argument in functions and methods that passes in some user-defined part of the module state, so that users can also put their static data in it. Much like how web frameworks pass in a `Context` struct so users don't have to use global variables.
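A rough, std-only sketch of that idea, assuming a user-defined `State` owned by each module instance and a dispatcher that hands every module function a shared reference to it, much like a web framework's context object. `State`, `Module`, `ModuleFn`, and the function registry are hypothetical and illustrative only; this is not a PyO3 API.

```rust
use std::collections::HashMap;

/// User-defined data that would otherwise be a `static`.
struct State {
    greeting: String,
}

/// A module function receives the module's state explicitly.
type ModuleFn = fn(&State, &str) -> String;

fn greet(state: &State, name: &str) -> String {
    format!("{}, {}!", state.greeting, name)
}

/// Hypothetical module: owns its state and passes `&State` into every call.
struct Module {
    state: State,
    functions: HashMap<&'static str, ModuleFn>,
}

impl Module {
    fn call(&self, name: &str, arg: &str) -> Option<String> {
        let f = self.functions.get(name)?;
        Some(f(&self.state, arg))
    }
}

fn main() {
    let mut functions: HashMap<&'static str, ModuleFn> = HashMap::new();
    functions.insert("greet", greet);

    // Two module instances (e.g. in two sub-interpreters) with different state.
    let en = Module { state: State { greeting: "Hello".into() }, functions: functions.clone() };
    let de = Module { state: State { greeting: "Hallo".into() }, functions };

    println!("{:?}", en.call("greet", "PyO3")); // Some("Hello, PyO3!")
    println!("{:?}", de.call("greet", "PyO3")); // Some("Hallo, PyO3!")
}
```

Because the function body never names a global, the same function works unchanged no matter which module instance (and therefore which interpreter) calls it.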
Thanks! I mostly wrote it to get some experience with it and to get a feel of what it should look like. I'm happy if it does the same for others :)
Back with an update! I opened up PR https://github.com/PyO3/pyo3/pull/4162, which implements almost fully functional multi-phase module initialization. See the PR for more information.
More work will continue off and on in the meantime. Exams are coming up, but I'll try to make some time every now and then.
Healthcheck: Now that university stuff has cooled down, I can finally dedicate some more time to this again. Just rebased my PR on main and added some notes for future me.
Back with some good news! Multi-phase initialization now works, even for submodules.
I've updated PR #4162 correspondingly; it's now an RFC. See the PR's description for more details.
It's still a little rough around the edges, but we're getting much closer to merging now, I feel. (Unless something unexpected pops up, that is.)
@davidhewitt Kindly pinging you here and asking you to take a look whenever you have time. ;)