cpython
cpython copied to clipboard
Statically allocate interpreter states as much as possible.
BPO | 45953 |
---|---|
Nosy | @gvanrossum, @vstinner, @markshannon, @ericsnowcurrently, @JulienPalard, @miss-islington, @brandtbucher |
PRs |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
assignee = 'https://github.com/ericsnowcurrently'
closed_at = None
created_at = <Date 2021-12-01.18:50:39.120>
labels = ['interpreter-core', '3.11']
title = 'Statically allocate interpreter states as much as possible.'
updated_at = <Date 2022-03-29.07:04:09.685>
user = 'https://github.com/ericsnowcurrently'
bugs.python.org fields:
activity = <Date 2022-03-29.07:04:09.685>
actor = 'mdk'
assignee = 'eric.snow'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2021-12-01.18:50:39.120>
creator = 'eric.snow'
dependencies = []
files = []
hgrepos = []
issue_num = 45953
keywords = ['patch']
message_count = 15.0
messages = ['407476', '408507', '410315', '410441', '410536', '410537', '412046', '412213', '412291', '416159', '416161', '416170', '416208', '416210', '416245']
nosy_count = 7.0
nosy_names = ['gvanrossum', 'vstinner', 'Mark.Shannon', 'eric.snow', 'mdk', 'miss-islington', 'brandtbucher']
pr_nums = ['29883', '30092', '30096', '30588', '30589', '30590', '31038']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue45953'
versions = ['Python 3.11']
Currently we allocate each new PyInterpreterState in PyInterpreterState_New(). Furthermore, PyInterpreterState is full of pointers which are almost all allocated on the heap during runtime init. We can statically allocate (and initialize?) most of what goes into PyInterpreterState, as well as the main interpreter itself (as part of _PyRuntimeState). This includes each interpreter's initial PyThreadState.
TODO:
- [x] add
PyInterpreterState main;
to _PyRuntimeState.interpreters and use it - [ ] change references from the pointer to the value
- [x] add
PyThreadState _main;
to PyInterpreterState.threads and use it - [ ] change references from the pointer to the value
- [ ] change PyInterpreterState pointer fields to values (as much as possible)
- [ ] change PyThreadState pointer fields to values (as much as possible)
benefits:
- fewer possible failures (no memory) during runtime/interpreter/thread init
- improved memory locality for pointers looked up relative to interpreter/thread state
There is one non-trivial bit: embedding the various PyObject values in PyInterpreterState and PyThreadState means hard-coding the various pieces of the object there (e.g. for dict, its keys/values; for list, its array), as well as adding necessary init code to PyInterpreterState_New() and PyThreadState_New(). The resulting added complexity can be mitigated somewhat with macros or even code generation. (In fact, there is probably significant overlap with Guido's deepfreeze tool.) Regardless, we'll probably need to factor out init funcs for a number of object types, where currently there are only "Py*_New()" funcs that combine allocation and init.
New changeset 121f1f893a39d0b58d3d2b5597505c154ecaac2a by Eric Snow in branch 'main': bpo-45953: Statically initialize the small ints. (gh-30092) https://github.com/python/cpython/commit/121f1f893a39d0b58d3d2b5597505c154ecaac2a
New changeset cf496d657a1a82eaf9ebfb47d721676fef6effa5 by Eric Snow in branch 'main': bpo-45953: Statically allocate and initialize global bytes objects. (gh-30096) https://github.com/python/cpython/commit/cf496d657a1a82eaf9ebfb47d721676fef6effa5
New changeset ed57b36c32e521162dbb97199e64a340d3bff827 by Eric Snow in branch 'main': bpo-45953: Statically allocate the main interpreter (and initial thread state). (gh-29883) https://github.com/python/cpython/commit/ed57b36c32e521162dbb97199e64a340d3bff827
New changeset 324908ba936d5d262026deebb81f050803848c41 by Eric Snow in branch 'main': bpo-45953: Statically initialize all the PyThreadState fields we can. (gh-30590) https://github.com/python/cpython/commit/324908ba936d5d262026deebb81f050803848c41
New changeset 322f962f3ee31d0dbde99e36379de8488ccc6804 by Eric Snow in branch 'main': bpo-45953: Statically initialize all the non-object PyInterpreterState fields we can. (gh-30589) https://github.com/python/cpython/commit/322f962f3ee31d0dbde99e36379de8488ccc6804
Any chance we could revert the recent renaming of tstate.exc_state and tstate.root_cframe in https://github.com/python/cpython/pull/30590? It broke Greenlet again:
https://github.com/python-greenlet/greenlet/issues/288
If it's only a name change (and the members themselves are the same), I think reverting it is preferable to burying Greenlet in more compatibility macros and bugging them to put out another new release.
Any chance we could revert the recent renaming of tstate.exc_state and tstate.root_cframe
Yeah, I'll sort this out. Sorry for that.
New changeset f78be59c83c151d94902daef56218530c52e29e7 by Eric Snow in branch 'main': bpo-45953: Preserve backward compatibility on some PyThreadState field names. (GH-31038) https://github.com/python/cpython/commit/f78be59c83c151d94902daef56218530c52e29e7
Since https://github.com/python/cpython/commit/121f1f893a39d0b58d3d2b5597505c154ecaac2a, sys.getrefcount(1)
is surprising:
>>> __import__("sys").getrefcount(1)
1000000210
Should sys.getrefcount try to "fix" the value like by returning PyREFCNT(object) % 999999999
? (But using a define to avoid the "magic, copy/pasted value").
Should sys.getrefcount try to "fix" the value (...)
https://peps.python.org/pep-0683/ would make it possible. Right now, I don't think that it's possible.
Right now, a refcount of 1000000210 can be a real value, or it can be an immortal object.
Please don’t try to “fix” anything. The value is only useful if you understand the implementation. It should map straightforwardly to what’s in memory.
On Mon, Mar 28, 2022 at 05:16 STINNER Victor <[email protected]> wrote:
STINNER Victor <[email protected]> added the comment:
> Should sys.getrefcount try to "fix" the value (...)
https://peps.python.org/pep-0683/ would make it possible. Right now, I don't think that it's possible.
Right now, a refcount of 1000000210 can be a real value, or it can be an immortal object.
----------
Python tracker <[email protected]> <https://bugs.python.org/issue45953>
-- --Guido (mobile)
Hum, and why 999999999? I am probably missing something obvious but 1 should be enough to ensure the value never hits 0. Except for refcount bugs obviously, but I don't think this is the right reason?
I used 999999999 in deepfreeze.py to signify "immortal object". It has been copied by others (small integers are essentially immortal too). I wasn't too sure that the refcount wouldn't go below zero if the interpreter is repeatedly finalized and reinitialized. Once we have official immortal objects (see PEP-683) we should switch to that.
Since you seem to be challenging the value of 999999999, my question to you is, why do you care what the refcount of 1 is?
Since you seem to be challenging the value of 999999999, my question to you is, why do you care what the refcount of 1 is?
Yesterday I was teaching Python, and we were speaking of integer immutability, names being "labels to objects" and so on, and I was showing the memory layout of all of this by hand on a whiteboard while "prooving" my drawings using an interpreter.
While doing so came a question like "So, many modules can use the object int(1)?" So I answered yes, told that I expected many reuse of 1, and went importing sys.getrefcount to show them.
And boom, it printed 1000000209 so I bugged for a few seconds, the value was obviously not the real refcount, and was also obviously bumped by a constant like 100000000, so I went inspecting why and found this commit.
I have nothing against keeping 999999999, but in the other hand it could surprise other people, maybe we should at least document it near sys.getrefcount.
This looks completed, @ericsnowcurrently can this be closed now?
We should add a note in the docs, as @JulienPalard recommended. Arguably, that should be part of https://github.com/python/cpython/issues/98154.
Otherwise, I'll take a look soon to see if there's anything else left to do.
Following @JulienPalard comments, I also faced the same issue during a Python course. Whatever is the meaning of refcount, the documentation of sys.getrefcount
should be fixed. The doc currently says that
Return the reference count of object.
The count returned is generally one higher than you might expect,
because it includes the (temporary) reference as an argument to
getrefcount().
For immortal objects, it is no more the case, the returned number is not the reference count.
See gh-98154.