cpython icon indicating copy to clipboard operation
cpython copied to clipboard

Statically allocate interpreter states as much as possible.

Open ericsnowcurrently opened this issue 3 years ago • 17 comments

BPO 45953
Nosy @gvanrossum, @vstinner, @markshannon, @ericsnowcurrently, @JulienPalard, @miss-islington, @brandtbucher
PRs
  • python/cpython#29883
  • python/cpython#30092
  • python/cpython#30096
  • python/cpython#30588
  • python/cpython#30589
  • python/cpython#30590
  • python/cpython#31038
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ericsnowcurrently'
    closed_at = None
    created_at = <Date 2021-12-01.18:50:39.120>
    labels = ['interpreter-core', '3.11']
    title = 'Statically allocate interpreter states as much as possible.'
    updated_at = <Date 2022-03-29.07:04:09.685>
    user = 'https://github.com/ericsnowcurrently'
    

    bugs.python.org fields:

    activity = <Date 2022-03-29.07:04:09.685>
    actor = 'mdk'
    assignee = 'eric.snow'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2021-12-01.18:50:39.120>
    creator = 'eric.snow'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 45953
    keywords = ['patch']
    message_count = 15.0
    messages = ['407476', '408507', '410315', '410441', '410536', '410537', '412046', '412213', '412291', '416159', '416161', '416170', '416208', '416210', '416245']
    nosy_count = 7.0
    nosy_names = ['gvanrossum', 'vstinner', 'Mark.Shannon', 'eric.snow', 'mdk', 'miss-islington', 'brandtbucher']
    pr_nums = ['29883', '30092', '30096', '30588', '30589', '30590', '31038']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue45953'
    versions = ['Python 3.11']
    

    ericsnowcurrently avatar Dec 01 '21 18:12 ericsnowcurrently

    Currently we allocate each new PyInterpreterState in PyInterpreterState_New(). Furthermore, PyInterpreterState is full of pointers which are almost all allocated on the heap during runtime init. We can statically allocate (and initialize?) most of what goes into PyInterpreterState, as well as the main interpreter itself (as part of _PyRuntimeState). This includes each interpreter's initial PyThreadState.

    TODO:

    • [x] add PyInterpreterState main; to _PyRuntimeState.interpreters and use it
    • [ ] change references from the pointer to the value
    • [x] add PyThreadState _main; to PyInterpreterState.threads and use it
    • [ ] change references from the pointer to the value
    • [ ] change PyInterpreterState pointer fields to values (as much as possible)
    • [ ] change PyThreadState pointer fields to values (as much as possible)

    benefits:

    • fewer possible failures (no memory) during runtime/interpreter/thread init
    • improved memory locality for pointers looked up relative to interpreter/thread state

    There is one non-trivial bit: embedding the various PyObject values in PyInterpreterState and PyThreadState means hard-coding the various pieces of the object there (e.g. for dict, its keys/values; for list, its array), as well as adding necessary init code to PyInterpreterState_New() and PyThreadState_New(). The resulting added complexity can be mitigated somewhat with macros or even code generation. (In fact, there is probably significant overlap with Guido's deepfreeze tool.) Regardless, we'll probably need to factor out init funcs for a number of object types, where currently there are only "Py*_New()" funcs that combine allocation and init.

    ericsnowcurrently avatar Dec 01 '21 18:12 ericsnowcurrently

    New changeset 121f1f893a39d0b58d3d2b5597505c154ecaac2a by Eric Snow in branch 'main': bpo-45953: Statically initialize the small ints. (gh-30092) https://github.com/python/cpython/commit/121f1f893a39d0b58d3d2b5597505c154ecaac2a

    ericsnowcurrently avatar Dec 14 '21 01:12 ericsnowcurrently

    New changeset cf496d657a1a82eaf9ebfb47d721676fef6effa5 by Eric Snow in branch 'main': bpo-45953: Statically allocate and initialize global bytes objects. (gh-30096) https://github.com/python/cpython/commit/cf496d657a1a82eaf9ebfb47d721676fef6effa5

    ericsnowcurrently avatar Jan 11 '22 16:01 ericsnowcurrently

    New changeset ed57b36c32e521162dbb97199e64a340d3bff827 by Eric Snow in branch 'main': bpo-45953: Statically allocate the main interpreter (and initial thread state). (gh-29883) https://github.com/python/cpython/commit/ed57b36c32e521162dbb97199e64a340d3bff827

    ericsnowcurrently avatar Jan 12 '22 23:01 ericsnowcurrently

    New changeset 324908ba936d5d262026deebb81f050803848c41 by Eric Snow in branch 'main': bpo-45953: Statically initialize all the PyThreadState fields we can. (gh-30590) https://github.com/python/cpython/commit/324908ba936d5d262026deebb81f050803848c41

    ericsnowcurrently avatar Jan 14 '22 00:01 ericsnowcurrently

    New changeset 322f962f3ee31d0dbde99e36379de8488ccc6804 by Eric Snow in branch 'main': bpo-45953: Statically initialize all the non-object PyInterpreterState fields we can. (gh-30589) https://github.com/python/cpython/commit/322f962f3ee31d0dbde99e36379de8488ccc6804

    ericsnowcurrently avatar Jan 14 '22 00:01 ericsnowcurrently

    Any chance we could revert the recent renaming of tstate.exc_state and tstate.root_cframe in https://github.com/python/cpython/pull/30590? It broke Greenlet again:

    https://github.com/python-greenlet/greenlet/issues/288

    If it's only a name change (and the members themselves are the same), I think reverting it is preferable to burying Greenlet in more compatibility macros and bugging them to put out another new release.

    brandtbucher avatar Jan 29 '22 00:01 brandtbucher

    Any chance we could revert the recent renaming of tstate.exc_state and tstate.root_cframe

    Yeah, I'll sort this out. Sorry for that.

    ericsnowcurrently avatar Jan 31 '22 17:01 ericsnowcurrently

    New changeset f78be59c83c151d94902daef56218530c52e29e7 by Eric Snow in branch 'main': bpo-45953: Preserve backward compatibility on some PyThreadState field names. (GH-31038) https://github.com/python/cpython/commit/f78be59c83c151d94902daef56218530c52e29e7

    miss-islington avatar Feb 01 '22 17:02 miss-islington

    Since https://github.com/python/cpython/commit/121f1f893a39d0b58d3d2b5597505c154ecaac2a, sys.getrefcount(1) is surprising:

        >>> __import__("sys").getrefcount(1)
        1000000210
    

    Should sys.getrefcount try to "fix" the value like by returning PyREFCNT(object) % 999999999? (But using a define to avoid the "magic, copy/pasted value").

    JulienPalard avatar Mar 28 '22 11:03 JulienPalard

    Should sys.getrefcount try to "fix" the value (...)

    https://peps.python.org/pep-0683/ would make it possible. Right now, I don't think that it's possible.

    Right now, a refcount of 1000000210 can be a real value, or it can be an immortal object.

    vstinner avatar Mar 28 '22 12:03 vstinner

    Please don’t try to “fix” anything. The value is only useful if you understand the implementation. It should map straightforwardly to what’s in memory.

    On Mon, Mar 28, 2022 at 05:16 STINNER Victor <[email protected]> wrote:

    STINNER Victor <[email protected]> added the comment:

    > Should sys.getrefcount try to "fix" the value (...)

    https://peps.python.org/pep-0683/ would make it possible. Right now, I don't think that it's possible.

    Right now, a refcount of 1000000210 can be a real value, or it can be an immortal object.

    ----------


    Python tracker <[email protected]> <https://bugs.python.org/issue45953>


    -- --Guido (mobile)

    gvanrossum avatar Mar 28 '22 14:03 gvanrossum

    Hum, and why 999999999? I am probably missing something obvious but 1 should be enough to ensure the value never hits 0. Except for refcount bugs obviously, but I don't think this is the right reason?

    JulienPalard avatar Mar 28 '22 20:03 JulienPalard

    I used 999999999 in deepfreeze.py to signify "immortal object". It has been copied by others (small integers are essentially immortal too). I wasn't too sure that the refcount wouldn't go below zero if the interpreter is repeatedly finalized and reinitialized. Once we have official immortal objects (see PEP-683) we should switch to that.

    Since you seem to be challenging the value of 999999999, my question to you is, why do you care what the refcount of 1 is?

    gvanrossum avatar Mar 28 '22 20:03 gvanrossum

    Since you seem to be challenging the value of 999999999, my question to you is, why do you care what the refcount of 1 is?

    Yesterday I was teaching Python, and we were speaking of integer immutability, names being "labels to objects" and so on, and I was showing the memory layout of all of this by hand on a whiteboard while "prooving" my drawings using an interpreter.

    While doing so came a question like "So, many modules can use the object int(1)?" So I answered yes, told that I expected many reuse of 1, and went importing sys.getrefcount to show them.

    And boom, it printed 1000000209 so I bugged for a few seconds, the value was obviously not the real refcount, and was also obviously bumped by a constant like 100000000, so I went inspecting why and found this commit.

    I have nothing against keeping 999999999, but in the other hand it could surprise other people, maybe we should at least document it near sys.getrefcount.

    JulienPalard avatar Mar 29 '22 07:03 JulienPalard

    This looks completed, @ericsnowcurrently can this be closed now?

    kumaraditya303 avatar Nov 08 '22 08:11 kumaraditya303

    We should add a note in the docs, as @JulienPalard recommended. Arguably, that should be part of https://github.com/python/cpython/issues/98154.

    Otherwise, I'll take a look soon to see if there's anything else left to do.

    ericsnowcurrently avatar Nov 08 '22 16:11 ericsnowcurrently

    Following @JulienPalard comments, I also faced the same issue during a Python course. Whatever is the meaning of refcount, the documentation of sys.getrefcount should be fixed. The doc currently says that

    Return the reference count of object.
    
    The count returned is generally one higher than you might expect,
    because it includes the (temporary) reference as an argument to
    getrefcount().
    

    For immortal objects, it is no more the case, the returned number is not the reference count.

    arnaudlegout avatar May 24 '23 18:05 arnaudlegout

    See gh-98154.

    ericsnowcurrently avatar May 24 '23 20:05 ericsnowcurrently