JIT: Implement `unique reference tracking` in Tier 2 for reference count optimizations

Open cocolato opened this issue 2 weeks ago • 7 comments

Feature or enhancement

Proposal:

Motivation

We should implement unique reference tracking in Tier 2 to facilitate optimizations that reduce reference counting overhead. For example, when a tuple is known to be uniquely referenced, we can "steal" its element references during unpacking without performing any reference counting operations.

For reference: discussion in https://github.com/python/cpython/pull/142952

Technical Approach

  1. Reference Tracking Infrastructure
    • Add a REF_IS_UNIQUE bit (bit 1) to the JitOptRef union in pycore_optimizer.h (code reference).
    • Implement PyJitRef_MakeUnique() and PyJitRef_IsUnique() helper functions.
    • Update helper utilities including PyJitRef_StripReferenceInfo and JIT_BITS_TO_PTR_MASKED to support this unique reference bit.
  2. Apply unique reference tracking to UNPACK_SEQUENCE uops
  3. Expand support to more uops
    • After verifying performance and correctness, extend the use of unique reference tracking to additional uops and optimizations as identified.
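The tagging scheme in step 1 can be sketched as a toy model in Python, using a plain integer to stand in for the pointer bits. This is only an illustration under assumptions: the function names are hypothetical stand-ins for the proposed helpers, and the assumption that bit 0 is already used as a "borrowed" flag is mine, not something the proposal states.

```python
# Toy model: an aligned pointer leaves its low bits free, so they can carry flags.
# Bit positions and names are illustrative assumptions, not CPython's definitions.
REF_IS_BORROWED = 1 << 0  # assumed to already occupy bit 0
REF_IS_UNIQUE = 1 << 1    # the proposed bit 1
TAG_MASK = REF_IS_BORROWED | REF_IS_UNIQUE


def make_unique(bits: int) -> int:
    """Hypothetical analogue of PyJitRef_MakeUnique(): set the unique bit."""
    return bits | REF_IS_UNIQUE


def is_unique(bits: int) -> bool:
    """Hypothetical analogue of PyJitRef_IsUnique(): test the unique bit."""
    return bool(bits & REF_IS_UNIQUE)


def strip_reference_info(bits: int) -> int:
    """Clear every tag bit so that only the pointer value remains."""
    return bits & ~TAG_MASK


ptr = 0x1000                 # stand-in for an aligned symbol pointer
ref = make_unique(ptr)
print(hex(ref), is_unique(ref), hex(strip_reference_info(ref)))
```

The point of the mask is that any helper that recovers the pointer has to clear the new bit as well, which is why the proposal lists the strip and mask utilities alongside the two new helpers.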

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

cocolato avatar Jan 04 '26 11:01 cocolato

@Fidget-Spinner Hi, could you please review this issue and let me know your thoughts?

cocolato avatar Jan 04 '26 11:01 cocolato

@cocolato This seems correct to me. The key thing about the optimization is that

        op(_UNPACK_SEQUENCE_TWO_TUPLE, (seq -- val1, val0)) {
            assert(oparg == 2);
            PyObject *seq_o = PyStackRef_AsPyObjectBorrow(seq);
            assert(PyTuple_CheckExact(seq_o));
            DEOPT_IF(PyTuple_GET_SIZE(seq_o) != 2);
            STAT_INC(UNPACK_SEQUENCE, hit);
            val0 = PyStackRef_FromPyObjectNew(PyTuple_GET_ITEM(seq_o, 0));
            val1 = PyStackRef_FromPyObjectNew(PyTuple_GET_ITEM(seq_o, 1));
            PyStackRef_CLOSE(seq);
        }

becomes

        op(_UNPACK_SEQUENCE_TWO_TUPLE_UNIQUE_STEAL, (seq -- val1, val0)) {
            assert(oparg == 2);
            PyObject *seq_o = PyStackRef_AsPyObjectBorrow(seq);
            assert(PyTuple_CheckExact(seq_o));
            DEOPT_IF(PyTuple_GET_SIZE(seq_o) != 2);
            STAT_INC(UNPACK_SEQUENCE, hit);
            val0 = PyStackRef_FromPyObjectSteal(PyTuple_GET_ITEM(seq_o, 0));
            PyTuple_SET_ITEM(seq_o, 0, NULL);
            val1 = PyStackRef_FromPyObjectSteal(PyTuple_GET_ITEM(seq_o, 1));
            PyTuple_SET_ITEM(seq_o, 1, NULL);
            PyStackRef_CLOSE_NO_ESCAPE(seq);
        }

This allows the op to perform no reference count operations at all and also to be non-escaping, so we can stack cache over it.
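To make the difference concrete, here is a toy Python model of the two unpacking strategies. It is not CPython code: `Obj`, `unpack_copy`, `unpack_steal`, and the counter are hypothetical, and the model only counts reference count operations on the tuple's elements.

```python
class Obj:
    """Toy object with an explicit reference count (not a real PyObject)."""

    def __init__(self, items=()):
        self.refcnt = 1
        self.items = list(items)


counter = {"element_ops": 0}


def unpack_copy(seq):
    """Take new references to the elements, then release the tuple."""
    vals = []
    for item in seq.items:
        item.refcnt += 1               # new reference for the stack
        counter["element_ops"] += 1
        vals.append(item)
    seq.refcnt -= 1                    # tuple was unique, so it is freed here
    for item in seq.items:
        item.refcnt -= 1               # freeing the tuple drops its references
        counter["element_ops"] += 1
    return vals


def unpack_steal(seq):
    """Move the tuple's references to the stack; valid only if seq is unique."""
    assert seq.refcnt == 1, "stealing requires a uniquely referenced tuple"
    vals = list(seq.items)
    seq.items = [None] * len(vals)     # the tuple no longer owns its elements
    seq.refcnt -= 1                    # freed with nothing left to release
    return vals


def run(unpack):
    counter["element_ops"] = 0
    a, b = Obj(), Obj()                # each element is owned only by the tuple
    vals = unpack(Obj([a, b]))
    return [v.refcnt for v in vals], counter["element_ops"]


print(run(unpack_copy))   # ([1, 1], 4)
print(run(unpack_steal))  # ([1, 1], 0)
```

Both strategies leave the elements with the same final counts; the stealing version simply gets there without touching them, which is only safe because nobody else can observe the emptied tuple.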

Fidget-Spinner avatar Jan 04 '26 13:01 Fidget-Spinner

OK, I will work on this.

cocolato avatar Jan 04 '26 13:01 cocolato

@cocolato now that I think about this, this is more complicated than I thought:

The problem is that you have to invalidate all unique references on any escaping uop (see _PyUop_Flags[opcode] & HAS_ESCAPES_FLAG). RETURN_VALUE is an escaping uop, so we cannot easily apply this optimization across RETURN_VALUE.
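A minimal sketch of that invalidation rule, written as a toy abstract interpreter in Python. Everything here is an assumption for illustration: the trace format, the function name, and the `_HYPOTHETICAL_*` uops are made up, and the escape flags are assigned by hand rather than read from `_PyUop_Flags`. Only `_RETURN_VALUE` being escaping comes from the discussion above.

```python
def unique_before_each_uop(trace):
    """Return, for every uop, the values known to be unique just before it runs.

    Each trace entry is (name, escapes, produces_unique), where
    produces_unique names a freshly created value or is None.
    """
    unique = set()
    snapshots = []
    for name, escapes, produces_unique in trace:
        snapshots.append((name, sorted(unique)))
        if escapes:
            # Arbitrary code may have run and taken new references,
            # so nothing can be assumed unique any more.
            unique.clear()
        if produces_unique is not None:
            unique.add(produces_unique)
    return snapshots


trace = [
    ("_HYPOTHETICAL_NEW_TUPLE", False, "t"),   # creates a fresh tuple "t"
    ("_HYPOTHETICAL_PURE_OP", False, None),    # non-escaping: "t" stays unique
    ("_RETURN_VALUE", True, None),             # escaping: all knowledge is dropped
    ("_HYPOTHETICAL_USE_TUPLE", False, None),  # can no longer assume "t" is unique
]

for name, known in unique_before_each_uop(trace):
    print(name, known)
```

In this model the uniqueness of "t" survives the non-escaping uop but is gone by the time the last uop runs, which is the situation that makes the optimization hard to apply across a return.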

However, I think there's one useful place you can still apply this optimization: object creation and initialization. See the CALL_ALLOC_AND_ENTER_INIT instruction for example.

Because of how complicated this is, I think I'll take over, sorry! However, I need your help making this optimization better, and we can parallelize some work here: we need to specialize on more forms of object creation. Can you please add a specialization for CALL_SLOT_AND_ENTER_INIT?

Basically the current code JITs:

class A:
    def __init__(self, a):
        self.a = a


def foo(n):
    for i in range(1, n + 1):
        x = A(1)
    return 1

foo(4003)

But once you add __slots__ to A, there's no more specialization, and the JIT cannot optimize and just fails. So we need a new specialization for calling __slots__. This turns up frequently in dataclasses and also the bm_float benchmark on pyperformance.

Just by adding CALL_SLOT_AND_ENTER_INIT, you should see a speedup on that benchmark. This isn't an easy task, and I think you're a very capable/strong contributor, so I'm entrusting you with this. Do you mind taking it up? You can see an example of how to add a specialization here: https://github.com/python/cpython/pull/143389

Fidget-Spinner avatar Jan 04 '26 21:01 Fidget-Spinner

Sorry for discouraging you from this optimization btw. I feel with the current state of the JIT, we can't use this (yet). If you manage to implement more specializations, that should make it possible though.

Fidget-Spinner avatar Jan 04 '26 21:01 Fidget-Spinner

We can still do the optimizations, but we may have to insert extra guards. However, that's blocked by https://github.com/python/cpython/issues/143421 as well.

~Let me know which one you want to work on, and I can let you take it up!~ The latter got taken up by Donghee, so it will have to be the new specialization!

Fidget-Spinner avatar Jan 04 '26 21:01 Fidget-Spinner

> add a specialization for CALL_SLOT_AND_ENTER_INIT

Thanks for the explanation! I'm pleased to take on the task of adding a specialization for CALL_SLOT_AND_ENTER_INIT. I'm also very willing to help with other related development tasks in the future. I do need to start with simpler tasks to get more familiar with this area of optimization. Thanks for the trust!

cocolato avatar Jan 05 '26 02:01 cocolato

> But once you add __slots__ to A, there's no more specialization, and the JIT cannot optimize and just fails. So we need a new specialization for calling __slots__.

I'm not sure whether my test case behaves as expected, but on my test machine (Darwin 25.1.0 arm64, clang version 21.1.8) I found that the JIT can still perform specialization with CALL_ALLOC_AND_ENTER_INIT for the following code:

class A:
    __slots__ = ('a',)
    def __init__(self, a):
        self.a = a

def foo(n):
    for i in range(1, n + 1):
        x = A(1)
    return 1

foo(4003)

Run: PYTHON_LLTRACE=4 PYTHON_OPT_DEBUG=4 ./python.exe ./foo.py

Output:

Tracing foo (./foo.py:6) at byte offset 40 at chain depth 0
0x1051c7600 34: JUMP_BACKWARD(16) 0 2
   3 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=34, operand0=0, operand1=0)
   4 ADD_TO_TRACE: _SET_IP (0, target=34, operand0=0x1051c7714, operand1=0)
   5 ADD_TO_TRACE: _CHECK_PERIODIC (0, target=34, operand0=0, operand1=0)
Trace continuing
0x1051c7600 20: FOR_ITER_RANGE(14) 0 2
   6 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=20, operand0=0, operand1=0)
   7 ADD_TO_TRACE: _SET_IP (0, target=20, operand0=0x1051c76f8, operand1=0)
   8 ADD_TO_TRACE: _ITER_CHECK_RANGE (14, target=20, operand0=0, operand1=0)
   9 ADD_TO_TRACE: _GUARD_NOT_EXHAUSTED_RANGE (14, target=37, operand0=0, operand1=0)
  10 ADD_TO_TRACE: _ITER_NEXT_RANGE (14, target=20, operand0=0, operand1=0)
Trace continuing
0x1051c7600 22: STORE_FAST(1) 0 3
  11 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=22, operand0=0, operand1=0)
  12 ADD_TO_TRACE: _SET_IP (0, target=22, operand0=0x1051c76fc, operand1=0)
  13 ADD_TO_TRACE: _SWAP_FAST (1, target=22, operand0=0, operand1=0)
  14 ADD_TO_TRACE: _POP_TOP (1, target=22, operand0=0, operand1=0)
Trace continuing
0x1051c7600 23: LOAD_GLOBAL_MODULE(3) 0 2
  15 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=23, operand0=0, operand1=0)
  16 ADD_TO_TRACE: _SET_IP (0, target=23, operand0=0x1051c76fe, operand1=0)
  17 ADD_TO_TRACE: _NOP (3, target=23, operand0=0, operand1=0)
  18 ADD_TO_TRACE: _LOAD_GLOBAL_MODULE (3, target=23, operand0=0x2c, operand1=0)
  19 ADD_TO_TRACE: _PUSH_NULL_CONDITIONAL (3, target=23, operand0=0, operand1=0)
Trace continuing
0x1051c7600 28: LOAD_SMALL_INT(1) 0 4
  20 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=28, operand0=0, operand1=0)
  21 ADD_TO_TRACE: _SET_IP (0, target=28, operand0=0x1051c7708, operand1=0)
  22 ADD_TO_TRACE: _LOAD_SMALL_INT (1, target=28, operand0=0, operand1=0)
Trace continuing
0x1051c7600 29: CALL_ALLOC_AND_ENTER_INIT(1) 1 5
  23 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=29, operand0=0, operand1=0)
  24 ADD_TO_TRACE: _SET_IP (0, target=29, operand0=0x1051c770a, operand1=0)
  25 ADD_TO_TRACE: _CHECK_PEP_523 (1, target=29, operand0=0, operand1=0)
  26 ADD_TO_TRACE: _CHECK_AND_ALLOCATE_OBJECT (1, target=29, operand0=0x20049, operand1=0)
  27 ADD_TO_TRACE: _CREATE_INIT_FRAME (1, target=29, operand0=0, operand1=0)
Adding 0x1052f4350 func to op
  28 ADD_TO_TRACE: _PUSH_FRAME (1, target=29, operand0=0x1052f4350, operand1=0)
  29 ADD_TO_TRACE: _GUARD_IP__PUSH_FRAME (0, target=0, operand0=0x1051b6410, operand1=0)
Trace continuing
0x1051b6340 0: RESUME_CHECK(0) 0 0
  30 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=0, operand0=0, operand1=0)
  31 ADD_TO_TRACE: _SET_IP (0, target=0, operand0=0x1051b6410, operand1=0)
  32 ADD_TO_TRACE: _TIER2_RESUME_CHECK (0, target=0, operand0=0, operand1=0)
Trace continuing
0x1051b6340 1: LOAD_FAST_BORROW_LOAD_FAST_BORROW(16) 0 0
  33 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=1, operand0=0, operand1=0)
  34 ADD_TO_TRACE: _SET_IP (0, target=1, operand0=0x1051b6412, operand1=0)
  35 ADD_TO_TRACE: _LOAD_FAST_BORROW (1, target=1, operand0=0, operand1=0)
  36 ADD_TO_TRACE: _LOAD_FAST_BORROW (0, target=1, operand0=0, operand1=0)
Trace continuing
0x1051b6340 2: STORE_ATTR_SLOT(0) 0 2
  37 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=2, operand0=0, operand1=0)
  38 ADD_TO_TRACE: _SET_IP (0, target=2, operand0=0x1051b6414, operand1=0)
  39 ADD_TO_TRACE: _GUARD_TYPE_VERSION (0, target=2, operand0=0x20049, operand1=0)
  40 ADD_TO_TRACE: _STORE_ATTR_SLOT (0, target=2, operand0=0x10, operand1=0)
  41 ADD_TO_TRACE: _POP_TOP (0, target=2, operand0=0, operand1=0)
Trace continuing
0x1051b6340 7: LOAD_CONST(0) 0 0
  42 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=7, operand0=0, operand1=0)
  43 ADD_TO_TRACE: _SET_IP (0, target=7, operand0=0x1051b641e, operand1=0)
  44 ADD_TO_TRACE: _LOAD_CONST (0, target=7, operand0=0, operand1=0)
Trace continuing
0x1051b6340 8: RETURN_VALUE(0) 1 1
  45 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=8, operand0=0, operand1=0)
  46 ADD_TO_TRACE: _SET_IP (0, target=8, operand0=0x1051b6420, operand1=0)
Adding 0x104dece01 code to op
  47 ADD_TO_TRACE: _RETURN_VALUE (0, target=8, operand0=0x104dece01, operand1=0)
  48 ADD_TO_TRACE: _GUARD_IP_RETURN_VALUE (0, target=0, operand0=0x104deced0, operand1=0)
Trace continuing
0x104dece00 0: EXIT_INIT_CHECK(0) 0 2
  49 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=0, operand0=0, operand1=0)
  50 ADD_TO_TRACE: _SET_IP (0, target=0, operand0=0x104deced0, operand1=0)
  51 ADD_TO_TRACE: _EXIT_INIT_CHECK (0, target=0, operand0=0, operand1=0)
Trace continuing
0x104dece00 1: RETURN_VALUE(0) 1 1
  52 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=1, operand0=0, operand1=0)
  53 ADD_TO_TRACE: _SET_IP (0, target=1, operand0=0x104deced2, operand1=0)
Adding 0x10526a450 func to op
  54 ADD_TO_TRACE: _RETURN_VALUE (0, target=1, operand0=0x10526a450, operand1=0)
  55 ADD_TO_TRACE: _GUARD_IP_RETURN_VALUE (0, target=0, operand0=0x1051c7712, operand1=0)
Trace continuing
0x1051c7600 33: STORE_FAST(2) 0 3
  56 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=33, operand0=0, operand1=0)
  57 ADD_TO_TRACE: _SET_IP (0, target=33, operand0=0x1051c7712, operand1=0)
  58 ADD_TO_TRACE: _SWAP_FAST (2, target=33, operand0=0, operand1=0)
  59 ADD_TO_TRACE: _POP_TOP (2, target=33, operand0=0, operand1=0)
  60 ADD_TO_TRACE: _JUMP_TO_TOP (0, target=0, operand0=0, operand1=0)
Trace done
   0 abs: _START_EXECUTOR (0, target=34, operand0=0x1051c7714, operand1=0)  stack_level 2
   1 abs: _MAKE_WARM (0, target=0, operand0=0, operand1=0)  stack_level 2
   2 abs: _CHECK_VALIDITY (0, target=34, operand0=0, operand1=0)  stack_level 2
   3 abs: _SET_IP (0, target=34, operand0=0x1051c7714, operand1=0)  stack_level 2
   4 abs: _CHECK_PERIODIC (0, target=34, operand0=0, operand1=0)  stack_level 2
   5 abs: _CHECK_VALIDITY (0, target=20, operand0=0, operand1=0)  stack_level 2
   6 abs: _SET_IP (0, target=20, operand0=0x1051c76f8, operand1=0)  stack_level 2
   7 abs: _ITER_CHECK_RANGE (14, target=20, operand0=0, operand1=0)  stack_level 2
   8 abs: _GUARD_NOT_EXHAUSTED_RANGE (14, target=37, operand0=0, operand1=0)  stack_level 2
   9 abs: _ITER_NEXT_RANGE (14, target=20, operand0=0, operand1=0)  stack_level 3
  10 abs: _CHECK_VALIDITY (0, target=22, operand0=0, operand1=0)  stack_level 3
  11 abs: _SET_IP (0, target=22, operand0=0x1051c76fc, operand1=0)  stack_level 3
  12 abs: _SWAP_FAST (1, target=22, operand0=0, operand1=0)  stack_level 3
  13 abs: _POP_TOP (1, target=22, operand0=0, operand1=0)  stack_level 2
  14 abs: _CHECK_VALIDITY (0, target=23, operand0=0, operand1=0)  stack_level 2
  15 abs: _SET_IP (0, target=23, operand0=0x1051c76fe, operand1=0)  stack_level 2
  16 abs: _NOP (3, target=23, operand0=0, operand1=0)  stack_level 2
  17 abs: _LOAD_GLOBAL_MODULE (3, target=23, operand0=0x2c, operand1=0x7)  stack_level 3
  18 abs: _PUSH_NULL_CONDITIONAL (3, target=23, operand0=0, operand1=0)  stack_level 4
  19 abs: _CHECK_VALIDITY (0, target=28, operand0=0, operand1=0)  stack_level 4
  20 abs: _SET_IP (0, target=28, operand0=0x1051c7708, operand1=0)  stack_level 4
  21 abs: _LOAD_SMALL_INT (1, target=28, operand0=0, operand1=0)  stack_level 5
  22 abs: _CHECK_VALIDITY (0, target=29, operand0=0, operand1=0)  stack_level 5
  23 abs: _SET_IP (0, target=29, operand0=0x1051c770a, operand1=0)  stack_level 5
  24 abs: _CHECK_PEP_523 (1, target=29, operand0=0, operand1=0)  stack_level 5
  25 abs: _CHECK_AND_ALLOCATE_OBJECT (1, target=29, operand0=0x20049, operand1=0)  stack_level 5
  26 abs: _CREATE_INIT_FRAME (1, target=29, operand0=0, operand1=0) func=0x1052f4350 code=0x1051b6340   27 abs: _PUSH_FRAME (1, target=29, operand0=0x1052f4350, operand1=0)  stack_level 0
  28 abs: _NOP (0, target=0, operand0=0, operand1=0)  stack_level 0
  29 abs: _CHECK_VALIDITY (0, target=0, operand0=0, operand1=0)  stack_level 0
  30 abs: _SET_IP (0, target=0, operand0=0x1051b6410, operand1=0)  stack_level 0
  31 abs: _TIER2_RESUME_CHECK (0, target=0, operand0=0, operand1=0)  stack_level 0
  32 abs: _CHECK_VALIDITY (0, target=1, operand0=0, operand1=0)  stack_level 0
  33 abs: _SET_IP (0, target=1, operand0=0x1051b6412, operand1=0)  stack_level 0
  34 abs: _LOAD_FAST_BORROW (1, target=1, operand0=0, operand1=0)  stack_level 1
  35 abs: _LOAD_FAST_BORROW (0, target=1, operand0=0, operand1=0)  stack_level 2
  36 abs: _CHECK_VALIDITY (0, target=2, operand0=0, operand1=0)  stack_level 2
  37 abs: _SET_IP (0, target=2, operand0=0x1051b6414, operand1=0)  stack_level 2
  38 abs: _GUARD_TYPE_VERSION (0, target=2, operand0=0x20049, operand1=0)  stack_level 2
  39 abs: _STORE_ATTR_SLOT (0, target=2, operand0=0x10, operand1=0)  stack_level 1
  40 abs: _POP_TOP (0, target=2, operand0=0, operand1=0)  stack_level 0
  41 abs: _CHECK_VALIDITY (0, target=7, operand0=0, operand1=0)  stack_level 0
  42 abs: _SET_IP (0, target=7, operand0=0x1051b641e, operand1=0)  stack_level 0
  43 abs: _LOAD_CONST (0, target=7, operand0=0, operand1=0)  stack_level 1
  44 abs: _CHECK_VALIDITY (0, target=8, operand0=0, operand1=0)  stack_level 1
  45 abs: _SET_IP (0, target=8, operand0=0x1051b6420, operand1=0)  stack_level 1
  46 abs: _RETURN_VALUE (0, target=8, operand0=0x104dece01, operand1=0x2) code=0x104dece00   47 abs: _NOP (0, target=0, operand0=0, operand1=0)   48 abs: _CHECK_VALIDITY (0, target=0, operand0=0, operand1=0)   49 abs: _SET_IP (0, target=0, operand0=0x104deced0, operand1=0)   50 abs: _EXIT_INIT_CHECK (0, target=0, operand0=0, operand1=0)   51 abs: _CHECK_VALIDITY (0, target=1, operand0=0, operand1=0)   52 abs: _SET_IP (0, target=1, operand0=0x104deced2, operand1=0)   53 abs: _RETURN_VALUE (0, target=1, operand0=0x10526a450, operand1=0x3) func=0x10526a450 code=0x1051c7600  stack_level 3
  54 abs: _NOP (0, target=0, operand0=0, operand1=0)  stack_level 3
  55 abs: _CHECK_VALIDITY (0, target=33, operand0=0, operand1=0)  stack_level 3
  56 abs: _SET_IP (0, target=33, operand0=0x1051c7712, operand1=0)  stack_level 3
  57 abs: _SWAP_FAST (2, target=33, operand0=0, operand1=0)  stack_level 3
  58 abs: _POP_TOP (2, target=33, operand0=0, operand1=0)  stack_level 2
  59 abs: _JUMP_TO_TOP (0, target=0, operand0=0, operand1=0)  stack_level 2
Optimized trace (length 64):
   0 OPTIMIZED: _START_EXECUTOR_r00 (0, jump_target=46, operand0=0xaa50fb830, operand1=0)
   1 OPTIMIZED: _MAKE_WARM_r00 (0, target=0, operand0=0, operand1=0)
   2 OPTIMIZED: _SET_IP_r00 (0, target=34, operand0=0x1051c7714, operand1=0)
   3 OPTIMIZED: _CHECK_PERIODIC_r00 (0, jump_target=0, operand0=0, operand1=0, error_target=47)
   4 OPTIMIZED: _CHECK_VALIDITY_r00 (0, jump_target=48, operand0=0, operand1=0)
   5 OPTIMIZED: _ITER_CHECK_RANGE_r02 (14, jump_target=49, operand0=0, operand1=0)
   6 OPTIMIZED: _GUARD_NOT_EXHAUSTED_RANGE_r22 (14, jump_target=50, operand0=0, operand1=0)
   7 OPTIMIZED: _ITER_NEXT_RANGE_r23 (14, jump_target=0, operand0=0, operand1=0, error_target=51)
   8 OPTIMIZED: _SET_IP_r33 (0, target=22, operand0=0x1051c76fc, operand1=0)
   9 OPTIMIZED: _SWAP_FAST_1_r33 (1, target=22, operand0=0, operand1=0)
  10 OPTIMIZED: _SPILL_OR_RELOAD_r31 (0, target=0, operand0=0, operand1=0)
  11 OPTIMIZED: _POP_TOP_r10 (1, target=22, operand0=0, operand1=0)
  12 OPTIMIZED: _CHECK_VALIDITY_r00 (0, jump_target=52, operand0=0, operand1=0)
  13 OPTIMIZED: _GUARD_GLOBALS_VERSION_r00 (0, jump_target=52, operand0=0x2c, operand1=0)
  14 OPTIMIZED: _LOAD_CONST_INLINE_r01 (3, target=23, operand0=0xaa510b730, operand1=0x7)
  15 OPTIMIZED: _PUSH_NULL_r12 (0, target=23, operand0=0, operand1=0)
  16 OPTIMIZED: _LOAD_CONST_INLINE_BORROW_r23 (0, target=28, operand0=0x104e381d0, operand1=0)
  17 OPTIMIZED: _SET_IP_r33 (0, target=29, operand0=0x1051c770a, operand1=0)
  18 OPTIMIZED: _SPILL_OR_RELOAD_r30 (0, target=0, operand0=0, operand1=0)
  19 OPTIMIZED: _CHECK_AND_ALLOCATE_OBJECT_r00 (1, jump_target=53, operand0=0x20049, operand1=0, error_target=54)
  20 OPTIMIZED: _CREATE_INIT_FRAME_r01 (1, jump_target=0, operand0=0, operand1=0, error_target=54)
  21 OPTIMIZED: _PUSH_FRAME_r10 (1, target=29, operand0=0x1052f4350, operand1=0)
  22 OPTIMIZED: _CHECK_VALIDITY_r00 (0, jump_target=55, operand0=0, operand1=0)
  23 OPTIMIZED: _TIER2_RESUME_CHECK_r00 (0, jump_target=56, operand0=0, operand1=0)
  24 OPTIMIZED: _LOAD_FAST_BORROW_1_r01 (1, target=1, operand0=0, operand1=0)
  25 OPTIMIZED: _LOAD_FAST_BORROW_0_r12 (0, target=1, operand0=0, operand1=0)
  26 OPTIMIZED: _SET_IP_r22 (0, target=2, operand0=0x1051b6414, operand1=0)
  27 OPTIMIZED: _GUARD_TYPE_VERSION_r22 (0, jump_target=57, operand0=0x20049, operand1=0)
  28 OPTIMIZED: _STORE_ATTR_SLOT_r21 (0, jump_target=58, operand0=0x10, operand1=0)
  29 OPTIMIZED: _POP_TOP_NOP_r10 (0, target=2, operand0=0, operand1=0)
  30 OPTIMIZED: _CHECK_VALIDITY_r00 (0, jump_target=59, operand0=0, operand1=0)
  31 OPTIMIZED: _LOAD_CONST_INLINE_BORROW_r01 (0, target=7, operand0=0x104e06b90, operand1=0)
  32 OPTIMIZED: _SET_IP_r11 (0, target=8, operand0=0x1051b6420, operand1=0)
  33 OPTIMIZED: _RETURN_VALUE_r11 (0, target=8, operand0=0x104dece01, operand1=0x2)
  34 OPTIMIZED: _CHECK_VALIDITY_r11 (0, jump_target=60, operand0=0, operand1=0)
  35 OPTIMIZED: _SET_IP_r11 (0, target=0, operand0=0x104deced0, operand1=0)
  36 OPTIMIZED: _EXIT_INIT_CHECK_r10 (0, jump_target=0, operand0=0, operand1=0, error_target=61)
  37 OPTIMIZED: _CHECK_VALIDITY_r00 (0, jump_target=62, operand0=0, operand1=0)
  38 OPTIMIZED: _SET_IP_r00 (0, target=1, operand0=0x104deced2, operand1=0)
  39 OPTIMIZED: _SPILL_OR_RELOAD_r01 (0, target=0, operand0=0, operand1=0)
  40 OPTIMIZED: _RETURN_VALUE_r11 (0, target=1, operand0=0x10526a450, operand1=0x3)
  41 OPTIMIZED: _CHECK_VALIDITY_r11 (0, jump_target=63, operand0=0, operand1=0)
  42 OPTIMIZED: _SET_IP_r11 (0, target=33, operand0=0x1051c7712, operand1=0)
  43 OPTIMIZED: _SWAP_FAST_2_r11 (2, target=33, operand0=0, operand1=0)
  44 OPTIMIZED: _POP_TOP_r10 (2, target=33, operand0=0, operand1=0)
  45 OPTIMIZED: _JUMP_TO_TOP_r00 (0, jump_target=1, operand0=0, operand1=0)
  46 OPTIMIZED: _DEOPT_r00 (0, target=34, operand0=0, operand1=0)
  47 OPTIMIZED: _ERROR_POP_N_r00 (0, target=0, operand0=0x22, operand1=0)
  48 OPTIMIZED: _DEOPT_r00 (0, target=20, operand0=0, operand1=0)
  49 OPTIMIZED: _EXIT_TRACE_r00 (0, target=20, operand0=0xaa50fb8b0, operand1=0)
  50 OPTIMIZED: _EXIT_TRACE_r20 (0, target=37, operand0=0xaa50fb8c0, operand1=0x1)
  51 OPTIMIZED: _ERROR_POP_N_r00 (0, target=0, operand0=0x14, operand1=0)
  52 OPTIMIZED: _DEOPT_r00 (0, target=23, operand0=0, operand1=0)
  53 OPTIMIZED: _DEOPT_r00 (0, target=29, operand0=0, operand1=0)
  54 OPTIMIZED: _ERROR_POP_N_r00 (0, target=0, operand0=0x1d, operand1=0)
  55 OPTIMIZED: _DEOPT_r00 (0, target=0, operand0=0, operand1=0)
  56 OPTIMIZED: _HANDLE_PENDING_AND_DEOPT_r00 (0, target=0, operand0=0, operand1=0)
  57 OPTIMIZED: _EXIT_TRACE_r20 (0, target=2, operand0=0xaa50fb8d0, operand1=0)
  58 OPTIMIZED: _DEOPT_r20 (0, target=2, operand0=0, operand1=0)
  59 OPTIMIZED: _DEOPT_r00 (0, target=7, operand0=0, operand1=0)
  60 OPTIMIZED: _DEOPT_r10 (0, target=0, operand0=0, operand1=0)
  61 OPTIMIZED: _ERROR_POP_N_r00 (0, target=0, operand0=0, operand1=0)
  62 OPTIMIZED: _DEOPT_r00 (0, target=1, operand0=0, operand1=0)
  63 OPTIMIZED: _DEOPT_r10 (0, target=33, operand0=0, operand1=0)

cocolato avatar Jan 06 '26 03:01 cocolato

@cocolato I'm surprised, you're right. Really sorry, my mistake. I have another issue to work on. Can you please work on making the JIT optimization state per-thread?

https://github.com/python/cpython/issues/143421#issuecomment-3710690408

The main problem is that JitOptContext is stack-allocated: https://github.com/python/cpython/blob/main/Python/optimizer_analysis.c#L342

We need to move it into _PyThreadStateImpl (https://github.com/python/cpython/blob/main/Include/internal/pycore_tstate.h#L149) so that the JIT doesn't overflow the stack.

This time I've verified that the problem indeed exists. Do you want to take this one up?

Fidget-Spinner avatar Jan 06 '26 10:01 Fidget-Spinner

Ok, I will take some time to understand the issue and work on it!

cocolato avatar Jan 06 '26 10:01 cocolato

Please don't work on the parent issue (Make the JIT optimizer buffer add to a new buffer, not in-place); work on the sub-issue instead, thanks!

Fidget-Spinner avatar Jan 06 '26 10:01 Fidget-Spinner

@Fidget-Spinner Hi, are there any other tasks we can do for this issue?

cocolato avatar Jan 08 '26 11:01 cocolato

@cocolato We need to first implement a new symbolic type for objects. See optimizer_symbols.c and the things that are done there. I can let you take that up if you want.

Fidget-Spinner avatar Jan 08 '26 22:01 Fidget-Spinner

I will learn and try this part :)

cocolato avatar Jan 09 '26 05:01 cocolato

@cocolato can I assign this to you to do before that, though? We first need the buffer to append instead of overwriting in place if we want to do this optimization properly.

https://github.com/python/cpython/issues/143421
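As a toy illustration of why appending matters: a pass that writes into a fresh buffer can emit more uops than it reads, for example an extra guard in front of an optimized uop, whereas rewriting a fixed-length buffer in place has no room for that. The function and uop names below are hypothetical and do not reflect the real optimizer's data structures.

```python
def optimize_to_new_buffer(trace, needs_guard):
    """Copy uops into a new list, inserting a guard before selected uops."""
    out = []
    for uop in trace:
        if needs_guard(uop):
            out.append("_HYPOTHETICAL_GUARD")  # one input uop becomes two
        out.append(uop)
    return out


trace = ["_HYPOTHETICAL_LOAD", "_HYPOTHETICAL_UNPACK", "_HYPOTHETICAL_STORE"]
optimized = optimize_to_new_buffer(trace, lambda uop: uop == "_HYPOTHETICAL_UNPACK")

print(len(trace), len(optimized))  # 3 4
print(optimized)
```

The input list is left untouched, so the pass can keep reading the original uops while the output grows independently.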

Fidget-Spinner avatar Jan 09 '26 06:01 Fidget-Spinner

> @cocolato can I assign this to you to do before that, though? We first need the buffer to append instead of overwriting in place if we want to do this optimization properly.
>
> #143421

Sure, I will work on this!

cocolato avatar Jan 09 '26 07:01 cocolato