JIT: Implement `unique reference tracking` in Tier 2 for reference count optimizations
Feature or enhancement
Proposal:
Motivation
We should implement unique reference tracking in Tier 2 to facilitate optimizations that reduce reference counting overhead. For example, when a tuple is known to be uniquely referenced, we can "steal" its element references during unpacking without performing any reference counting operations.
For reference: discussion in https://github.com/python/cpython/pull/142952
Technical Approach
- Reference Tracking Infrastructure
  - Add a REF_IS_UNIQUE bit (bit 1) to the JitOptRef union in pycore_optimizer.h (code reference).
  - Implement PyJitRef_MakeUnique() and PyJitRef_IsUnique() helper functions.
  - Update helper utilities including PyJitRef_StripReferenceInfo and JIT_BITS_TO_PTR_MASKED to support this unique reference bit.
- Apply unique reference tracking to UNPACK_SEQUENCE uops
- Expand support to more uops
  - After verifying performance and correctness, extend the use of unique reference tracking to additional uops and optimizations as identified.
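The tag-bit layout described under Reference Tracking Infrastructure can be sketched as follows. This is a Python model rather than the C implementation: the function names only mirror the proposed PyJitRef_MakeUnique() and PyJitRef_IsUnique() helpers, REF_IS_BORROWED at bit 0 is an assumption about the existing layout, and the example address is made up.

```python
# Model of a tagged JitOptRef: a symbol pointer with flag bits in its low bits.
REF_IS_BORROWED = 1 << 0  # assumption: bit 0 is already used as a borrowed flag
REF_IS_UNIQUE = 1 << 1    # proposed flag from the list above (bit 1)
REF_TAG_MASK = REF_IS_BORROWED | REF_IS_UNIQUE


def make_unique(bits: int) -> int:
    """Model of the proposed PyJitRef_MakeUnique(): set the unique bit."""
    return bits | REF_IS_UNIQUE


def is_unique(bits: int) -> bool:
    """Model of the proposed PyJitRef_IsUnique(): test the unique bit."""
    return (bits & REF_IS_UNIQUE) != 0


def strip_reference_info(bits: int) -> int:
    """Model of PyJitRef_StripReferenceInfo: clear every tag bit."""
    return bits & ~REF_TAG_MASK


def bits_to_ptr_masked(bits: int) -> int:
    """Model of JIT_BITS_TO_PTR_MASKED: recover the untagged pointer."""
    return bits & ~REF_TAG_MASK


sym_addr = 0x1000  # made-up symbol address, aligned to at least 4 bytes
ref = make_unique(sym_addr)
```

Two tag bits only fit if symbol pointers are at least 4-byte aligned, which is why both masking helpers have to clear the new bit along with the old one.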
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response
@Fidget-Spinner Hi, could you please review this issue and let me know your thoughts?
@cocolato seems correct to me. The key thing about the optimization is that
op(_UNPACK_SEQUENCE_TWO_TUPLE, (seq -- val1, val0)) {
    assert(oparg == 2);
    PyObject *seq_o = PyStackRef_AsPyObjectBorrow(seq);
    assert(PyTuple_CheckExact(seq_o));
    DEOPT_IF(PyTuple_GET_SIZE(seq_o) != 2);
    STAT_INC(UNPACK_SEQUENCE, hit);
    val0 = PyStackRef_FromPyObjectNew(PyTuple_GET_ITEM(seq_o, 0));
    val1 = PyStackRef_FromPyObjectNew(PyTuple_GET_ITEM(seq_o, 1));
    PyStackRef_CLOSE(seq);
}
becomes
op(_UNPACK_SEQUENCE_TWO_TUPLE_UNIQUE_STEAL, (seq -- val1, val0)) {
    assert(oparg == 2);
    PyObject *seq_o = PyStackRef_AsPyObjectBorrow(seq);
    assert(PyTuple_CheckExact(seq_o));
    DEOPT_IF(PyTuple_GET_SIZE(seq_o) != 2);
    STAT_INC(UNPACK_SEQUENCE, hit);
    val0 = PyStackRef_FromPyObjectSteal(PyTuple_GET_ITEM(seq_o, 0));
    PyTuple_SET_ITEM(seq_o, 0, NULL);
    val1 = PyStackRef_FromPyObjectSteal(PyTuple_GET_ITEM(seq_o, 1));
    PyTuple_SET_ITEM(seq_o, 1, NULL);
    PyStackRef_CLOSE_NO_ESCAPE(seq);
}
This allows the op to perform no reference count operations on the elements at all, and also to be non-escaping, so we can stack cache over it.
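To make the saving concrete, here is a toy Python model of the two variants. It is a sketch under simplifying assumptions: Obj, RefcountOps and both functions are hypothetical stand-ins, the tuple is modelled as a plain list of slots, and only operations on the two elements are counted.

```python
class Obj:
    """Toy object that only carries a reference count."""

    def __init__(self, refcnt=1):
        self.refcnt = refcnt


class RefcountOps:
    """Counts every simulated incref/decref on an element."""

    def __init__(self):
        self.count = 0

    def incref(self, obj):
        obj.refcnt += 1
        self.count += 1

    def decref(self, obj):
        obj.refcnt -= 1
        self.count += 1


def unpack_two_new(slots, ops):
    """Model of the original uop: take new references, then free the tuple."""
    val0, val1 = slots[0], slots[1]
    ops.incref(val0)
    ops.incref(val1)
    # Freeing the uniquely referenced tuple drops its own reference to each item.
    for item in slots:
        ops.decref(item)
    return val1, val0


def unpack_two_steal(slots, ops):
    """Model of the stealing uop: move the tuple's references out of its slots."""
    val0 = slots[0]
    slots[0] = None
    val1 = slots[1]
    slots[1] = None
    # The tuple's slots are now empty, so freeing it touches no element.
    return val1, val0


new_ops, steal_ops = RefcountOps(), RefcountOps()
new_vals = unpack_two_new([Obj(), Obj()], new_ops)
steal_vals = unpack_two_steal([Obj(), Obj()], steal_ops)
```

Both variants leave every element with the same final reference count; the stealing variant simply gets there without touching the counts.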
OK, I will work on this.
@cocolato now that I think about this, this is more complicated than I thought:
The problem is that you have to invalidate all unique references on any escaping uop (see _PyUop_Flags[opcode] & HAS_ESCAPES_FLAG). RETURN_VALUE is an escaping uop, so we cannot easily apply this optimization across RETURN_VALUE.
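The invalidation rule can be sketched as a small abstract-interpretation loop. Everything here is illustrative: the flag value and the UOP_FLAGS table are not copied from the real _PyUop_Flags, the _DEMO_* uops are hypothetical, and only the escaping status of _RETURN_VALUE comes from this thread.

```python
HAS_ESCAPES_FLAG = 1 << 0  # illustrative value, not the real constant

# Illustrative flag table; the _DEMO_* entries are hypothetical uops.
UOP_FLAGS = {
    "_DEMO_NEW_TUPLE": 0,
    "_DEMO_NON_ESCAPING": 0,
    "_RETURN_VALUE": HAS_ESCAPES_FLAG,
}


def track_unique(trace):
    """Return, for each uop, the refs known to be unique just before it runs."""
    unique = set()
    states = []
    for opcode, produced in trace:
        states.append((opcode, frozenset(unique)))
        if UOP_FLAGS[opcode] & HAS_ESCAPES_FLAG:
            # Escaping code may create new references to anything,
            # so every uniqueness fact is dropped.
            unique.clear()
        if produced is not None:
            unique.add(produced)
    return states


# Each entry is (uop name, name of the unique ref it produces, or None).
trace = [
    ("_DEMO_NEW_TUPLE", "t"),
    ("_DEMO_NON_ESCAPING", None),
    ("_RETURN_VALUE", None),
    ("_DEMO_NON_ESCAPING", None),
]
states = track_unique(trace)
```

In this model the fact that "t" is unique survives the non-escaping uop but is gone after _RETURN_VALUE, which is the limitation described above.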
However, I think there's one useful place you can still apply this optimization: object creation and initialization. See the CALL_ALLOC_AND_ENTER_INIT instruction for example.
I think because of how complicated this is, I'll take over, sorry! However, I need your help on making this optimization better and can parallelize some work here: we need to specialize on more forms of object creation. Can you please add a specialization for CALL_SLOT_AND_ENTER_INIT?
Basically the current code JITs:
class A:
    def __init__(self, a):
        self.a = a

def foo(n):
    for i in range(1, n + 1):
        x = A(1)
    return 1

foo(4003)
But once you add __slots__ to A, there's no more specialization, and the JIT cannot optimize and just fails. So we need a new specialization for calling __slots__. This turns up frequently in dataclasses and also the bm_float benchmark on pyperformance.
Just by adding CALL_SLOT_AND_ENTER_INIT, you should see a speedup on that benchmark. This isn't an easy task, and I think you're a very capable/strong contributor, so I'm entrusting you with this. Do you mind taking it up? You can see an example of how to add a specialization here: https://github.com/python/cpython/pull/143389
Sorry for discouraging you from this optimization btw. I feel with the current state of the JIT, we can't use this (yet). If you manage to implement more specializations, that should make it possible though.
We can still do the optimizations, but we may have to insert extra guards. However, that's blocked by https://github.com/python/cpython/issues/143421 as well.
~Let me know which one you want to work on, and I can let you take it up!~ The latter got taken up by Donghee, so it will have to be the new specialization!
add a specialization for CALL_SLOT_AND_ENTER_INIT
Thanks for the explanation! I'm pleased to take on the task of adding a specialization for CALL_SLOT_AND_ENTER_INIT. I'm also very willing to help with other related development tasks in the future. I do need to start with simpler tasks to get more familiar with this area of optimization. Thanks for the trust!
But once you add __slots__ to A, there's no more specialization, and the JIT cannot optimize and just fails. So we need a new specialization for calling __slots__.
I'm not sure whether my test case behaves as expected, but on my test machine (Darwin 25.1.0 arm64, clang version 21.1.8) I found that the JIT can still perform specialization with CALL_ALLOC_AND_ENTER_INIT for the following code:
class A:
    __slots__ = ('a',)

    def __init__(self, a):
        self.a = a

def foo(n):
    for i in range(1, n + 1):
        x = A(1)
    return 1

foo(4003)
Run: PYTHON_LLTRACE=4 PYTHON_OPT_DEBUG=4 ./python.exe ./foo.py
Output:
Tracing foo (./foo.py:6) at byte offset 40 at chain depth 0
0x1051c7600 34: JUMP_BACKWARD(16) 0 2
3 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=34, operand0=0, operand1=0)
4 ADD_TO_TRACE: _SET_IP (0, target=34, operand0=0x1051c7714, operand1=0)
5 ADD_TO_TRACE: _CHECK_PERIODIC (0, target=34, operand0=0, operand1=0)
Trace continuing
0x1051c7600 20: FOR_ITER_RANGE(14) 0 2
6 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=20, operand0=0, operand1=0)
7 ADD_TO_TRACE: _SET_IP (0, target=20, operand0=0x1051c76f8, operand1=0)
8 ADD_TO_TRACE: _ITER_CHECK_RANGE (14, target=20, operand0=0, operand1=0)
9 ADD_TO_TRACE: _GUARD_NOT_EXHAUSTED_RANGE (14, target=37, operand0=0, operand1=0)
10 ADD_TO_TRACE: _ITER_NEXT_RANGE (14, target=20, operand0=0, operand1=0)
Trace continuing
0x1051c7600 22: STORE_FAST(1) 0 3
11 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=22, operand0=0, operand1=0)
12 ADD_TO_TRACE: _SET_IP (0, target=22, operand0=0x1051c76fc, operand1=0)
13 ADD_TO_TRACE: _SWAP_FAST (1, target=22, operand0=0, operand1=0)
14 ADD_TO_TRACE: _POP_TOP (1, target=22, operand0=0, operand1=0)
Trace continuing
0x1051c7600 23: LOAD_GLOBAL_MODULE(3) 0 2
15 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=23, operand0=0, operand1=0)
16 ADD_TO_TRACE: _SET_IP (0, target=23, operand0=0x1051c76fe, operand1=0)
17 ADD_TO_TRACE: _NOP (3, target=23, operand0=0, operand1=0)
18 ADD_TO_TRACE: _LOAD_GLOBAL_MODULE (3, target=23, operand0=0x2c, operand1=0)
19 ADD_TO_TRACE: _PUSH_NULL_CONDITIONAL (3, target=23, operand0=0, operand1=0)
Trace continuing
0x1051c7600 28: LOAD_SMALL_INT(1) 0 4
20 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=28, operand0=0, operand1=0)
21 ADD_TO_TRACE: _SET_IP (0, target=28, operand0=0x1051c7708, operand1=0)
22 ADD_TO_TRACE: _LOAD_SMALL_INT (1, target=28, operand0=0, operand1=0)
Trace continuing
0x1051c7600 29: CALL_ALLOC_AND_ENTER_INIT(1) 1 5
23 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=29, operand0=0, operand1=0)
24 ADD_TO_TRACE: _SET_IP (0, target=29, operand0=0x1051c770a, operand1=0)
25 ADD_TO_TRACE: _CHECK_PEP_523 (1, target=29, operand0=0, operand1=0)
26 ADD_TO_TRACE: _CHECK_AND_ALLOCATE_OBJECT (1, target=29, operand0=0x20049, operand1=0)
27 ADD_TO_TRACE: _CREATE_INIT_FRAME (1, target=29, operand0=0, operand1=0)
Adding 0x1052f4350 func to op
28 ADD_TO_TRACE: _PUSH_FRAME (1, target=29, operand0=0x1052f4350, operand1=0)
29 ADD_TO_TRACE: _GUARD_IP__PUSH_FRAME (0, target=0, operand0=0x1051b6410, operand1=0)
Trace continuing
0x1051b6340 0: RESUME_CHECK(0) 0 0
30 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=0, operand0=0, operand1=0)
31 ADD_TO_TRACE: _SET_IP (0, target=0, operand0=0x1051b6410, operand1=0)
32 ADD_TO_TRACE: _TIER2_RESUME_CHECK (0, target=0, operand0=0, operand1=0)
Trace continuing
0x1051b6340 1: LOAD_FAST_BORROW_LOAD_FAST_BORROW(16) 0 0
33 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=1, operand0=0, operand1=0)
34 ADD_TO_TRACE: _SET_IP (0, target=1, operand0=0x1051b6412, operand1=0)
35 ADD_TO_TRACE: _LOAD_FAST_BORROW (1, target=1, operand0=0, operand1=0)
36 ADD_TO_TRACE: _LOAD_FAST_BORROW (0, target=1, operand0=0, operand1=0)
Trace continuing
0x1051b6340 2: STORE_ATTR_SLOT(0) 0 2
37 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=2, operand0=0, operand1=0)
38 ADD_TO_TRACE: _SET_IP (0, target=2, operand0=0x1051b6414, operand1=0)
39 ADD_TO_TRACE: _GUARD_TYPE_VERSION (0, target=2, operand0=0x20049, operand1=0)
40 ADD_TO_TRACE: _STORE_ATTR_SLOT (0, target=2, operand0=0x10, operand1=0)
41 ADD_TO_TRACE: _POP_TOP (0, target=2, operand0=0, operand1=0)
Trace continuing
0x1051b6340 7: LOAD_CONST(0) 0 0
42 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=7, operand0=0, operand1=0)
43 ADD_TO_TRACE: _SET_IP (0, target=7, operand0=0x1051b641e, operand1=0)
44 ADD_TO_TRACE: _LOAD_CONST (0, target=7, operand0=0, operand1=0)
Trace continuing
0x1051b6340 8: RETURN_VALUE(0) 1 1
45 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=8, operand0=0, operand1=0)
46 ADD_TO_TRACE: _SET_IP (0, target=8, operand0=0x1051b6420, operand1=0)
Adding 0x104dece01 code to op
47 ADD_TO_TRACE: _RETURN_VALUE (0, target=8, operand0=0x104dece01, operand1=0)
48 ADD_TO_TRACE: _GUARD_IP_RETURN_VALUE (0, target=0, operand0=0x104deced0, operand1=0)
Trace continuing
0x104dece00 0: EXIT_INIT_CHECK(0) 0 2
49 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=0, operand0=0, operand1=0)
50 ADD_TO_TRACE: _SET_IP (0, target=0, operand0=0x104deced0, operand1=0)
51 ADD_TO_TRACE: _EXIT_INIT_CHECK (0, target=0, operand0=0, operand1=0)
Trace continuing
0x104dece00 1: RETURN_VALUE(0) 1 1
52 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=1, operand0=0, operand1=0)
53 ADD_TO_TRACE: _SET_IP (0, target=1, operand0=0x104deced2, operand1=0)
Adding 0x10526a450 func to op
54 ADD_TO_TRACE: _RETURN_VALUE (0, target=1, operand0=0x10526a450, operand1=0)
55 ADD_TO_TRACE: _GUARD_IP_RETURN_VALUE (0, target=0, operand0=0x1051c7712, operand1=0)
Trace continuing
0x1051c7600 33: STORE_FAST(2) 0 3
56 ADD_TO_TRACE: _CHECK_VALIDITY (0, target=33, operand0=0, operand1=0)
57 ADD_TO_TRACE: _SET_IP (0, target=33, operand0=0x1051c7712, operand1=0)
58 ADD_TO_TRACE: _SWAP_FAST (2, target=33, operand0=0, operand1=0)
59 ADD_TO_TRACE: _POP_TOP (2, target=33, operand0=0, operand1=0)
60 ADD_TO_TRACE: _JUMP_TO_TOP (0, target=0, operand0=0, operand1=0)
Trace done
0 abs: _START_EXECUTOR (0, target=34, operand0=0x1051c7714, operand1=0) stack_level 2
1 abs: _MAKE_WARM (0, target=0, operand0=0, operand1=0) stack_level 2
2 abs: _CHECK_VALIDITY (0, target=34, operand0=0, operand1=0) stack_level 2
3 abs: _SET_IP (0, target=34, operand0=0x1051c7714, operand1=0) stack_level 2
4 abs: _CHECK_PERIODIC (0, target=34, operand0=0, operand1=0) stack_level 2
5 abs: _CHECK_VALIDITY (0, target=20, operand0=0, operand1=0) stack_level 2
6 abs: _SET_IP (0, target=20, operand0=0x1051c76f8, operand1=0) stack_level 2
7 abs: _ITER_CHECK_RANGE (14, target=20, operand0=0, operand1=0) stack_level 2
8 abs: _GUARD_NOT_EXHAUSTED_RANGE (14, target=37, operand0=0, operand1=0) stack_level 2
9 abs: _ITER_NEXT_RANGE (14, target=20, operand0=0, operand1=0) stack_level 3
10 abs: _CHECK_VALIDITY (0, target=22, operand0=0, operand1=0) stack_level 3
11 abs: _SET_IP (0, target=22, operand0=0x1051c76fc, operand1=0) stack_level 3
12 abs: _SWAP_FAST (1, target=22, operand0=0, operand1=0) stack_level 3
13 abs: _POP_TOP (1, target=22, operand0=0, operand1=0) stack_level 2
14 abs: _CHECK_VALIDITY (0, target=23, operand0=0, operand1=0) stack_level 2
15 abs: _SET_IP (0, target=23, operand0=0x1051c76fe, operand1=0) stack_level 2
16 abs: _NOP (3, target=23, operand0=0, operand1=0) stack_level 2
17 abs: _LOAD_GLOBAL_MODULE (3, target=23, operand0=0x2c, operand1=0x7) stack_level 3
18 abs: _PUSH_NULL_CONDITIONAL (3, target=23, operand0=0, operand1=0) stack_level 4
19 abs: _CHECK_VALIDITY (0, target=28, operand0=0, operand1=0) stack_level 4
20 abs: _SET_IP (0, target=28, operand0=0x1051c7708, operand1=0) stack_level 4
21 abs: _LOAD_SMALL_INT (1, target=28, operand0=0, operand1=0) stack_level 5
22 abs: _CHECK_VALIDITY (0, target=29, operand0=0, operand1=0) stack_level 5
23 abs: _SET_IP (0, target=29, operand0=0x1051c770a, operand1=0) stack_level 5
24 abs: _CHECK_PEP_523 (1, target=29, operand0=0, operand1=0) stack_level 5
25 abs: _CHECK_AND_ALLOCATE_OBJECT (1, target=29, operand0=0x20049, operand1=0) stack_level 5
26 abs: _CREATE_INIT_FRAME (1, target=29, operand0=0, operand1=0) func=0x1052f4350 code=0x1051b6340 27 abs: _PUSH_FRAME (1, target=29, operand0=0x1052f4350, operand1=0) stack_level 0
28 abs: _NOP (0, target=0, operand0=0, operand1=0) stack_level 0
29 abs: _CHECK_VALIDITY (0, target=0, operand0=0, operand1=0) stack_level 0
30 abs: _SET_IP (0, target=0, operand0=0x1051b6410, operand1=0) stack_level 0
31 abs: _TIER2_RESUME_CHECK (0, target=0, operand0=0, operand1=0) stack_level 0
32 abs: _CHECK_VALIDITY (0, target=1, operand0=0, operand1=0) stack_level 0
33 abs: _SET_IP (0, target=1, operand0=0x1051b6412, operand1=0) stack_level 0
34 abs: _LOAD_FAST_BORROW (1, target=1, operand0=0, operand1=0) stack_level 1
35 abs: _LOAD_FAST_BORROW (0, target=1, operand0=0, operand1=0) stack_level 2
36 abs: _CHECK_VALIDITY (0, target=2, operand0=0, operand1=0) stack_level 2
37 abs: _SET_IP (0, target=2, operand0=0x1051b6414, operand1=0) stack_level 2
38 abs: _GUARD_TYPE_VERSION (0, target=2, operand0=0x20049, operand1=0) stack_level 2
39 abs: _STORE_ATTR_SLOT (0, target=2, operand0=0x10, operand1=0) stack_level 1
40 abs: _POP_TOP (0, target=2, operand0=0, operand1=0) stack_level 0
41 abs: _CHECK_VALIDITY (0, target=7, operand0=0, operand1=0) stack_level 0
42 abs: _SET_IP (0, target=7, operand0=0x1051b641e, operand1=0) stack_level 0
43 abs: _LOAD_CONST (0, target=7, operand0=0, operand1=0) stack_level 1
44 abs: _CHECK_VALIDITY (0, target=8, operand0=0, operand1=0) stack_level 1
45 abs: _SET_IP (0, target=8, operand0=0x1051b6420, operand1=0) stack_level 1
46 abs: _RETURN_VALUE (0, target=8, operand0=0x104dece01, operand1=0x2) code=0x104dece00 47 abs: _NOP (0, target=0, operand0=0, operand1=0) 48 abs: _CHECK_VALIDITY (0, target=0, operand0=0, operand1=0) 49 abs: _SET_IP (0, target=0, operand0=0x104deced0, operand1=0) 50 abs: _EXIT_INIT_CHECK (0, target=0, operand0=0, operand1=0) 51 abs: _CHECK_VALIDITY (0, target=1, operand0=0, operand1=0) 52 abs: _SET_IP (0, target=1, operand0=0x104deced2, operand1=0) 53 abs: _RETURN_VALUE (0, target=1, operand0=0x10526a450, operand1=0x3) func=0x10526a450 code=0x1051c7600 stack_level 3
54 abs: _NOP (0, target=0, operand0=0, operand1=0) stack_level 3
55 abs: _CHECK_VALIDITY (0, target=33, operand0=0, operand1=0) stack_level 3
56 abs: _SET_IP (0, target=33, operand0=0x1051c7712, operand1=0) stack_level 3
57 abs: _SWAP_FAST (2, target=33, operand0=0, operand1=0) stack_level 3
58 abs: _POP_TOP (2, target=33, operand0=0, operand1=0) stack_level 2
59 abs: _JUMP_TO_TOP (0, target=0, operand0=0, operand1=0) stack_level 2
Optimized trace (length 64):
0 OPTIMIZED: _START_EXECUTOR_r00 (0, jump_target=46, operand0=0xaa50fb830, operand1=0)
1 OPTIMIZED: _MAKE_WARM_r00 (0, target=0, operand0=0, operand1=0)
2 OPTIMIZED: _SET_IP_r00 (0, target=34, operand0=0x1051c7714, operand1=0)
3 OPTIMIZED: _CHECK_PERIODIC_r00 (0, jump_target=0, operand0=0, operand1=0, error_target=47)
4 OPTIMIZED: _CHECK_VALIDITY_r00 (0, jump_target=48, operand0=0, operand1=0)
5 OPTIMIZED: _ITER_CHECK_RANGE_r02 (14, jump_target=49, operand0=0, operand1=0)
6 OPTIMIZED: _GUARD_NOT_EXHAUSTED_RANGE_r22 (14, jump_target=50, operand0=0, operand1=0)
7 OPTIMIZED: _ITER_NEXT_RANGE_r23 (14, jump_target=0, operand0=0, operand1=0, error_target=51)
8 OPTIMIZED: _SET_IP_r33 (0, target=22, operand0=0x1051c76fc, operand1=0)
9 OPTIMIZED: _SWAP_FAST_1_r33 (1, target=22, operand0=0, operand1=0)
10 OPTIMIZED: _SPILL_OR_RELOAD_r31 (0, target=0, operand0=0, operand1=0)
11 OPTIMIZED: _POP_TOP_r10 (1, target=22, operand0=0, operand1=0)
12 OPTIMIZED: _CHECK_VALIDITY_r00 (0, jump_target=52, operand0=0, operand1=0)
13 OPTIMIZED: _GUARD_GLOBALS_VERSION_r00 (0, jump_target=52, operand0=0x2c, operand1=0)
14 OPTIMIZED: _LOAD_CONST_INLINE_r01 (3, target=23, operand0=0xaa510b730, operand1=0x7)
15 OPTIMIZED: _PUSH_NULL_r12 (0, target=23, operand0=0, operand1=0)
16 OPTIMIZED: _LOAD_CONST_INLINE_BORROW_r23 (0, target=28, operand0=0x104e381d0, operand1=0)
17 OPTIMIZED: _SET_IP_r33 (0, target=29, operand0=0x1051c770a, operand1=0)
18 OPTIMIZED: _SPILL_OR_RELOAD_r30 (0, target=0, operand0=0, operand1=0)
19 OPTIMIZED: _CHECK_AND_ALLOCATE_OBJECT_r00 (1, jump_target=53, operand0=0x20049, operand1=0, error_target=54)
20 OPTIMIZED: _CREATE_INIT_FRAME_r01 (1, jump_target=0, operand0=0, operand1=0, error_target=54)
21 OPTIMIZED: _PUSH_FRAME_r10 (1, target=29, operand0=0x1052f4350, operand1=0)
22 OPTIMIZED: _CHECK_VALIDITY_r00 (0, jump_target=55, operand0=0, operand1=0)
23 OPTIMIZED: _TIER2_RESUME_CHECK_r00 (0, jump_target=56, operand0=0, operand1=0)
24 OPTIMIZED: _LOAD_FAST_BORROW_1_r01 (1, target=1, operand0=0, operand1=0)
25 OPTIMIZED: _LOAD_FAST_BORROW_0_r12 (0, target=1, operand0=0, operand1=0)
26 OPTIMIZED: _SET_IP_r22 (0, target=2, operand0=0x1051b6414, operand1=0)
27 OPTIMIZED: _GUARD_TYPE_VERSION_r22 (0, jump_target=57, operand0=0x20049, operand1=0)
28 OPTIMIZED: _STORE_ATTR_SLOT_r21 (0, jump_target=58, operand0=0x10, operand1=0)
29 OPTIMIZED: _POP_TOP_NOP_r10 (0, target=2, operand0=0, operand1=0)
30 OPTIMIZED: _CHECK_VALIDITY_r00 (0, jump_target=59, operand0=0, operand1=0)
31 OPTIMIZED: _LOAD_CONST_INLINE_BORROW_r01 (0, target=7, operand0=0x104e06b90, operand1=0)
32 OPTIMIZED: _SET_IP_r11 (0, target=8, operand0=0x1051b6420, operand1=0)
33 OPTIMIZED: _RETURN_VALUE_r11 (0, target=8, operand0=0x104dece01, operand1=0x2)
34 OPTIMIZED: _CHECK_VALIDITY_r11 (0, jump_target=60, operand0=0, operand1=0)
35 OPTIMIZED: _SET_IP_r11 (0, target=0, operand0=0x104deced0, operand1=0)
36 OPTIMIZED: _EXIT_INIT_CHECK_r10 (0, jump_target=0, operand0=0, operand1=0, error_target=61)
37 OPTIMIZED: _CHECK_VALIDITY_r00 (0, jump_target=62, operand0=0, operand1=0)
38 OPTIMIZED: _SET_IP_r00 (0, target=1, operand0=0x104deced2, operand1=0)
39 OPTIMIZED: _SPILL_OR_RELOAD_r01 (0, target=0, operand0=0, operand1=0)
40 OPTIMIZED: _RETURN_VALUE_r11 (0, target=1, operand0=0x10526a450, operand1=0x3)
41 OPTIMIZED: _CHECK_VALIDITY_r11 (0, jump_target=63, operand0=0, operand1=0)
42 OPTIMIZED: _SET_IP_r11 (0, target=33, operand0=0x1051c7712, operand1=0)
43 OPTIMIZED: _SWAP_FAST_2_r11 (2, target=33, operand0=0, operand1=0)
44 OPTIMIZED: _POP_TOP_r10 (2, target=33, operand0=0, operand1=0)
45 OPTIMIZED: _JUMP_TO_TOP_r00 (0, jump_target=1, operand0=0, operand1=0)
46 OPTIMIZED: _DEOPT_r00 (0, target=34, operand0=0, operand1=0)
47 OPTIMIZED: _ERROR_POP_N_r00 (0, target=0, operand0=0x22, operand1=0)
48 OPTIMIZED: _DEOPT_r00 (0, target=20, operand0=0, operand1=0)
49 OPTIMIZED: _EXIT_TRACE_r00 (0, target=20, operand0=0xaa50fb8b0, operand1=0)
50 OPTIMIZED: _EXIT_TRACE_r20 (0, target=37, operand0=0xaa50fb8c0, operand1=0x1)
51 OPTIMIZED: _ERROR_POP_N_r00 (0, target=0, operand0=0x14, operand1=0)
52 OPTIMIZED: _DEOPT_r00 (0, target=23, operand0=0, operand1=0)
53 OPTIMIZED: _DEOPT_r00 (0, target=29, operand0=0, operand1=0)
54 OPTIMIZED: _ERROR_POP_N_r00 (0, target=0, operand0=0x1d, operand1=0)
55 OPTIMIZED: _DEOPT_r00 (0, target=0, operand0=0, operand1=0)
56 OPTIMIZED: _HANDLE_PENDING_AND_DEOPT_r00 (0, target=0, operand0=0, operand1=0)
57 OPTIMIZED: _EXIT_TRACE_r20 (0, target=2, operand0=0xaa50fb8d0, operand1=0)
58 OPTIMIZED: _DEOPT_r20 (0, target=2, operand0=0, operand1=0)
59 OPTIMIZED: _DEOPT_r00 (0, target=7, operand0=0, operand1=0)
60 OPTIMIZED: _DEOPT_r10 (0, target=0, operand0=0, operand1=0)
61 OPTIMIZED: _ERROR_POP_N_r00 (0, target=0, operand0=0, operand1=0)
62 OPTIMIZED: _DEOPT_r00 (0, target=1, operand0=0, operand1=0)
63 OPTIMIZED: _DEOPT_r10 (0, target=33, operand0=0, operand1=0)
@cocolato I'm surprised, you're right. Really sorry, my mistake. I have another issue to work on. Can you please work on making the JIT optimization state per-thread?
https://github.com/python/cpython/issues/143421#issuecomment-3710690408
The main one is that JitOptContext is stack allocated: https://github.com/python/cpython/blob/main/Python/optimizer_analysis.c#L342
It needs to be moved into _PyThreadStateImpl: https://github.com/python/cpython/blob/main/Include/internal/pycore_tstate.h#L149
That way we don't overflow the stack in the JIT. This time I've verified that the problem indeed exists. Do you want to take this one up?
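The ownership change can be modelled roughly in Python. This is only an analogy: Python cannot show stack versus heap allocation, OptContext and its fields are hypothetical, and threading.local stands in for the per-thread state structure. What it does show is one long-lived context per thread that is reset and reused on each optimizer run instead of being created per call.

```python
import threading


class OptContext:
    """Stand-in for a large optimizer context (hypothetical fields)."""

    def __init__(self):
        self.symbols = []
        self.runs = 0

    def reset(self):
        self.symbols.clear()


_tstate = threading.local()  # stand-in for per-thread interpreter state


def get_opt_context():
    """Return this thread's context, creating it on first use."""
    ctx = getattr(_tstate, "opt_ctx", None)
    if ctx is None:
        ctx = OptContext()
        _tstate.opt_ctx = ctx
    ctx.reset()  # each optimizer run starts from a clean state
    ctx.runs += 1
    return ctx


main_first = get_opt_context()
main_second = get_opt_context()

other = []
worker = threading.Thread(target=lambda: other.append(get_opt_context()))
worker.start()
worker.join()
```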
Ok, I will take some time to understand the issue and work on it!
Please don't work on the parent issue (Make the JIT optimizer buffer add to a new buffer, not in-place); work on the sub-issue instead, thanks!
@Fidget-Spinner Hi, are there some other tasks we can do for this issue?
@cocolato We need to first implement a new symbolic type for objects. See optimizer_symbols.c and the things that are done there. I can let you take that up if you want.
I will learn and try this part :)
@cocolato can I assign you this to do before that though? We first need the buffer to append instead of overwrite in place if we want to do this optimization properly.
https://github.com/python/cpython/issues/143421
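Why appending matters for inserting extra guards can be sketched as follows. The uop names and the rewrite rule are hypothetical and the functions are not the optimizer's real API; the point is only that an in-place pass can replace a uop one-for-one, while an appending pass can expand one uop into several.

```python
def optimize_in_place(trace, rewrite):
    """In-place pass: each slot can only be overwritten by exactly one uop."""
    for i, uop in enumerate(trace):
        out = rewrite(uop)
        if len(out) != 1:
            raise ValueError(f"cannot expand {uop!r} in place")
        trace[i] = out[0]
    return trace


def optimize_appending(trace, rewrite):
    """Appending pass: read the input trace, emit into a fresh buffer."""
    out = []
    for uop in trace:
        out.extend(rewrite(uop))
    return out


def rewrite(uop):
    # Hypothetical rule: the stealing variant needs a guard in front of it.
    if uop == "_DEMO_UNPACK":
        return ["_DEMO_GUARD_UNIQUE", "_DEMO_UNPACK_STEAL"]
    return [uop]


trace = ["_DEMO_LOAD", "_DEMO_UNPACK", "_DEMO_STORE"]
appended = optimize_appending(trace, rewrite)

try:
    optimize_in_place(list(trace), rewrite)
    in_place_failed = False
except ValueError:
    in_place_failed = True
```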