Better uop coverage in the JIT optimizer
Out of 263 total uops, 155 are ignored by the tier two optimizer. These represent over half of all uops by dynamic execution count.
This issue will serve as a checklist for auditing these missing uops and adding them where they make sense. At first glance, there's quite a bit of potential here, especially around the ability to narrow known output types (like _CONTAINS_OP_SET) and the ability to narrow input types and remove guards on them (like _BINARY_OP_SUBSCR_LIST_INT). As I go through, I'll ~cross out~ anything that doesn't seem like it makes sense to add.
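To make these two kinds of wins concrete, here is a minimal sketch of what such entries might look like in the optimizer's DSL (optimizer_bytecodes.c). The stack effects and helper names below (sym_new_type, sym_matches_type, sym_set_type, REPLACE_OP) are modeled on existing cases and are approximations rather than final code:

```c
// Output-type narrowing: _CONTAINS_OP_SET always pushes a bool, so record
// that fact; later bool checks and _TO_BOOL uops can then be removed.
op(_CONTAINS_OP_SET, (left, right -- res)) {
    res = sym_new_type(ctx, &PyBool_Type);
}

// Guard elimination: if the value is already known to be an int, the guard
// is redundant and can be turned into a _NOP; either way, record the type
// so that downstream uops (and guards) benefit.
op(_GUARD_NOS_INT, (left, unused -- left, unused)) {
    if (sym_matches_type(left, &PyLong_Type)) {
        REPLACE_OP(this_instr, _NOP, 0, 0);
    }
    sym_set_type(left, &PyLong_Type);
}
```

The first kind only records knowledge about the output; the second pays off immediately by deleting the guard whenever the input type has already been proven.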
First, here are the 53 missing uops that each represent at least 0.1% of all uops executed:
- [ ] ~_SET_IP~ (12.1%)
- [ ] ~_CHECK_VALIDITY~ (10.1%)
- [ ] ~_CHECK_VALIDITY_AND_SET_IP~ (6.5%)
- [ ] ~_CHECK_PERIODIC~ (3.1%)
- [ ] ~_MAKE_WARM~ (2.8%)
- [ ] ~_START_EXECUTOR~ (1.7%)
- [x] _GUARD_NOS_INT (1.5%)
- [ ] ~_BINARY_OP_SUBSCR_LIST_INT~ (1.0%)
- [ ] _CHECK_FUNCTION (1.0%)
- [ ] ~_CHECK_MANAGED_OBJECT_HAS_VALUES~ (0.7%)
- [ ] _ITER_CHECK_LIST (0.7%)
- [x] _CONTAINS_OP_SET (0.6%)
- [ ] _FOR_ITER_TIER_TWO (0.6%)
- [ ] ~_GUARD_NOT_EXHAUSTED_LIST~ (0.6%)
- [ ] ~_ITER_NEXT_LIST_TIER_TWO~ (0.6%)
- [ ] ~_SAVE_RETURN_OFFSET~ (0.6%)
- [x] _CALL_LEN (0.5%)
- [x] _CALL_LIST_APPEND (0.5%)
- [ ] _POP_TOP (0.5%)
- [ ] _RESUME_CHECK (0.5%)
- [x] _BINARY_OP_SUBSCR_STR_INT (0.4%)
- [ ] ~_GUARD_DORV_VALUES_INST_ATTR_FROM_DICT~ (0.4%)
- [ ] ~_GUARD_KEYS_VERSION~ (0.4%)
- [ ] ~_BINARY_OP_SUBSCR_DICT~ (0.3%)
- [ ] _CALL_BUILTIN_FAST (0.3%)
- [ ] _CHECK_STACK_SPACE_OPERAND (0.3%)
- [ ] _GET_ITER (0.3%)
- [ ] ~_STORE_SUBSCR~ (0.3%)
- [ ] ~_GUARD_NOT_EXHAUSTED_RANGE~ (0.2%)
- [x] _BINARY_SLICE (0.2%)
- [x] _BUILD_LIST (0.2%)
- [ ] _CALL_BUILTIN_O (0.2%)
- [ ] _CALL_NON_PY_GENERAL (0.2%)
- [ ] _CHECK_IS_NOT_PY_CALLABLE (0.2%)
- [x] _GUARD_NOS_FLOAT (0.2%)
- [ ] _ITER_CHECK_RANGE (0.2%)
- [ ] _ITER_CHECK_TUPLE (0.2%)
- [ ] _LOAD_DEREF (0.2%)
- [ ] ~_STORE_SUBSCR_LIST_INT~ (0.2%)
- [ ] _BINARY_OP_EXTEND (0.1%)
- [x] _CALL_ISINSTANCE (0.1%)
- [ ] _CALL_METHOD_DESCRIPTOR_FAST (0.1%)
- [ ] _CALL_METHOD_DESCRIPTOR_FAST_WITH_KEYWORDS (0.1%)
- [ ] _CALL_METHOD_DESCRIPTOR_NOARGS (0.1%)
- [x] _CALL_TYPE_1 (0.1%)
- [x] _CHECK_ATTR_CLASS (0.1%)
- [x] _CONTAINS_OP_DICT (0.1%)
- [x] _GUARD_BINARY_OP_EXTEND (0.1%)
- [ ] ~_GUARD_NOT_EXHAUSTED_TUPLE~ (0.1%)
- [ ] ~_ITER_NEXT_TUPLE~ (0.1%)
- [ ] ~_LIST_APPEND~ (0.1%)
- [ ] ~_STORE_ATTR_SLOT~ (0.1%)
- [ ] ~_STORE_SUBSCR_DICT~ (0.1%)
And here are the 102 missing uops that each account for less than 0.1%. These are less important, but may still net us some wins on individual benchmarks:
- [ ] _BINARY_OP_SUBSCR_CHECK_FUNC
- [x] _BINARY_OP_SUBSCR_TUPLE_INT
- [x] _BUILD_MAP
- [x] _BUILD_SET
- [x] _BUILD_SLICE
- [x] _BUILD_STRING
- [ ] _CALL_BUILTIN_CLASS
- [ ] _CALL_BUILTIN_FAST_WITH_KEYWORDS
- [ ] _CALL_INTRINSIC_1
- [ ] _CALL_INTRINSIC_2
- [ ] _CALL_KW_NON_PY
- [ ] _CALL_METHOD_DESCRIPTOR_O
- [x] _CALL_STR_1
- [x] _CALL_TUPLE_1
- [ ] _CHECK_ATTR_METHOD_LAZY_DICT
- [ ] _CHECK_EG_MATCH
- [ ] _CHECK_EXC_MATCH
- [ ] _CHECK_FUNCTION_VERSION_INLINE
- [ ] _CHECK_FUNCTION_VERSION_KW
- [ ] _CHECK_IS_NOT_PY_CALLABLE_KW
- [ ] _CHECK_METHOD_VERSION
- [ ] _CHECK_METHOD_VERSION_KW
- [ ] _CHECK_PERIODIC_IF_NOT_YIELD_FROM
- [ ] _CONVERT_VALUE
- [ ] _COPY_FREE_VARS
- [ ] _DELETE_ATTR
- [ ] _DELETE_DEREF
- [ ] _DELETE_FAST
- [ ] _DELETE_GLOBAL
- [ ] _DELETE_NAME
- [ ] _DELETE_SUBSCR
- [ ] _DEOPT
- [ ] _DICT_MERGE
- [ ] _DICT_UPDATE
- [ ] _END_FOR
- [ ] _END_SEND
- [ ] _ERROR_POP_N
- [ ] _EXIT_INIT_CHECK
- [ ] _EXPAND_METHOD
- [ ] _EXPAND_METHOD_KW
- [ ] _FATAL_ERROR
- [ ] _FORMAT_SIMPLE
- [ ] _FORMAT_WITH_SPEC
- [ ] _GET_AITER
- [ ] _GET_ANEXT
- [ ] _GET_AWAITABLE
- [x] _GET_LEN
- [ ] _GET_YIELD_FROM_ITER
- [ ] _GUARD_DORV_NO_DICT
- [ ] _GUARD_GLOBALS_VERSION
- [x] _GUARD_TOS_FLOAT
- [x] _GUARD_TOS_INT
- [ ] _GUARD_TYPE_VERSION_AND_LOCK
- [ ] _IMPORT_FROM
- [ ] _IMPORT_NAME
- [ ] _IS_NONE
- [ ] _LIST_EXTEND
- [x] _LOAD_ATTR_NONDESCRIPTOR_NO_DICT
- [x] _LOAD_ATTR_NONDESCRIPTOR_WITH_VALUES
- [ ] _LOAD_BUILD_CLASS
- [ ] _LOAD_COMMON_CONSTANT
- [ ] _LOAD_FAST_LOAD_FAST
- [ ] _LOAD_FROM_DICT_OR_DEREF
- [ ] _LOAD_GLOBAL
- [ ] _LOAD_GLOBAL_BUILTINS
- [ ] _LOAD_GLOBAL_MODULE
- [ ] _LOAD_LOCALS
- [ ] _LOAD_NAME
- [ ] _LOAD_SUPER_ATTR_ATTR
- [ ] _LOAD_SUPER_ATTR_METHOD
- [ ] _MAKE_CALLARGS_A_TUPLE
- [ ] _MAKE_CELL
- [ ] _MAKE_FUNCTION
- [ ] _MAP_ADD
- [ ] _MATCH_CLASS
- [ ] _MATCH_KEYS
- [ ] _MATCH_MAPPING
- [ ] _MATCH_SEQUENCE
- [ ] _MAYBE_EXPAND_METHOD_KW
- [ ] ~_NOP~
- [ ] _POP_EXCEPT
- [ ] _POP_TWO_LOAD_CONST_INLINE_BORROW
- [ ] _PUSH_EXC_INFO
- [ ] _PUSH_NULL_CONDITIONAL
- [ ] _SETUP_ANNOTATIONS
- [ ] _SET_ADD
- [ ] _SET_FUNCTION_ATTRIBUTE
- [ ] _SET_UPDATE
- [ ] _STORE_ATTR
- [ ] _STORE_ATTR_INSTANCE_VALUE
- [ ] _STORE_ATTR_WITH_HINT
- [ ] _STORE_DEREF
- [ ] _STORE_FAST_LOAD_FAST
- [ ] _STORE_FAST_STORE_FAST
- [ ] _STORE_GLOBAL
- [ ] _STORE_NAME
- [ ] ~_STORE_SLICE~
- [ ] _TIER2_RESUME_CHECK
- [x] _UNARY_INVERT
- [ ] _UNARY_NEGATIVE
- [ ] ~_UNPACK_SEQUENCE_LIST~
- [ ] _WITH_EXCEPT_START
Linked PRs
- gh-131800
- gh-131816
- gh-132057
- gh-132153
- gh-132269
- gh-132278
- gh-132289
- gh-132419
- gh-132434
- gh-132564
- gh-132849
- gh-132851
- gh-132940
- gh-133003
- gh-133172
- gh-133180
- gh-133339
- gh-133345
- gh-133527
- gh-133609
- gh-134240
- gh-134268
- gh-134369
- gh-134373
- gh-134403
- gh-134406
- gh-134543
- gh-134554
- gh-134671
- gh-134803
- gh-135022
- gh-135194
- gh-135222
- gh-135223
- gh-135260
- gh-137607
- gh-143340
@diegorusso is going to add _CALL_LEN.
@fluhus is going to add _BINARY_SLICE.
@Klaus117 is going to improve _TO_BOOL_INT.
@Zheaoli is going to add _CONTAINS_OP_DICT.
I think I can work on _BINARY_OP_SUBSCR_LIST_INT and _BINARY_OP_SUBSCR_DICT
> I think I can work on _BINARY_OP_SUBSCR_LIST_INT and _BINARY_OP_SUBSCR_DICT
Sorry, I already have a branch to do the guards for these (and a couple of others) that I was going to put up in a minute! I'll tag you for review, though.
@tomasr8 is going to add _CALL_STR_1, _CALL_TUPLE_1, and _CALL_TYPE_1.
@Zheaoli, want to take _BUILD_LIST, _BUILD_MAP, _BUILD_SET, _BUILD_SLICE, and _BUILD_STRING? For all but _BUILD_SLICE and _BUILD_STRING, I think we can only set the type of the output (sym_new_type). For _BUILD_SLICE and _BUILD_STRING, we may be able to have a constant output if the items are constant (sym_is_const/sym_get_const/sym_new_const). Maybe one PR for the first three, and separate PRs for _BUILD_SLICE and _BUILD_STRING?
> want to take _BUILD_LIST, _BUILD_MAP, _BUILD_SET, _BUILD_SLICE, and _BUILD_STRING? For all but _BUILD_SLICE and _BUILD_STRING, I think we can only set the type of the output (sym_new_type). For _BUILD_SLICE and _BUILD_STRING, we may be able to have a constant output if the items are constant
100% want to (lol)
I'm on the way
I'd like to join. I'm a Python beginner; I've only ever written test scripts in Python at work, but I've seen its charms. I'd like to start learning from this part, if I may.
How can I join this?
Not sure I can do it well, but I would like to try my best :)
Sorry, I think at this point we already have enough people working on this issue (it's also a pretty tough one if you've never contributed to CPython before). Thanks for the interest in helping out, though!
> @Zheaoli, want to take _BUILD_LIST, _BUILD_MAP, _BUILD_SET, _BUILD_SLICE, and _BUILD_STRING? For all but _BUILD_SLICE and _BUILD_STRING, I think we can only set the type of the output (sym_new_type). For _BUILD_SLICE and _BUILD_STRING, we may be able to have a constant output if the items are constant (sym_is_const/sym_get_const/sym_new_const). Maybe one PR for the first three, and separate PRs for _BUILD_SLICE and _BUILD_STRING?
Sorry for the delay; I might have some issues here. I've noticed that we already handle a similar uop: https://github.com/python/cpython/blob/main/Python/optimizer_bytecodes.c#L918-L920. It seems it was created in https://github.com/python/cpython/pull/128940/files
I'm not sure whether we need to create a sym_new_list like sym_new_tuple, or whether we just need sym_new_type(ctx, &PyList_Type);
> I'm not sure whether we need to create a sym_new_list like sym_new_tuple, or whether we just need sym_new_type(ctx, &PyList_Type);
For these, we just want to set the type. The symbolic tuple representation is a special case, since tuples are immutable. Tracking the values in a list is much harder, since the list can change in many different ways. For example:
    t = (a, b)
    l = [a, b]
    foo(t, l)
    x = t[0]  # The JIT can prove that x is a.
    y = l[0]  # l may have been mutated by foo, so we can't prove anything.
The symbolic tuple representation allows us to track symbolic values inside of a tuple, even if they aren't constant. For instance, if we later guard that x is an int, then the JIT has also proven that a is an int as well (and vice-versa!). We may choose to (very carefully!) track the contents of mutable containers or object attributes in the future, but in general this is much harder to do correctly, and pays off in fewer cases.
> For these, we just want to set the type. The symbolic tuple representation is a special case, since tuples are immutable. Tracking the values in a list is much harder, since the list can change in many different ways. For example:
I see. For now, we just need to replace sym_new_not_null with sym_new_type; that might get us some performance improvement.
--- The following discussion is just my personal thoughts ---
> The symbolic tuple representation allows us to track symbolic values inside of a tuple, even if they aren't constant. For instance, if we later guard that x is an int, then the JIT has also proven that a is an int as well (and vice-versa!). We may choose to (very carefully!) track the contents of mutable containers or object attributes in the future, but in general this is much harder to do correctly, and pays off in fewer cases.
I got your point. So Mark set up sym_new_tuple here because tuples are immutable; in other words, the data built by _BUILD_TUPLE is predictable, so we can set up an extra optimization path for tuples. But for mutable types like set, map, and list, that optimization path is dangerous and not very useful.
I'm not sure my understanding is right; please correct me if I'm wrong.
Yep, your understanding is correct!
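For reference, here is a minimal sketch of the kind of entry discussed above, assuming the same DSL shape as the existing _BUILD_TUPLE case (the stack effect shown is approximate):

```c
// _BUILD_LIST: we can't track the elements (the list is mutable), but the
// result is always exactly a list, so narrow the output type.
op(_BUILD_LIST, (values[oparg] -- list)) {
    list = sym_new_type(ctx, &PyList_Type);
}
```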
@Zheaoli is going to add _GET_LEN.
@tomasr8 is going to add _CALL_ISINSTANCE.
@diegorusso is going to add _CALL_LIST_APPEND.
@brandtbucher I'm trying to cover more uops in the optimizer, but I'm not sure about the priority of the remaining ones. Would you mind giving me some suggestions?
Sure! I think we can turn _CHECK_METHOD_VERSION into _CHECK_FUNCTION_VERSION_INLINE. Check out the optimizer case for _CHECK_FUNCTION_VERSION to see how it's done. This would work the same way, except you would check for &PyMethod_Type, and use the function from ((PyMethodObject *)callable)->im_func as the operand1.
Let me know if you have any questions.
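A very rough sketch of the shape this could take, modeled on the existing _CHECK_FUNCTION_VERSION handling; the stack-effect declaration, helper signatures, and operand handling below are assumptions rather than the actual implementation:

```c
op(_CHECK_METHOD_VERSION, (func_version/2, callable, null, unused[oparg] -- callable, null, unused[oparg])) {
    // If the callable is a known-constant bound method, check the version of
    // its underlying function directly and drop the method check.
    if (sym_matches_type(callable, &PyMethod_Type) && sym_is_const(callable)) {
        PyObject *method = sym_get_const(callable);
        PyObject *func = ((PyMethodObject *)method)->im_func;
        REPLACE_OP(this_instr, _CHECK_FUNCTION_VERSION_INLINE, 0, func_version);
        // Carry the inlined function as operand1, as described above.
        this_instr->operand1 = (uintptr_t)func;
    }
    sym_set_type(callable, &PyMethod_Type);
}
```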
> Sure! I think we can turn _CHECK_METHOD_VERSION into _CHECK_FUNCTION_VERSION_INLINE. Check out the optimizer case for _CHECK_FUNCTION_VERSION to see how it's done. This would work the same way, except you would check for &PyMethod_Type, and use the function from ((PyMethodObject *)callable)->im_func as the operand1. Let me know if you have any questions.
Thanks for the suggestion! I might need a little bit of time; I think I can draft a PR this week.
Reserving _ITER_NEXT_TUPLE for @noamcohen97.
@brandtbucher I would like to try some of the _LOAD_-prefixed uops. My current idea is that some of them could be made more powerful by setting some extra information, like constness?
@Zheaoli do you mind if I redirect you to _CALL_STR_1 here? https://github.com/python/cpython/issues/134584
How can I join this?
The following uops still have no entry in optimizer_bytecodes.c:
    _CHECK_PERIODIC _CHECK_PERIODIC_IF_NOT_YIELD_FROM _RESUME_CHECK _POP_TWO _END_FOR _POP_ITER _END_SEND _GUARD_TOS_SLICE _GUARD_NOS_OVERFLOWED _GUARD_TOS_OVERFLOWED _GUARD_BINARY_OP_EXTEND _BINARY_OP_EXTEND _STORE_SLICE _BINARY_OP_SUBSCR_LIST_SLICE _BINARY_OP_SUBSCR_DICT _BINARY_OP_SUBSCR_CHECK_FUNC _LIST_APPEND _SET_ADD _STORE_SUBSCR _DELETE_SUBSCR _CALL_INTRINSIC_1 _CALL_INTRINSIC_2 _GET_AITER _GET_ANEXT _GET_AWAITABLE _POP_EXCEPT _LOAD_COMMON_CONSTANT _LOAD_BUILD_CLASS _STORE_NAME _DELETE_NAME _UNPACK_SEQUENCE_LIST _STORE_ATTR _DELETE_ATTR _STORE_GLOBAL _DELETE_GLOBAL _LOAD_LOCALS _LOAD_NAME _LOAD_GLOBAL _DELETE_FAST _MAKE_CELL _DELETE_DEREF _LOAD_FROM_DICT_OR_DEREF _LOAD_DEREF _STORE_DEREF _COPY_FREE_VARS _BUILD_INTERPOLATION _BUILD_TEMPLATE _LIST_EXTEND _SET_UPDATE _SETUP_ANNOTATIONS _DICT_UPDATE _DICT_MERGE _MAP_ADD _LOAD_SUPER_ATTR_ATTR _LOAD_SUPER_ATTR_METHOD _GUARD_TYPE_VERSION_AND_LOCK _CHECK_MANAGED_OBJECT_HAS_VALUES _GUARD_DORV_NO_DICT _CHECK_EG_MATCH _CHECK_EXC_MATCH _IMPORT_NAME _IMPORT_FROM _IS_NONE _MATCH_CLASS _MATCH_MAPPING _MATCH_SEQUENCE _MATCH_KEYS _GET_YIELD_FROM_ITER _FOR_ITER_TIER_TWO _ITER_CHECK_LIST _GUARD_NOT_EXHAUSTED_LIST _ITER_NEXT_LIST_TIER_TWO _GUARD_NOT_EXHAUSTED_TUPLE _ITER_NEXT_TUPLE _ITER_CHECK_RANGE _GUARD_NOT_EXHAUSTED_RANGE _WITH_EXCEPT_START _PUSH_EXC_INFO _GUARD_DORV_VALUES_INST_ATTR_FROM_DICT _GUARD_KEYS_VERSION _CHECK_ATTR_METHOD_LAZY_DICT _CHECK_FUNCTION_VERSION_INLINE _EXPAND_METHOD _CHECK_IS_NOT_PY_CALLABLE _CALL_NON_PY_GENERAL _CHECK_RECURSION_REMAINING _EXIT_INIT_CHECK _CALL_BUILTIN_CLASS _CALL_BUILTIN_O _CALL_BUILTIN_FAST _CALL_BUILTIN_FAST_WITH_KEYWORDS _CALL_METHOD_DESCRIPTOR_O _CALL_METHOD_DESCRIPTOR_FAST_WITH_KEYWORDS _CALL_METHOD_DESCRIPTOR_NOARGS _CALL_METHOD_DESCRIPTOR_FAST _MAYBE_EXPAND_METHOD_KW _CHECK_FUNCTION_VERSION_KW _CHECK_METHOD_VERSION_KW _EXPAND_METHOD_KW _CHECK_IS_NOT_PY_CALLABLE_KW _CALL_KW_NON_PY _MAKE_CALLARGS_A_TUPLE _MAKE_FUNCTION _SET_FUNCTION_ATTRIBUTE _CONVERT_VALUE _FORMAT_SIMPLE _FORMAT_WITH_SPEC _SET_IP _SAVE_RETURN_OFFSET _CHECK_VALIDITY _POP_CALL _POP_CALL_ONE _POP_CALL_TWO _POP_TWO_LOAD_CONST_INLINE_BORROW _LOAD_CONST_UNDER_INLINE _LOAD_CONST_UNDER_INLINE_BORROW _START_EXECUTOR _MAKE_WARM _FATAL_ERROR _HANDLE_PENDING_AND_DEOPT _ERROR_POP_N _TIER2_RESUME_CHECK _GUARD_IP__PUSH_FRAME _GUARD_IP_YIELD_VALUE _GUARD_IP_RETURN_VALUE _GUARD_IP_RETURN_GENERATOR
Probably the most critical ones are those that deal with frames. E.g. _FOR_ITER_GEN_FRAME and _SEND_GEN_FRAME both just terminate the optimizer. Those are blocking us from optimizing generators.