Tier2 peephole optimization: remove extraneous _CHECK_STACK_SPACE ops
Implement a new peephole optimization for the tier2 optimizer that removes _CHECK_STACK_SPACE if we see that this check is already present
cc @Fidget-Spinner
This might need some labels. I also looked around for a parent ticket, but didn't find anything particularly obvious
I will link to https://github.com/python/cpython/issues/115419
I don't know anything about the plan here, so this may not be news. :-) Couldn't you check for the needed stack space once at the beginning of the executor? And if there's a loop, lift it out of there. (And if it fits within the stack space of the code object already, drop it altogether.)
Makes sense. Let's go with consolidating them first then another follow-up PR for lifting. Lifting stuff out of loops requires _JUMP_ABSOLUTE and I can just point to the code from my old PR to copy. (Though the heuristics need someone else to work on it).
Is anyone working on this? Otherwise I'd like to give it a go.
@gvanrossum I think @lazorchakp might be working on this, @lazorchakp do you mind if Guido takes over? If you have something with an implementation up but no tests that's okay, you can just submit a draft PR and Guido can probably take over if you don't mind.
I'd also be happy to mentor @lazorchakp !
Sorry everyone - I've been busier than anticipated and haven't made concrete progress on this yet. If it's alright though, I'd still like to try it, and I should be able to get a draft up tonight if that works. @Fidget-Spinner has given me some guidance over email, but I'd gladly accept as much mentoring as you both are willing to provide 🙂
That being said, @gvanrossum if you'd rather go for it yourself, that's quite alright! Just let me know.
No, I'd love to have you do it!
My guess is that the critical trick should be to change the uop that checks for stack space to take a number indicating the amount of stack space needed, rather than digging that value out of the code object (which must be retrieved from the function object, which must be dug out of the stack at position -2-oparg, IIRC).
Maybe this number can be put in its operand field by the initial trace projection pass (which needs the code object anyway). Then a later dedicated pass can combine stack checks as long as they are within the same "scope" (i.e., _PUSH_FRAME/_POP_FRAME pair).
That sounds great. It makes sense to me that we'd now need _CHECK_STACK_SPACE to know up-front how much space it's looking for. I'll see if I can take your suggestion and modify the uop so that this information is available for the optimization pass.
I'm at the point where I have the operand for each _CHECK_STACK_SPACE being correctly assigned to its corresponding function's framesize (pushed a rough branch). Before I implement the actual optimization though, I want to make sure I'm reasoning about this correctly - I'm making some heavy assumptions about the underlying machinery. Are the following cases correct? (other ops removed for simplicity):
CASE 1 - sequential function calls:
_CHECK_STACK_SPACE A
_PUSH_FRAME
_POP_FRAME
_CHECK_STACK_SPACE B
_PUSH_FRAME
_POP_FRAME
proposed optimization:
_CHECK_STACK_SPACE max(A, B)
_PUSH_FRAME
_POP_FRAME
_PUSH_FRAME
_POP_FRAME
CASE 2 - nested function calls:
_CHECK_STACK_SPACE A
_PUSH_FRAME
_CHECK_STACK_SPACE B
_PUSH_FRAME
_POP_FRAME
_POP_FRAME
proposed optimization:
_CHECK_STACK_SPACE A + B
_PUSH_FRAME
_PUSH_FRAME
_POP_FRAME
_POP_FRAME
If these cases are correct, I should be able to combine everything into a single _CHECK_STACK_SPACE uop per trace, as @gvanrossum mentioned above.
Yes, both cases look right.
@gvanrossum / @Fidget-Spinner
I have this mostly working, but I'm stuck on one critical piece: actually using the new _CHECK_STACK_SPACE operand (the amount of space to check) at execution time. I think I need to update the definition of _CHECK_STACK_SPACE in bytecodes.c. My fear is that I'll only have the operand in tier2 after trace projection, so tier1 and tier2 behavior would have to differ.
Does anyone have ideas about how I might be able to modify _CHECK_STACK_SPACE so that for tier2 (executor_cases.c.h...?) it will use the operand populated during trace projection but for tier1 (generated_cases.c.h...?) it will check the value on the stack? If the behavior must be identical, do I need to modify the bytecode cache in the specializer too? (It's entirely possible that these questions don't make sense or that I have some fundamental misunderstanding)
@lazorchakp if the instruction will not fit the format in tier 1, you can add a new uop _CHECK_STACK_SPACE_COMBINED or something. Then manually swap out for that uop in the optimizer analysis.
Tier 1 should be unaffected. IIRC there are #ifdefs you can check for.
I don't know what the "eyes" reaction means. Do you need more info? If so please ask another question. :-)
Ah sorry, didn't want to generate too much noise - I'll be more clear in the future. I just restarted work on this for the evening and these comments are enough to unblock me for now. I won't hesitate to ask another question if I get stuck somewhere!
I'm getting ready to open the PR here and am wrapping up some final tests. One of my new tests fails on main though; is opening a new issue appropriate? I haven't had time to investigate yet.
This is it, if anyone's interested (under test_opt.TestUopsOptimziation)
def test_many_nested(self)
# overflow the trace_stack
def dummy_a(x):
return x
def dummy_b(x):
return dummy_a(x)
def dummy_c(x):
return dummy_b(x)
def dummy_d(x):
return dummy_c(x)
def dummy_e(x):
return dummy_d(x)
def dummy_f(x):
return dummy_e(x)
def dummy_g(x):
return dummy_f(x)
def dummy_h(x):
return dummy_g(x)
def testfunc(n):
a = 0
for _ in range(n):
a += dummy_h(n)
return a
self._run_with_optimizer(testfunc, 32)
error:
Assertion failed: ((this_instr + 2)->opcode == _PUSH_FRAME), function optimize_uops, file optimizer_cases.c.h, line 1600.
Fatal Python error: Aborted
Current thread 0x000000020140a100 (most recent call first):
File "..../cpython/Lib/test/test_capi/test_opt.py", line 978 in testfunc
File "..../cpython/Lib/test/test_capi/test_opt.py", line 588 in _run_with_optimizer
...
Seems like a bug. Please open an issue with the repro. I will fix it. Thanks!
@Fidget-Spinner I opened #117180. Thanks, and lmk if you need more details.
Still have to iterate with the pipeline a bit in my draft PR. I'll move it to Open and send an update here when it's officially ready for review.
Just opened #117242 for review
This is done -- thanks again!