gh-119258: Eliminate Type Guards in Tier 2 Optimizer
This PR eliminates some unnecessary calls to _GUARD_TYPE_VERSION in the tier 2 interpreter, closing #119258.
It does it by symbolically tracking the version of the type of each object and eliminating the type guard if the checked version is the same as the tracked one.
It also has to verify that there were no operations that might have escaped since we recorded the type version. So we store the last instruction offset that could lead to an escape (and possibly change the type version), and also store at which instruction offset we stored the version. We can compare these when optimizing, to make sure to only remove the bytecode if we haven't escaped after recording the version.
Work on this PR was done at the PyCon sprints with @dpdani and with help from @Fidget-Spinner and @markshannon
Development
In order to try out this commit, you have to configure it with the experimental jit:
./configure --with-pydebug --enable-experimental-jit=interpreter
Then whenever you change the cases you have to run make regen-cases and then make -j.
Finally, to just run one test case, you can use the --match:
./python.exe -m test -v test_capi.test_opt --match test_guard_type_version_removed
./python.exe -m test -v test_capi.test_opt --match test_guard_type_version_not_removed
- Issue: gh-119258
@brandtbucher can you run the benchmarks on this PR?
Just started the job (including stats). Should be 2-3 hours!
Performance impact is negligible with the JIT enabled. However, the stats are more interesting: if you go to the "Optimization (Tier 2) stats" section, you'll see that we're executing 1.7B fewer _GUARD_TYPE_VERSION instructions (a reduction of 25%).
I haven't dug much deeper than that, but the stats also seem to indicate that we're executing around 10% fewer traces (and micro-ops) in general. It isn't clear why this PR would cause that to happen... it could just be noise in the run, or possibly some second-order interaction due to the change.
Either way, I think the takeaway is: this seems to work correctly, but type version checks aren't really a significant source of overhead. I think it may still be worth doing, especially if we can use a similar approach to remove other checks too.
@markshannon, any thoughts here?
we're executing around 10% fewer traces (and micro-ops) in general. It isn't clear why this PR would cause that to happen...
I did notice that this PR breaks the hexiom and mdp benchmarks (they run on the base, but not with the changes in this PR). This kind of thing is totally expected for a big experimental change like this, but it might provide some clues about how it's not performing as expected. The logs just say "benchmark died", which usually means a segfault or something else that wouldn't produce a Python traceback.
Interesting! We’ll take a look today.
@mdboom thank you for flagging this! I was able to reproduce it locally with these two test files.
I notice it also segfaults on the new one https://github.com/python/cpython/pull/119365 so I will look into why and try to add a test case for it as well to the test suite in the new PR.
We seemed to fix this bug: https://github.com/python/cpython/pull/119365/commits/88e2e59d3225a59b7c52eb36d1230d03c696b326
It was very weird and likely a bug in main that was only highlighted with these changes.
I am closing this PR since it seems like we want to use the approach in https://github.com/python/cpython/pull/119365 to use watchers instead, which allows us to remove more guards and also opens up some future improvements that rely on type watching.