JIT coverage can be improved considerably
JIT coverage is still quite variable. For some benchmarks, e.g. richards, it is close to 100%; for others, it is much lower:
https://github.com/savannahostrowski/pyperf_bench/blob/main/profiling/jit.svg
Note: A fair bit of the time attributed to the interpreter is frame cleanup, which makes the interpreter fractions look larger than they really are.
We can increase JIT coverage by:
- Treating dynamic exits like side exits: warming them up and compiling a new side trace.
- Adding possible JIT entry points at function entries, as well as at backward edges.
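To illustrate the first idea, here is a toy model of warming up a dynamic exit with a per-exit counter and compiling a side trace once it gets hot. This is not CPython's actual implementation; the names (`Exit`, `HOTNESS_THRESHOLD`, `compile_side_trace`) and the threshold value are invented for illustration.

```python
# Toy model of side-exit warm-up: each exit carries a counter, and once it
# crosses a threshold we "compile" a side trace and patch the exit so later
# passes jump straight into it instead of falling back to the interpreter.

HOTNESS_THRESHOLD = 16  # hypothetical value; CPython tunes its own thresholds

class Exit:
    def __init__(self, target_bytecode_offset):
        self.target = target_bytecode_offset
        self.hotness = 0
        self.side_trace = None  # filled in once the exit becomes hot

def compile_side_trace(target):
    # Stand-in for recording + compiling a new trace starting at `target`.
    return f"<side trace starting at offset {target}>"

def take_exit(exit_):
    """Called each time execution leaves JIT code through this exit."""
    if exit_.side_trace is not None:
        return exit_.side_trace          # jump straight into compiled code
    exit_.hotness += 1
    if exit_.hotness >= HOTNESS_THRESHOLD:
        exit_.side_trace = compile_side_trace(exit_.target)
        return exit_.side_trace
    return None                          # still cold: fall back to interpreter

exit_ = Exit(target_bytecode_offset=42)
results = [take_exit(exit_) for _ in range(HOTNESS_THRESHOLD + 1)]
# The first HOTNESS_THRESHOLD - 1 passes stay in the interpreter; after that
# the exit is patched and every subsequent pass hits the side trace.
```

The point is that dynamic exits would get the same counter-and-compile treatment side exits already get, rather than always deoptimizing to the interpreter.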
Linked PRs
- gh-143391
For context, we increased JIT coverage by 50% just with trace recording alone. This was the old chart prior to trace recording https://github.com/faster-cpython/benchmarking-public/blob/main/profiling/jit.svg
@markshannon which one have you decided to work on? I can take up this issue if you aren't too far in yet, as I don't want to touch the JIT assembly parser in https://github.com/python/cpython/issues/143158 :).
Seems like tracing from RESUME doesn't help much. Considering the complexity, it might not be worth implementing for RESUME:
0.2% faster on Linux x86-64: https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251231-3.15.0a3%2B-564677c-JIT/bm-20251231-vultr-x86_64-Fidget%252dSpinner-resume_tracing-3.15.0a3%2B-564677c-vs-base.md
0.4% slower on macOS AArch64: https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251231-3.15.0a3%2B-564677c-JIT/bm-20251231-macm4pro-arm64-Fidget%252dSpinner-resume_tracing-3.15.0a3%2B-564677c-vs-base.md
In the last pystats run, CALL_FUNCTION_EX was the 3rd-highest call op that called into Python frames, after CALL_PY_EXACT_ARGS and CALL_ALLOC_AND_ENTER_INIT https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251019-3.15.0a1%2B-bedaea0-JIT/bm-20251019-vultr-x86_64-python-bedaea05987738c4c6b9-3.15.0a1%2B-bedaea0-pystats.md
We should specialize it.
Things we need to specialize for:
- CALL_FUNCTION_EX
- SEND
- CALL_ALLOC_AND_ENTER_INIT_SLOT
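For context on where CALL_FUNCTION_EX comes from: any call using `*`/`**` unpacking compiles to it, which `dis` can confirm (a quick check, not part of any proposed change):

```python
import dis

def caller(f, args, kwargs):
    # Star-unpacking a call forces the compiler to emit CALL_FUNCTION_EX
    # rather than the specializable CALL / CALL_KW instructions.
    return f(*args, **kwargs)

opnames = {instr.opname for instr in dis.get_instructions(caller)}
print("CALL_FUNCTION_EX" in opnames)
```

So any hot code path that forwards arguments this way (decorators, wrappers, `functools` helpers) currently hits the unspecialized instruction.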
SEND has one of the worst specialization failure rates of any opcode https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251019-3.15.0a1%2B-bedaea0-JIT/bm-20251019-vultr-x86_64-python-bedaea05987738c4c6b9-3.15.0a1%2B-bedaea0-pystats.md#send-1
This should also speed up the interpreter slightly. It's blocking us from optimizing generators/frames in the JIT.
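For anyone following along, SEND is the instruction driving `yield from` (and `await`) delegation on 3.11+, which is why it matters for generator/frame optimization. A quick `dis` check (illustrative only):

```python
import dis
import sys

def delegating():
    # On 3.11+ the compiler lowers `yield from` into a SEND loop
    # (GET_ITER / SEND / JUMP_BACKWARD...), replacing the old YIELD_FROM.
    yield from range(3)

opnames = {instr.opname for instr in dis.get_instructions(delegating)}
if sys.version_info >= (3, 11):
    print("SEND" in opnames)
```

Since every step of generator delegation and every `await` goes through SEND, a high specialization failure rate there hits async-heavy workloads directly.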