JIT coverage can be improved considerably

Open markshannon opened this issue 4 weeks ago • 5 comments

JIT coverage is still quite variable. For some benchmarks, e.g. richards, it is close to 100%. For others, it is much lower:

https://github.com/savannahostrowski/pyperf_bench/blob/main/profiling/jit.svg

Note: A fair bit of the time attributed to the interpreter is frame cleanup, which makes the interpreter fractions look larger than they are.

We can increase JIT coverage by:

  • Treating dynamic exits like side exits: warming up and compiling a new side trace.
  • Adding possible JIT entry points at function entries, as well as at backward edges.
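The "warm up, then compile a side trace" idea in the first bullet can be illustrated with a toy model. This is only a sketch: `Exit`, `take_exit`, and `WARMUP_THRESHOLD` are hypothetical names invented for illustration, the threshold value is arbitrary, and nothing here reflects CPython's actual data structures. It also does not model how dynamic exits differ from side exits.

```python
# Toy model only: every name and the threshold below are hypothetical,
# chosen for illustration; this is not CPython's implementation.
from dataclasses import dataclass
from typing import Optional

WARMUP_THRESHOLD = 4  # arbitrary value for the sketch


@dataclass
class Exit:
    """An exit from a compiled trace back to the interpreter."""
    target: str                       # label of the location the exit leads to
    counter: int = 0                  # how often this exit has been taken
    side_trace: Optional[str] = None  # stand-in for a compiled side trace


def take_exit(exit_: Exit) -> str:
    """Return where execution continues after taking this exit."""
    if exit_.side_trace is not None:
        # The exit is already attached to a side trace: stay in JIT code.
        return "side trace"
    exit_.counter += 1
    if exit_.counter >= WARMUP_THRESHOLD:
        # The exit is hot: "compile" a side trace for later visits.
        exit_.side_trace = f"trace@{exit_.target}"
    return "interpreter"
```

In this model an exit falls back to the interpreter until it has been taken `WARMUP_THRESHOLD` times, after which later visits continue in the side trace instead of leaving JIT code.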

Linked PRs

  • gh-143391

markshannon avatar Dec 19 '25 15:12 markshannon

For context, we increased JIT coverage by 50% with trace recording alone. This was the old chart, prior to trace recording: https://github.com/faster-cpython/benchmarking-public/blob/main/profiling/jit.svg

Fidget-Spinner avatar Dec 23 '25 12:12 Fidget-Spinner

@markshannon which one have you decided to work on? I can take up this issue if you aren't too far in yet, as I don't want to touch the JIT assembly parser in https://github.com/python/cpython/issues/143158 :).

Fidget-Spinner avatar Dec 25 '25 22:12 Fidget-Spinner

Seems like tracing from RESUME doesn't help much. Considering the complexity, it might not be worth implementing for RESUME:

0.2% faster on Linux x86-64: https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251231-3.15.0a3%2B-564677c-JIT/bm-20251231-vultr-x86_64-Fidget%252dSpinner-resume_tracing-3.15.0a3%2B-564677c-vs-base.md

0.4% slower on macOS AArch64: https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251231-3.15.0a3%2B-564677c-JIT/bm-20251231-macm4pro-arm64-Fidget%252dSpinner-resume_tracing-3.15.0a3%2B-564677c-vs-base.md

Fidget-Spinner avatar Dec 31 '25 18:12 Fidget-Spinner

In the last pystats, CALL_FUNCTION_EX was the 3rd highest call op that called Python frames, after CALL_PY_EXACT_ARGS and CALL_ALLOC_AND_ENTER_INIT: https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251019-3.15.0a1%2B-bedaea0-JIT/bm-20251019-vultr-x86_64-python-bedaea05987738c4c6b9-3.15.0a1%2B-bedaea0-pystats.md

We should specialize for it.

Fidget-Spinner avatar Jan 02 '26 00:01 Fidget-Spinner

Things we need to specialize for:

  • CALL_FUNCTION_EX
  • SEND
  • CALL_ALLOC_AND_ENTER_INIT_SLOT

SEND has one of the worst specialization failures of any opcode: https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251019-3.15.0a1%2B-bedaea0-JIT/bm-20251019-vultr-x86_64-python-bedaea05987738c4c6b9-3.15.0a1%2B-bedaea0-pystats.md#send-1

This should also speed up the interpreter slightly. It's blocking us from optimizing generators/frames in the JIT.
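For readers unfamiliar with these opcodes, a small sketch using the standard `dis` module shows which Python constructs emit two of them. The helper functions `forward`, `delegate`, and `opnames` are made up for this example; the opcode names are as reported by `dis`, and `SEND` is assumed to be present only on CPython 3.11 and later.

```python
import dis
import sys


def forward(func, *args, **kwargs):
    # A call with unpacked arguments compiles to CALL_FUNCTION_EX.
    return func(*args, **kwargs)


def delegate(gen):
    # `yield from` compiles to SEND on CPython 3.11 and later.
    yield from gen


def opnames(func):
    """Return the set of (unspecialized) opcode names in func's bytecode."""
    return {ins.opname for ins in dis.get_instructions(func)}


print("CALL_FUNCTION_EX" in opnames(forward))
print("SEND" in opnames(delegate))
```

Wrapper functions that forward `*args, **kwargs` (decorators, for example) and generator or coroutine delegation are the typical sources of these opcodes in benchmark code.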

Fidget-Spinner avatar Jan 03 '26 14:01 Fidget-Spinner