gh-142982: Specialize CALL_FUNCTION_EX
This adds a specialization for CALL_FUNCTION_EX for Python and non-Python frames.
It should be slightly faster in the interpreter and much faster under the JIT. Previously the JIT could not trace through this operation; now it can.
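
For readers unfamiliar with the opcode, here is a minimal sketch of the kind of call site that compiles to CALL_FUNCTION_EX. The names `target` and `caller` are hypothetical and not part of this change; the snippet only inspects the unspecialized bytecode with the standard `dis` module and does not measure the specialization itself.

```python
import dis


def target(a, b, c=0):
    # Hypothetical callee; any Python function would do.
    return a + b + c


def caller(args, kwargs):
    # Star-unpacking at the call site is what the compiler lowers to
    # CALL_FUNCTION_EX (instead of the plain CALL opcode).
    return target(*args, **kwargs)


# dis.get_instructions reports base (non-adaptive) opcode names by default.
opnames = [ins.opname for ins in dis.get_instructions(caller)]
print("CALL_FUNCTION_EX" in opnames)
print(caller((1, 2), {"c": 3}))
```

Because `target` is a Python function, a call like this one would fall under the Python-frame case described above; passing a builtin as the callee would exercise the non-Python case.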
JIT benchmarks suggest a 1.6% speedup on macOS AArch64 and a 0% speedup on Linux x86_64:
https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20260102-3.15.0a3%2B-884a7a7-JIT/bm-20260102-macm4pro-arm64-Fidget%252dSpinner-call_function_ex_py-3.15.0a3%2B-884a7a7-vs-base.md
https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20260102-3.15.0a3%2B-884a7a7-JIT/bm-20260102-vultr-x86_64-Fidget%252dSpinner-call_function_ex_py-3.15.0a3%2B-884a7a7-vs-base.md
I've checked some of the macOS speedups, and those benchmarks were indeed previously blocked by CALL_FUNCTION_EX in the JIT.
- Issue: gh-142982