cpython gh-115999: Implement thread-local bytecode and enable specialization for `BINARY

This PR implements the foundational work necessary for making the specializing interpreter thread-safe in free-threaded builds and enables specialization for BINARY_OP as an end-to-end example. To enable future incremental work, specialization can now be toggled on a per-family basis. Subsequent PRs will enable specialization in free-threaded builds for the remaining families.

Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.

Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.

Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.

Issue: gh-115999

Sep 10 '24 22:09 mpage

Woops I missed this! I'll review it soon-ish (next week).

Sep 19 '24 04:09 Fidget-Spinner

!buildbot nogil refleak

Sep 26 '24 12:09 Fidget-Spinner

:robot: New build scheduled with the buildbot fleet by @Fidget-Spinner for commit aa330b1af21688aa3a81bb41a249f9d7c572aa79 :robot:

The command will test the builders whose names match following regular expression: nogil refleak

The builders matched are:

PPC64LE Fedora Rawhide NoGIL refleaks PR
AMD64 CentOS9 NoGIL Refleaks PR
aarch64 Fedora Rawhide NoGIL refleaks PR
AMD64 Fedora Rawhide NoGIL refleaks PR

Sep 26 '24 12:09 bedevere-bot

@mpage I think this looks good overall. However, I will leave it open for 2 more weeks in case anyone wants to leave a review. Please ping me in 2 weeks again for a final review and merge (hopefully after).

Sep 26 '24 12:09 Fidget-Spinner

Refleak tests fail but IIRC I think they fail on main too?

Sep 26 '24 17:09 Fidget-Spinner

Refleak tests fail but IIRC I think they fail on main too?

Hmm, I thought I ran nogil refleaks locally. I'll take a look.

Sep 26 '24 18:09 mpage

I don't think the failures in the refleak tests are related to my changes. https://buildbot.python.org/#/builders/1610/builds/60 has the same failures and is from main.

Sep 26 '24 18:09 mpage

I think there was discussion on Discord that concluded the refleak failures are related to functools.Placeholder. Victor fixed in #124601, so might be worth merging in main and trying again.

Sep 26 '24 18:09 JelleZijlstra

!buildbot nogil refleak

Sep 26 '24 19:09 mpage

:robot: New build scheduled with the buildbot fleet by @mpage for commit 7dfd1ca3b4261f7b79e3cf2ac6fe2caf11dac87b :robot:

The command will test the builders whose names match following regular expression: nogil refleak

The builders matched are:

PPC64LE Fedora Rawhide NoGIL refleaks PR
AMD64 CentOS9 NoGIL Refleaks PR
aarch64 Fedora Rawhide NoGIL refleaks PR
AMD64 Fedora Rawhide NoGIL refleaks PR

Sep 26 '24 19:09 bedevere-bot

!buildbot nogil refleak

Sep 28 '24 16:09 mpage

:robot: New build scheduled with the buildbot fleet by @mpage for commit ad180d183c9f1bbe6689df21d35a4f1375574ef4 :robot:

The command will test the builders whose names match following regular expression: nogil refleak

The builders matched are:

AMD64 Fedora Rawhide NoGIL refleaks PR
aarch64 Fedora Rawhide NoGIL refleaks PR
AMD64 CentOS9 NoGIL Refleaks PR
PPC64LE Fedora Rawhide NoGIL refleaks PR

Sep 28 '24 16:09 bedevere-bot

!buildbot nogil refleak

Sep 30 '24 19:09 mpage

:robot: New build scheduled with the buildbot fleet by @mpage for commit b6380de9f4aee1b98fbc8590b19df78fd484cc81 :robot:

The command will test the builders whose names match following regular expression: nogil refleak

The builders matched are:

PPC64LE Fedora Rawhide NoGIL refleaks PR
AMD64 Fedora Rawhide NoGIL refleaks PR
aarch64 Fedora Rawhide NoGIL refleaks PR
AMD64 CentOS9 NoGIL Refleaks PR

Sep 30 '24 19:09 bedevere-bot

I think the failing refleak job is unrelated to this change. It looks like the same issue is happening intermittently on main (e.g. 1, 2, 3)

Oct 01 '24 01:10 mpage

@Fidget-Spinner - It's been two-ish weeks. Would you please give this a final review?

Oct 04 '24 18:10 mpage

@mpage Looks like you should resolve the conflict before Ken merges this PR.

Oct 05 '24 02:10 corona10

I will wait a few days and hopefully Matt is able to merge it himself :)

Oct 06 '24 05:10 Fidget-Spinner

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Oct 07 '24 12:10 bedevere-app[bot]

Here are the performance numbers

Relative to the free-threading main, it is 1% faster using 2% more memory.

Oct 07 '24 14:10 markshannon

I've tried a couple of toy benchmarks with 1000 threads, just to check for pathological cases. Performance is no worse, nor any better than main. Release builds, but no --lto or --pgo. Results

Oct 07 '24 14:10 markshannon

If we have a 1000 threads could we get a 1000 copies of the bytecode?

(Comment about general approach) Even if that is true, I can not see any disadvantage when compared to the subinterpreter model because if we spawn 1000 subinterpters, it already happens.

Oct 09 '24 03:10 corona10

Is there any limit to the number of copies of the bytecode? If we have a 1000 threads could we get a 1000 copies of the bytecode?

Yes to both, for now. We have ideas for how to mitigate this in the future, if it becomes a problem (e.g. providing a limit for the total amount of memory consumed by bytecode copies, coalescing ~identical copies, or doing something JIT-like where we let things warm-up, then pick one copy of the bytecode to use, but disable future specialization for the code object). I think this is good enough to start with and we can iterate on it.

Oct 10 '24 06:10 mpage

I have made the requested changes; please review again

Oct 14 '24 06:10 mpage

Thanks for making the requested changes!

@Fidget-Spinner, @corona10, @markshannon: please review the changes made to this pull request.

Oct 14 '24 06:10 bedevere-app[bot]

@markshannon - I think I've addressed all of your comments. Would you please have a final look?

Oct 14 '24 21:10 mpage

@markshannon - Would you take a look at this, please?

Oct 22 '24 00:10 mpage

I'm still concerned about not counting the tlbc memory blocks in the refleaks test.

Maybe you could count them separately, and still check that there aren't too many leaked, but be a bit more relaxed about the counts for tlbc than for other blocks?

Oct 23 '24 16:10 markshannon

!buildbot nogil refleak

Oct 24 '24 04:10 mpage

:robot: New build scheduled with the buildbot fleet by @mpage for commit 07f91409a6a4a933b26b23eb79e41da4d6c77918 :robot:

The command will test the builders whose names match following regular expression: nogil refleak

The builders matched are:

AMD64 CentOS9 NoGIL Refleaks PR
AMD64 Fedora Rawhide NoGIL refleaks PR
aarch64 Fedora Rawhide NoGIL refleaks PR
PPC64LE Fedora Rawhide NoGIL refleaks PR

Oct 24 '24 04:10 bedevere-bot

gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP`