gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP`
This PR implements the foundational work necessary for making the specializing interpreter thread-safe in free-threaded builds and enables specialization for BINARY_OP as an end-to-end example. To enable future incremental work, specialization can now be toggled on a per-family basis. Subsequent PRs will enable specialization in free-threaded builds for the remaining families.
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.
Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.
Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
- Issue: gh-115999
Woops I missed this! I'll review it soon-ish (next week).
!buildbot nogil refleak
:robot: New build scheduled with the buildbot fleet by @Fidget-Spinner for commit aa330b1af21688aa3a81bb41a249f9d7c572aa79 :robot:
The command will test the builders whose names match following regular expression: nogil refleak
The builders matched are:
PPC64LE Fedora Rawhide NoGIL refleaks PRAMD64 CentOS9 NoGIL Refleaks PRaarch64 Fedora Rawhide NoGIL refleaks PRAMD64 Fedora Rawhide NoGIL refleaks PR
@mpage I think this looks good overall. However, I will leave it open for 2 more weeks in case anyone wants to leave a review. Please ping me in 2 weeks again for a final review and merge (hopefully after).
Refleak tests fail but IIRC I think they fail on main too?
Refleak tests fail but IIRC I think they fail on main too?
Hmm, I thought I ran nogil refleaks locally. I'll take a look.
I don't think the failures in the refleak tests are related to my changes. https://buildbot.python.org/#/builders/1610/builds/60 has the same failures and is from main.
I think there was discussion on Discord that concluded the refleak failures are related to functools.Placeholder. Victor fixed in #124601, so might be worth merging in main and trying again.
!buildbot nogil refleak
:robot: New build scheduled with the buildbot fleet by @mpage for commit 7dfd1ca3b4261f7b79e3cf2ac6fe2caf11dac87b :robot:
The command will test the builders whose names match following regular expression: nogil refleak
The builders matched are:
PPC64LE Fedora Rawhide NoGIL refleaks PRAMD64 CentOS9 NoGIL Refleaks PRaarch64 Fedora Rawhide NoGIL refleaks PRAMD64 Fedora Rawhide NoGIL refleaks PR
!buildbot nogil refleak
:robot: New build scheduled with the buildbot fleet by @mpage for commit ad180d183c9f1bbe6689df21d35a4f1375574ef4 :robot:
The command will test the builders whose names match following regular expression: nogil refleak
The builders matched are:
AMD64 Fedora Rawhide NoGIL refleaks PRaarch64 Fedora Rawhide NoGIL refleaks PRAMD64 CentOS9 NoGIL Refleaks PRPPC64LE Fedora Rawhide NoGIL refleaks PR
!buildbot nogil refleak
:robot: New build scheduled with the buildbot fleet by @mpage for commit b6380de9f4aee1b98fbc8590b19df78fd484cc81 :robot:
The command will test the builders whose names match following regular expression: nogil refleak
The builders matched are:
PPC64LE Fedora Rawhide NoGIL refleaks PRAMD64 Fedora Rawhide NoGIL refleaks PRaarch64 Fedora Rawhide NoGIL refleaks PRAMD64 CentOS9 NoGIL Refleaks PR
I think the failing refleak job is unrelated to this change. It looks like the same issue is happening intermittently on main (e.g. 1, 2, 3)
@Fidget-Spinner - It's been two-ish weeks. Would you please give this a final review?
@mpage Looks like you should resolve the conflict before Ken merges this PR.
I will wait a few days and hopefully Matt is able to merge it himself :)
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.
Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.
Here are the performance numbers
Relative to the free-threading main, it is 1% faster using 2% more memory.
I've tried a couple of toy benchmarks with 1000 threads, just to check for pathological cases.
Performance is no worse, nor any better than main. Release builds, but no --lto or --pgo. Results
If we have a 1000 threads could we get a 1000 copies of the bytecode?
(Comment about general approach) Even if that is true, I can not see any disadvantage when compared to the subinterpreter model because if we spawn 1000 subinterpters, it already happens.
Is there any limit to the number of copies of the bytecode? If we have a 1000 threads could we get a 1000 copies of the bytecode?
Yes to both, for now. We have ideas for how to mitigate this in the future, if it becomes a problem (e.g. providing a limit for the total amount of memory consumed by bytecode copies, coalescing ~identical copies, or doing something JIT-like where we let things warm-up, then pick one copy of the bytecode to use, but disable future specialization for the code object). I think this is good enough to start with and we can iterate on it.
I have made the requested changes; please review again
Thanks for making the requested changes!
@Fidget-Spinner, @corona10, @markshannon: please review the changes made to this pull request.
@markshannon - I think I've addressed all of your comments. Would you please have a final look?
@markshannon - Would you take a look at this, please?
I'm still concerned about not counting the tlbc memory blocks in the refleaks test.
Maybe you could count them separately, and still check that there aren't too many leaked, but be a bit more relaxed about the counts for tlbc than for other blocks?
!buildbot nogil refleak
:robot: New build scheduled with the buildbot fleet by @mpage for commit 07f91409a6a4a933b26b23eb79e41da4d6c77918 :robot:
The command will test the builders whose names match following regular expression: nogil refleak
The builders matched are:
AMD64 CentOS9 NoGIL Refleaks PRAMD64 Fedora Rawhide NoGIL refleaks PRaarch64 Fedora Rawhide NoGIL refleaks PRPPC64LE Fedora Rawhide NoGIL refleaks PR