Pool WaitForSingleObject calls into minimal threads
Fixes #1271, but also optimizes WaitForSingleObject calls. This fulfills a wish I had in #1783 to reduce the number of threads needed during async subprocess usage on Windows.
UPDATE: The following was true until 783d093 but was removed in a3a4f10. The current implementation is equivalent to "Pool per process". I preserved the benchmarking code here.
Right now it contains some awkward monkeypatching routines to select from multiple implementations on the fly for benchmarking. The new implementation options are:

- Win32: a simple wrapping of `RegisterWaitForSingleObject` and `UnregisterWait`, as you suggested in #233. This is the lightest-weight and most performant option, and it is what asyncio, libuv, and tokio use (with the same wait flags, too). The objections to it are that `ffi.callback()` breaks on some platforms (none of which are Windows), that `UnregisterWait` is difficult to understand (by reimplementing it in Trio, I am pretty comfortable answering any questions), and that we don't control the threads or the thread pool (valid, and the pool has a default limit of 500 threads which could bite at any time). A rough sketch of this wrapping follows the list.
- Pool per run: a `WaitPool` chops up wait requests into `WaitGroup`s, each of which implicitly has an (unstructured) background Trio task that jumps in and out of the Trio thread cache to alternately wait and drain signaled handles from the group. The largest not-full `WaitGroup` is identified using a `SortedList` so that we minimize the number of `WaitGroup`s under normal circumstances. `WaitPool` does not attempt to merge `WaitGroup`s at any point; the handles simply drain as they complete. Benchmarking shows this has significantly higher high-water-mark memory usage and lower throughput compared to...
- Pool per process: the same `WaitPool` logic makes `WaitGroup`s which correspond to background threads from the Trio cache, but which otherwise behave identically to the tasks above. Compared to Win32, this method has about 50% lower throughput and a higher high-water-mark memory usage, but compared to pooling per run with tasks, it is better by a factor of 2 or 3. Everything has to be carefully locked, but the lock contention shouldn't be much worse than the event-loop contention of pool per run, unless there are many, many separate threads with Trio runs all waiting on handles. The batching idea shared by both pooling options is sketched after the benchmark note below.
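As a rough illustration of the Win32 option, here is a minimal sketch of the wrapping, written with ctypes so it is self-contained. This is not the PR's code: the PR goes through Trio's cffi bindings in `trio/_core/_windows_cffi.py`, the helper names `register_wait`/`unregister_wait` are made up for this sketch, the flag choice is an assumption about a typical one-shot wait, and `UnregisterWaitEx` stands in for `UnregisterWait` to show the block-until-callback-finishes behavior.

```python
# Hypothetical sketch (not Trio's actual code): wrapping
# RegisterWaitForSingleObject / UnregisterWaitEx with ctypes.
import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

WT_EXECUTEINWAITTHREAD = 0x00000004
WT_EXECUTEONLYONCE = 0x00000008
INFINITE = 0xFFFFFFFF
# Passing INVALID_HANDLE_VALUE to UnregisterWaitEx blocks until callbacks finish.
INVALID_HANDLE_VALUE = wintypes.HANDLE(-1)

# WAITORTIMERCALLBACK: void callback(PVOID context, BOOLEAN timed_out)
WAITORTIMERCALLBACK = ctypes.WINFUNCTYPE(None, ctypes.c_void_p, wintypes.BOOLEAN)


def register_wait(handle, py_callback):
    """Ask the OS thread pool to run py_callback() once `handle` is signaled."""
    wait_handle = wintypes.HANDLE()
    # The ctypes callback object must be kept alive while the wait is pending.
    c_callback = WAITORTIMERCALLBACK(lambda context, timed_out: py_callback())
    ok = kernel32.RegisterWaitForSingleObject(
        ctypes.byref(wait_handle),   # receives the registration handle
        wintypes.HANDLE(handle),     # the handle to wait on
        c_callback,
        None,                        # context pointer (unused here)
        wintypes.DWORD(INFINITE),    # no timeout
        WT_EXECUTEONLYONCE | WT_EXECUTEINWAITTHREAD,
    )
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return wait_handle, c_callback


def unregister_wait(wait_handle):
    """Cancel the registration, waiting for any in-flight callback to return."""
    if not kernel32.UnregisterWaitEx(wait_handle, INVALID_HANDLE_VALUE):
        raise ctypes.WinError(ctypes.get_last_error())
```

The appeal and the risk are the same thing here: the OS thread pool owns the waiting threads, so Trio writes almost no code but also has no control over that pool.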
Also included is a benchmark script in the notes-to-self folder. Even if the pooling options turn out not to be desirable for Trio, I would propose rolling the implementations into the benchmark script directly, so that if someone someday does turn out to be memory-limited by the current thread-per-handle approach, there will be a quick place to go with drop-in replacements available.
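For reference, the batching idea shared by both pooling options is roughly: each group owns one auto-reset wakeup event plus a batch of client handles, presumably capped below `MAXIMUM_WAIT_OBJECTS` (64) because that is the per-call limit of `WaitForMultipleObjects`, and a single waiter blocks on all of them at once; signaling the wakeup event lets the waiter rebuild its handle array when the batch changes. The sketch below is a simplified, hypothetical rendering of that idea in ctypes; it is not the PR's `WaitGroup` code and omits the locking, thread-cache, and `SortedList` bookkeeping described above.

```python
# Hypothetical sketch (not the PR's WaitGroup code): one batch of handles
# watched by a single blocking WaitForMultipleObjects call, interrupted via
# a dedicated auto-reset wakeup event whenever the batch changes.
import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.CreateEventW.restype = wintypes.HANDLE
kernel32.WaitForMultipleObjects.argtypes = [
    wintypes.DWORD, ctypes.POINTER(wintypes.HANDLE), wintypes.BOOL, wintypes.DWORD
]
kernel32.SetEvent.argtypes = [wintypes.HANDLE]

MAXIMUM_WAIT_OBJECTS = 64  # hard Win32 limit per WaitForMultipleObjects call
INFINITE = 0xFFFFFFFF
WAIT_OBJECT_0 = 0


class WaitGroup:
    """One wakeup event plus a batch of client handles, served by one waiter."""

    def __init__(self):
        # Auto-reset event: signaling it releases exactly one pending wait.
        self.wakeup = kernel32.CreateEventW(None, False, False, None)
        self.handles = []  # at most MAXIMUM_WAIT_OBJECTS - 1 client handles

    def wait_once(self):
        """Block until the wakeup event or any client handle is signaled.

        Returns the signaled client handle, or None if it was only a wakeup
        (the caller then re-reads self.handles and waits again).
        """
        all_handles = [self.wakeup] + self.handles
        array = (wintypes.HANDLE * len(all_handles))(*all_handles)
        index = kernel32.WaitForMultipleObjects(
            len(all_handles), array, False, INFINITE
        )
        if index == WAIT_OBJECT_0:
            return None
        return all_handles[index - WAIT_OBJECT_0]

    def poke(self):
        """Wake the blocked waiter so it notices changes to self.handles."""
        kernel32.SetEvent(self.wakeup)
```

A pool would then keep a collection of such groups, use something like the `SortedList` mentioned above to find the largest not-full group for each new wait request, and run one waiter per group, either as a background task (pool per run) or as a thread from Trio's cache (pool per process).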
Codecov Report
Merging #1834 (7e6f90a) into master (ce4c99e) will increase coverage by 0.91%. The diff coverage is 100.00%.
Additional details and impacted files
@@ Coverage Diff @@
## master #1834 +/- ##
==========================================
+ Coverage 98.06% 98.98% +0.91%
==========================================
Files 117 117
Lines 16119 16227 +108
Branches 3122 3143 +21
==========================================
+ Hits 15807 16062 +255
+ Misses 252 116 -136
+ Partials 60 49 -11
Impacted Files | Coverage Δ | |
---|---|---|
trio/_core/_windows_cffi.py | 100.00% <100.00%> (ø) | |
trio/_wait_for_object.py | 100.00% <100.00%> (ø) | |
trio/tests/test_wait_for_object.py | 100.00% <100.00%> (ø) | |
trio/_core/_run.py | 100.00% <0.00%> (+0.44%) | :arrow_up: |
trio/tests/test_ssl.py | 99.86% <0.00%> (+0.55%) | :arrow_up: |
trio/tests/test_socket.py | 100.00% <0.00%> (+0.65%) | :arrow_up: |
trio/_core/tests/test_io.py | 100.00% <0.00%> (+1.05%) | :arrow_up: |
trio/tests/test_highlevel_open_tcp_stream.py | 100.00% <0.00%> (+1.47%) | :arrow_up: |
trio/_subprocess_platform/__init__.py | 100.00% <0.00%> (+5.71%) | :arrow_up: |
trio/_highlevel_ssl_helpers.py | 100.00% <0.00%> (+11.76%) | :arrow_up: |
... and 4 more | | |
This PR is dramatically streamlined as of a3a4f10, so if @pquentin, @njsmith or @oremanj ever wanted to do a deep read of a "finished product", it is available now!
Any news?
To be honest, I didn't think anyone was watching this one. Do you have a specific use-case? It might add some motivation for someone to (do an easy rebase and) merge the feature.
@richardsheridan I was directed to this issue from https://github.com/python-trio/trio/issues/1271 , so is the problem from #1271 solved now?
Yes, this PR surely fixes #1271, as each wait group thread cleans up its own wakeup event (equivalent to cancel handle from the current implementation). I think the hesitation to merge comes from the possibility that some other subtle bug may be introduced from the extra complexity.
I merged the master branch. But I'm only the janitor, I don't know the internals of Trio well enough to help here, sorry!