
Pool WaitForSingleObject calls into minimal threads

Open richardsheridan opened this issue 3 years ago • 7 comments

Fixes #1271, but also optimizes WaitForSingleObject calls. This fulfills a wish I had in #1783 to reduce the number of threads needed during async subprocess usage on Windows.

UPDATE: The following was true until 783d093, but the alternative implementations were removed in a3a4f10; the current implementation is equivalent to "Pool per process". I preserved the benchmarking code here.

Right now it contains some awkward monkeypatching routines to select from multiple implementations on the fly for benchmarking. The new implementation options are:

  1. Win32: a simple wrapping of RegisterWaitForSingleObject and UnregisterWait, as you suggested in #233 (a minimal sketch follows this list). This is the lightest-weight and most performant option, and it is what asyncio, libuv, and tokio use (with the same wait flags, too). The objections to it are that ffi.callback() breaks on some platforms (none of which are Windows), that UnregisterWait is difficult to understand (having reimplemented it in Trio, I am pretty comfortable answering any questions), and that we don't control the threads or thread pool (valid, and the pool has a default limit of 500 threads, which could bite at any time).
  2. Pool per run: a WaitPool chops up wait requests into WaitGroups, each of which implicitly has-a (unstructured) background Trio task that jumps in and out of the Trio thread cache to alternately wait on and drain signaled handles from the group. The largest not-full WaitGroup is identified using a SortedList so that we minimize the number of WaitGroups under normal circumstances (a bookkeeping sketch also follows below). WaitPool never attempts to merge WaitGroups; the handles simply drain as they complete. Benchmarking shows this has significantly higher high-water-mark memory usage and lower throughput compared to...
  3. Pool per process: the same WaitPool logic makes WaitGroups that correspond to background threads from the Trio cache but otherwise behave identically to the tasks above. Compared to Win32, this method has roughly 50% lower throughput and a 50% higher memory high-water mark, but compared to pooling per run with tasks, it is better by a factor of 2 or 3 on both measures. Everything has to be carefully locked, but the lock contention shouldn't be much worse than the event-loop contention of pool per run, unless many separate threads with Trio runs are all waiting on handles at once.

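For concreteness, here is a minimal sketch of option 1. It uses ctypes purely for illustration (the PR itself goes through Trio's cffi layer), and `wait_for_single_object` is a simplified stand-in, not the real API:

```python
import ctypes
from ctypes import wintypes

import trio

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

WT_EXECUTEONLYONCE = 0x00000008
INFINITE = 0xFFFFFFFF
INVALID_HANDLE_VALUE = wintypes.HANDLE(-1)

# VOID CALLBACK WaitOrTimerCallback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
WAITORTIMERCALLBACK = ctypes.WINFUNCTYPE(None, wintypes.LPVOID, wintypes.BOOLEAN)

kernel32.RegisterWaitForSingleObject.argtypes = [
    ctypes.POINTER(wintypes.HANDLE), wintypes.HANDLE, WAITORTIMERCALLBACK,
    wintypes.LPVOID, wintypes.DWORD, wintypes.ULONG,
]
kernel32.UnregisterWaitEx.argtypes = [wintypes.HANDLE, wintypes.HANDLE]


async def wait_for_single_object(handle):
    """Park `handle` on the OS-managed wait thread pool until it is signaled."""
    event = trio.Event()
    token = trio.lowlevel.current_trio_token()

    @WAITORTIMERCALLBACK
    def callback(context, timer_or_wait_fired):
        # Fires on an OS wait-pool thread; hop back into Trio to set the event.
        token.run_sync_soon(event.set)

    wait_handle = wintypes.HANDLE()
    if not kernel32.RegisterWaitForSingleObject(
        ctypes.byref(wait_handle), handle, callback, None,
        INFINITE, WT_EXECUTEONLYONCE,
    ):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        await event.wait()
    finally:
        # INVALID_HANDLE_VALUE makes UnregisterWaitEx block until any
        # in-flight callback finishes, so `callback` can't fire after
        # the closure is garbage collected.
        kernel32.UnregisterWaitEx(wait_handle, INVALID_HANDLE_VALUE)
```
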
Also included is a benchmark script in the notes-to-self folder. Even if the pooling options turn out to be undesirable for Trio, I would propose rolling the implementations into the benchmark script directly, so that if someone someday does turn out to be memory-limited by the current thread-per-handle approach, there will be a quick place to go, with drop-in replacements available.
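
To make the pooling logic in options 2 and 3 concrete, here is a rough sketch of the WaitGroup selection bookkeeping, with hypothetical names and simplified from what the diff actually does:

```python
# Sketch only: real WaitGroups also own a wakeup event and a waiter
# (task or thread); here we track just the handle bookkeeping.
from sortedcontainers import SortedKeyList

MAXIMUM_WAIT_OBJECTS = 64  # hard Win32 limit per WaitForMultipleObjects call


class WaitGroup:
    def __init__(self):
        # Slot 0 is reserved for the group's wakeup event, so a group
        # holds at most MAXIMUM_WAIT_OBJECTS - 1 real handles.
        self.handles = set()

    def __len__(self):
        return len(self.handles)


class WaitPool:
    def __init__(self):
        # Sorted by size, so the largest not-full group is found in
        # O(log n); this keeps the number of groups (and therefore
        # threads or tasks) near the minimum.
        self.groups = SortedKeyList(key=len)

    def add(self, handle):
        # First index whose group is full; everything before it has room.
        idx = self.groups.bisect_key_left(MAXIMUM_WAIT_OBJECTS - 1)
        if idx == 0:
            group = WaitGroup()  # no groups yet, or all of them are full
        else:
            # Remove before mutating: a group's sort key must not
            # change while it sits in the SortedKeyList.
            group = self.groups.pop(idx - 1)
        group.handles.add(handle)
        self.groups.add(group)
        return group
```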

richardsheridan avatar Dec 24 '20 16:12 richardsheridan

Codecov Report

Merging #1834 (7e6f90a) into master (ce4c99e) will increase coverage by 0.91%. The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1834      +/-   ##
==========================================
+ Coverage   98.06%   98.98%   +0.91%     
==========================================
  Files         117      117              
  Lines       16119    16227     +108     
  Branches     3122     3143      +21     
==========================================
+ Hits        15807    16062     +255     
+ Misses        252      116     -136     
+ Partials       60       49      -11     
Impacted Files Coverage Δ
trio/_core/_windows_cffi.py 100.00% <100.00%> (ø)
trio/_wait_for_object.py 100.00% <100.00%> (ø)
trio/tests/test_wait_for_object.py 100.00% <100.00%> (ø)
trio/_core/_run.py 100.00% <0.00%> (+0.44%) ↑
trio/tests/test_ssl.py 99.86% <0.00%> (+0.55%) ↑
trio/tests/test_socket.py 100.00% <0.00%> (+0.65%) ↑
trio/_core/tests/test_io.py 100.00% <0.00%> (+1.05%) ↑
trio/tests/test_highlevel_open_tcp_stream.py 100.00% <0.00%> (+1.47%) ↑
trio/_subprocess_platform/__init__.py 100.00% <0.00%> (+5.71%) ↑
trio/_highlevel_ssl_helpers.py 100.00% <0.00%> (+11.76%) ↑
... and 4 more

codecov[bot] avatar Dec 24 '20 16:12 codecov[bot]

This PR is dramatically streamlined as of a3a4f10, so if @pquentin, @njsmith or @oremanj ever wanted to do a deep read of a "finished product", it is available now!

richardsheridan avatar Jun 15 '21 13:06 richardsheridan

Any news?

honglei avatar Jul 05 '22 01:07 honglei

To be honest, I didn't think anyone was watching this one. Do you have a specific use-case? It might add some motivation for someone to (do an easy rebase and) merge the feature.

richardsheridan avatar Jul 05 '22 01:07 richardsheridan

@richardsheridan I was directed to this issue from https://github.com/python-trio/trio/issues/1271, so is the problem from #1271 now solved by this PR?

honglei avatar Jul 05 '22 01:07 honglei

Yes, this PR surely fixes #1271, since each wait group thread cleans up its own wakeup event (the equivalent of the cancel handle in the current implementation). I think the hesitation to merge comes from the possibility that the extra complexity may introduce some other subtle bug.
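
In sketch form, the cleanup that makes this work looks roughly like the following, where `group` is a hypothetical object with a lock, a handle set, a wakeup event, and a `deliver` callback into Trio (the real code goes through Trio's cffi layer, not ctypes):

```python
import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.WaitForMultipleObjects.argtypes = [
    wintypes.DWORD, ctypes.POINTER(wintypes.HANDLE),
    wintypes.BOOL, wintypes.DWORD,
]
kernel32.WaitForMultipleObjects.restype = wintypes.DWORD

INFINITE = 0xFFFFFFFF
WAIT_OBJECT_0 = 0


def wait_group_thread(group):
    while True:
        with group.lock:
            handles = [group.wakeup_event, *group.handles]
            if len(handles) == 1:
                # Group drained: the thread that owns the wakeup event
                # closes it itself, so no handle is left dangling (#1271).
                kernel32.CloseHandle(group.wakeup_event)
                return
        arr = (wintypes.HANDLE * len(handles))(*handles)
        ret = kernel32.WaitForMultipleObjects(len(handles), arr, False, INFINITE)
        index = ret - WAIT_OBJECT_0
        if index == 0:
            continue  # poked via SetEvent: the handle set changed, re-wait
        group.deliver(handles[index])  # hypothetical hook back into Trio
```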

richardsheridan avatar Jul 05 '22 02:07 richardsheridan

I merged the master branch. But I'm only the janitor, I don't know the internals of Trio well enough to help here, sorry!

pquentin avatar Jul 05 '22 05:07 pquentin