async-executor
Replace the local worker queues with st3's
Fix #32. This PR replaces the fixed-size local worker queues with st3's implementation. The queue currently implemented in this crate works in largely the same way, but st3's implementation should use considerably fewer atomic operations.
Performance-wise, this seems to provide a major improvement in most benchmarks, particularly the single-threaded ones, since pushing to a local queue no longer requires any atomic operations and popping requires only one. There is one major regression, though: spawn_one in the multi-threaded benchmarks (both executor and static_executor) is roughly 8x slower (+725% and +755%), and it's unclear why. single_thread/executor::spawn_batch also regressed, by about 34%. My current working theory is that the atomic-free push to the local queues is putting the global queue under higher contention.
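To make the "no atomics to push, one to pop" claim concrete, here is a minimal sketch of a bounded single-producer, multi-consumer ring buffer in that style. It is not st3's actual algorithm (st3 differs in its details); the names LocalQueue and CAP are hypothetical, and slots hold usize task IDs instead of real tasks so that every shared access stays atomic and free of data races.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Capacity of the hypothetical local queue.
const CAP: usize = 256;

/// Simplified SPMC ring buffer (a sketch, not st3's code).
pub struct LocalQueue {
    /// Index of the next item to pop; advanced by the owner and stealers via CAS.
    head: AtomicUsize,
    /// Index of the next free slot; written only by the owning worker.
    tail: AtomicUsize,
    slots: Box<[AtomicUsize]>,
}

impl LocalQueue {
    pub fn new() -> Self {
        let slots = (0..CAP).map(|_| AtomicUsize::new(0)).collect::<Vec<_>>();
        LocalQueue {
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
            slots: slots.into_boxed_slice(),
        }
    }

    /// Owner-only push: atomic loads and stores, but no read-modify-write.
    /// Hands the task back if the queue is full (a real executor would then
    /// spill it to the global queue).
    pub fn push(&self, task: usize) -> Result<(), usize> {
        // Only the owner writes `tail`, so a relaxed read of our own value suffices.
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) >= CAP {
            return Err(task);
        }
        self.slots[tail % CAP].store(task, Ordering::Relaxed);
        // Release publishes the slot write to any thread that acquires `tail`.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        Ok(())
    }

    /// Pop (owner) or steal (other workers): a single CAS on `head` when uncontended.
    pub fn pop(&self) -> Option<usize> {
        let mut head = self.head.load(Ordering::Acquire);
        loop {
            let tail = self.tail.load(Ordering::Acquire);
            if head == tail {
                return None; // empty
            }
            // Read speculatively; the value is only returned if the CAS succeeds.
            let task = self.slots[head % CAP].load(Ordering::Relaxed);
            match self.head.compare_exchange_weak(
                head,
                head.wrapping_add(1),
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(task),
                // Lost a race to another popper (or a spurious failure): retry.
                Err(current) => head = current,
            }
        }
    }
}
```

The saving is specifically in read-modify-write operations: push still performs atomic loads and stores, but on common hardware those are as cheap as ordinary memory accesses, whereas every pop or steal pays for one compare-and-swap.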
executor::create time: [725.66 ns 726.23 ns 726.98 ns]
change: [-32.237% -31.965% -31.775%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
single_thread/executor::spawn_one
time: [923.93 ns 936.62 ns 950.73 ns]
change: [-37.180% -33.978% -30.511%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
11 (11.00%) high mild
single_thread/executor::spawn_batch
time: [34.023 µs 36.536 µs 39.778 µs]
change: [+22.133% +34.229% +44.880%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
single_thread/executor::spawn_many_local
time: [4.6916 ms 4.7248 ms 4.7611 ms]
change: [-3.5935% -2.6420% -1.6484%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
single_thread/executor::spawn_recursively
time: [35.261 ms 35.584 ms 35.936 ms]
change: [-26.953% -26.069% -25.097%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
single_thread/executor::yield_now
time: [5.4241 ms 5.4290 ms 5.4344 ms]
change: [-10.452% -9.0866% -7.9914%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
multi_thread/executor::spawn_one
time: [14.511 µs 14.882 µs 15.177 µs]
change: [+674.99% +725.04% +767.31%] (p = 0.00 < 0.05)
Performance has regressed.
multi_thread/executor::spawn_batch
time: [53.164 µs 58.006 µs 63.134 µs]
change: [-19.348% -12.758% -5.2161%] (p = 0.00 < 0.05)
Performance has improved.
multi_thread/executor::spawn_many_local
time: [27.513 ms 27.608 ms 27.705 ms]
change: [+1.8542% +2.4549% +3.0788%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
Benchmarking multi_thread/executor::spawn_recursively: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 17.7s, or reduce sample count to 20.
multi_thread/executor::spawn_recursively
time: [174.31 ms 174.66 ms 175.04 ms]
change: [-1.8165% -1.5293% -1.2438%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) low mild
4 (4.00%) high mild
3 (3.00%) high severe
multi_thread/executor::yield_now
time: [23.860 ms 23.931 ms 23.996 ms]
change: [-1.8530% -1.4776% -1.0672%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
5 (5.00%) low mild
single_thread/static_executor::spawn_one
time: [671.98 ns 680.60 ns 689.94 ns]
change: [-53.423% -50.975% -48.005%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
single_thread/static_executor::spawn_many_local
time: [4.4846 ms 4.5148 ms 4.5500 ms]
change: [-11.369% -10.414% -9.3355%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
1 (1.00%) high mild
4 (4.00%) high severe
single_thread/static_executor::spawn_recursively
time: [24.470 ms 24.599 ms 24.738 ms]
change: [-6.6356% -5.5074% -4.4066%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
single_thread/static_executor::yield_now
time: [5.3366 ms 5.3424 ms 5.3486 ms]
change: [-6.6970% -6.4629% -6.2391%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
multi_thread/static_executor::spawn_one
time: [13.876 µs 14.197 µs 14.443 µs]
change: [+704.49% +755.40% +805.04%] (p = 0.00 < 0.05)
Performance has regressed.
multi_thread/static_executor::spawn_many_local
time: [4.5342 ms 4.6639 ms 4.7927 ms]
change: [-21.861% -19.142% -16.242%] (p = 0.00 < 0.05)
Performance has improved.
multi_thread/static_executor::spawn_recursively
time: [43.786 ms 44.057 ms 44.264 ms]
change: [-0.6134% -0.0025% +0.5506%] (p = 1.00 > 0.05)
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) low severe
1 (1.00%) low mild
multi_thread/static_executor::yield_now
time: [23.979 ms 24.052 ms 24.121 ms]
change: [+0.4275% +0.8193% +1.1661%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low severe
4 (4.00%) low mild
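Criterion reports each change as a percentage of the previous run, which makes the large regressions above easy to misread. As a rough reading aid, assuming new ≈ old × (1 + change / 100) and treating the middle time estimate as comparable to the baseline (criterion computes the time and change estimates separately, so this is only approximate), a change percentage can be turned into a slowdown factor and an implied previous time. The helper names are hypothetical.

```rust
// Criterion's `change` is a relative difference against the saved baseline:
// new ≈ old * (1 + change / 100).

/// Factor by which the new time exceeds the old one (e.g. +725% -> 8.25x).
pub fn slowdown_factor(change_pct: f64) -> f64 {
    1.0 + change_pct / 100.0
}

/// Rough previous time implied by a reported time and change percentage.
/// Approximate only: criterion's time and change estimates are computed separately.
pub fn implied_baseline(new_time: f64, change_pct: f64) -> f64 {
    new_time / slowdown_factor(change_pct)
}
```

Applied to the figures above, multi_thread/executor::spawn_one's +725.04% works out to about 8.25x slower, i.e. a previous time of roughly 1.80 µs, and multi_thread/static_executor::spawn_one's +755.40% to roughly 1.66 µs.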
This is a breaking change: the future returned by Executor::run is no longer Send or Sync.
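The PR does not say which part of the change removes these bounds. As a general illustration only (not a description of async-executor's or st3's internals), this sketch shows how an async fn's future picks up Send and Sync from the state it holds across an .await. LocalHandle is a hypothetical stand-in for a per-worker handle that is Send but not Sync; holding it by value keeps the future Send but not Sync, and borrowing it across an .await makes the future neither. The assert_send helper is a common pattern downstream code can use to catch this kind of breakage at compile time.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical per-worker handle: `Cell` makes it `Send` but not `Sync`.
pub struct LocalHandle {
    pub hint: Cell<usize>,
}

/// Compile-time check: only accepts values whose type is `Send`.
pub fn assert_send<T: Send>(_: &T) {}

async fn yield_point() {}

/// Owns the handle across an `.await`: the future is `Send` but not `Sync`.
pub async fn run_owned(h: LocalHandle) -> usize {
    yield_point().await;
    h.hint.get()
}

/// Borrows the handle across an `.await`: `&LocalHandle` is neither `Send`
/// nor `Sync` (because `LocalHandle` is not `Sync`), so neither is this future.
pub async fn run_borrowed(h: &LocalHandle) -> usize {
    yield_point().await;
    h.hint.get()
}

/// Waker that does nothing; used only by the toy `block_on` below.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor used only to drive the examples.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
        std::thread::yield_now();
    }
}
```

Calling assert_send(&run_owned(...)) compiles, while assert_send(&run_borrowed(&h)) is rejected by the compiler, which is the same kind of error callers would see when moving a no-longer-Send future onto another thread.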