Enable testing of new scheduler in CI
Performance test report
HPX Performance
Comparison
BENCHMARK | FORK_JOIN_EXECUTOR | PARALLEL_EXECUTOR | SCHEDULER_EXECUTOR |
---|---|---|---|
For Each | (=) | ?? | - |
Info
Property | Before | After |
---|---|---|
HPX Datetime | 2023-05-10T12:07:53+00:00 | 2024-01-31T16:49:40+00:00 |
HPX Commit | dcb541576898d370113946ba15fb58c20c8325b2 | b7a31347bef0566702c5f042e69175a0d691aaeb |
Datetime | 2023-05-10T14:50:18.616050-05:00 | 2024-01-31T10:55:40.712102-06:00 |
Envfile | | |
Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
Clustername | rostam | rostam |
Comparison
BENCHMARK | NO-EXECUTOR |
---|---|
Future Overhead - Create Thread Hierarchical - Latch | (=) |
Info
Property | Before | After |
---|---|---|
HPX Datetime | 2023-05-10T12:07:53+00:00 | 2024-01-31T16:49:40+00:00 |
HPX Commit | dcb541576898d370113946ba15fb58c20c8325b2 | b7a31347bef0566702c5f042e69175a0d691aaeb |
Datetime | 2023-05-10T14:52:35.047119-05:00 | 2024-01-31T10:57:55.614863-06:00 |
Envfile | | |
Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
Clustername | rostam | rostam |
Comparison
BENCHMARK | FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR | PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR | SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
---|---|---|---|
Stream Benchmark - Add | (=) | (=) | (=) |
Stream Benchmark - Scale | + | = | = |
Stream Benchmark - Triad | (=) | (=) | (=) |
Stream Benchmark - Copy | (=) | (=) | (=) |
Info
Property | Before | After |
---|---|---|
HPX Datetime | 2023-05-10T12:07:53+00:00 | 2024-01-31T16:49:40+00:00 |
HPX Commit | dcb541576898d370113946ba15fb58c20c8325b2 | b7a31347bef0566702c5f042e69175a0d691aaeb |
Datetime | 2023-05-10T14:52:52.237641-05:00 | 2024-01-31T10:58:12.811457-06:00 |
Envfile | | |
Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
Clustername | rostam | rostam |
Explanation of Symbols
Symbol | MEANING |
---|---|
= | No performance change (confidence interval within ±1%) |
(=) | Probably no performance change (confidence interval within ±2%) |
(+)/(-) | Very small performance improvement/degradation (≤1%) |
+/- | Small performance improvement/degradation (≤5%) |
++/-- | Large performance improvement/degradation (≤10%) |
+++/--- | Very large performance improvement/degradation (>10%) |
? | Probably no change, but quite large uncertainty (confidence interval within ±5%) |
?? | Unclear result, very large uncertainty (±10%) |
??? | Something unexpected… |
Coverage summary from Codacy
See diff coverage on Codacy
Coverage variation | Diff coverage |
---|---|
:white_check_mark: -0.09% | :white_check_mark: ∅ |
Coverage variation details
Coverable lines | Covered lines | Coverage | |
---|---|---|---|
Common ancestor commit (9c53b99534f57cfa56336d835985faac18166fa1) | 206733 | 176197 | 85.23% |
Head commit (78626f0e56d365c03dece638afe29fcd310b371c) | 190772 (-15961) | 162415 (-13782) | 85.14% (-0.09%) |
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>
Diff coverage details
Coverable lines | Covered lines | Diff coverage | |
---|---|---|---|
Pull request (#6428) | 0 | 0 | ∅ (not applicable) |
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
See your quality gate settings | Change summary preferences
You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation
@Pansysk75 It looks like the new scheduler doesn't work properly yet (as you suggested). The clang-12 and gcc-10 errors are most likely related.
retest
@hkaiser FYI, I'm working on this one; I've reproduced some failing tests (stop_token_cb1 being one of them). So far, I see that some pending HPX thread is never getting scheduled (either starvation or some bug). I'll let you know what I find out.
@hkaiser It seems like the moodycamel queue provides FIFO guarantees only among items enqueued by the same thread; there is no guarantee that items enqueued by different threads will be dequeued in FIFO order.
From the moodycamel blog:
> All elements from a given producer thread will necessarily still be seen in that same order relative to each other when dequeued (since the sub-queue preserves that order), albeit with elements from other sub-queues possibly interleaved.
And from the discussion in the comments:
> gasche: ... If thread A enqueues 1, then does a write observed by thread B (synchronized through the appropriate memory fence), then B enqueues 2, then I would expect 1 to come before 2 -- and this is not racy. It seems to me that your data-structure is thus providing a weaker specification than linearizable queues. ...
>
> Cameron: Yes, you're right. If there's external synchronization, then the final order is not related to the total order formed by the happens-before relationships of the enqueue operations themselves. ...
This currently causes trouble (deadlocks), as we rely on the FIFO ordering to avoid starvation: https://github.com/STEllAR-GROUP/hpx/blob/bd9e3cbabe3d8402263e9bafec6c8f6d5de6048a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp#L325-L332
In the fail-case that I managed to reproduce, a perpetually yielding thread starves out a shared-resource-holding thread when the two happen to land in the same moodycamel thread-queue. We cannot reliably insert the yielding thread at the "back" of the queue (no FIFO guarantee), and under certain circumstances the queue acts like a LIFO, so the shared-resource-holding thread is never picked up by the scheduler and never releases the held resource.
Excellent detective work @Pansysk75! Would that also be true if we used the consumer/producer token API the queue provides?
> Excellent detective work @Pansysk75! Would that also be true if we used the consumer/producer token API the queue provides?
Thanks! Here's what the implementer has to say about this (TL;DR: it's not possible): https://www.github.com/cameron314/concurrentqueue/issues/6
> > Excellent detective work @Pansysk75! Would that also be true if we used the consumer/producer token API the queue provides?
>
> Thanks! Here's what the implementer has to say about this (TL;DR: it's not possible): https://www.github.com/cameron314/concurrentqueue/issues/6
Would this patch mitigate the issue?
```diff
diff --git a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp b/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
index 1ca53941ee4..d2cc74fae4e 100644
--- a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
+++ b/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
@@ -315,13 +315,6 @@ namespace hpx::threads::detail {
                     if (HPX_UNLIKELY(
                             state_val == thread_schedule_state::pending))
                     {
-                        if (HPX_LIKELY(next_thrd == nullptr))
-                        {
-                            // schedule other work
-                            scheduler.wait_or_add_new(num_thread, running,
-                                idle_loop_count, enable_stealing_staged, added);
-                        }
-
                         // schedule this thread again, make sure it ends up at
                         // the end of the queue
                         scheduler.SchedulingPolicy::schedule_thread_last(
@@ -329,6 +322,14 @@ namespace hpx::threads::detail {
                             threads::thread_schedule_hint(
                                 static_cast<std::int16_t>(num_thread)),
                             true);
+
+                        if (HPX_LIKELY(next_thrd == nullptr))
+                        {
+                            // schedule other work
+                            scheduler.wait_or_add_new(num_thread, running,
+                                idle_loop_count, enable_stealing_staged, added);
+                        }
+
                         scheduler.SchedulingPolicy::do_some_work(num_thread);
                     }
                     else if (HPX_UNLIKELY(state_val ==
```
I think the problem is that currently, after the pending thread has been put back onto the (top of the) queue, the scheduler immediately pulls it back and retries without potentially waiting threads having a chance of being work-requested. For other schedulers those threads will get stolen eventually, thus avoiding the live-lock.
@hkaiser Doesn't seem to help much... Keep in mind that this issue doesn't show up on the work-requesting scheduler when using the default queue, so it's either the mc queue or the steal-half functionality that is causing the issue. (I also tried setting this flag to false to see whether bulk dequeueing was the culprit, but it didn't fix things: https://github.com/STEllAR-GROUP/hpx/blob/bd9e3cbabe3d8402263e9bafec6c8f6d5de6048a/libs/core/schedulers/include/hpx/schedulers/lockfree_queue_backends.hpp#L119)
Also, I think this fix is needed (or sth equivalent):
```diff
     task_data thrds(d.num_thread_);
-    thrds.tasks_.reserve(max_num_to_steal);
 #ifdef HPX_HAVE_THREAD_STEALING_COUNTS
+    thrds.tasks_.reserve(max_num_to_steal);
     thread_id_ref_type thrd;
     while (max_num_to_steal-- != 0 &&
         d.queue_->get_next_thread(thrd, false, true))
     {
         d.queue_->increment_num_stolen_from_pending();
         thrds.tasks_.push_back(HPX_MOVE(thrd));
         thrd = thread_id_ref_type{};
     }
 #else
+    thrds.tasks_.resize(max_num_to_steal);
     d.queue_->get_next_threads(
         thrds.tasks_.begin(), thrds.tasks_.size(), false, true);
```
> Also, I think this fix is needed (or sth equivalent): […]
Uhh, how did it ever work? ;-)
> > Also, I think this fix is needed (or sth equivalent): […]
>
> Uhh, how did it ever work? ;-)
That's fixed now. Thanks again!
Coverage summary from Codacy
See diff coverage on Codacy
Coverage variation | Diff coverage |
---|---|
:white_check_mark: +0.04% | :white_check_mark: 0.00% |
Coverage variation details
Coverable lines | Covered lines | Coverage | |
---|---|---|---|
Common ancestor commit (e977ecc639ae967fe8f104fb384a5a7a37a6ff52) | 217975 | 185525 | 85.11% |
Head commit (2e610e7a251bdf6955a85ee5ef9df03994e343c3) | 190892 (-27083) | 162547 (-22978) | 85.15% (+0.04%) |
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>
Diff coverage details
Coverable lines | Covered lines | Diff coverage | |
---|---|---|---|
Pull request (#6428) | 1 | 0 | 0.00% |
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
See your quality gate settings | Change summary preferences
Codacy will stop sending the deprecated coverage status from June 5th, 2024. Learn more
Performance test report
HPX Performance
Comparison
BENCHMARK | FORK_JOIN_EXECUTOR | PARALLEL_EXECUTOR | SCHEDULER_EXECUTOR |
---|---|---|---|
For Each | (=) | - | - |
Info
Property | Before | After |
---|---|---|
HPX Commit | d27ac2efb17ed7a61d0417a1926ba8991f4f477f | 0e8959a9591ba78173df4105063007b7db08c35e |
HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-04-28T16:28:37+00:00 |
Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
Envfile | | |
Clustername | rostam | rostam |
Datetime | 2024-03-18T09:18:04.949759-05:00 | 2024-04-28T11:40:07.559807-05:00 |
Comparison
BENCHMARK | NO-EXECUTOR |
---|---|
Future Overhead - Create Thread Hierarchical - Latch | (=) |
Info
Property | Before | After |
---|---|---|
HPX Commit | d27ac2efb17ed7a61d0417a1926ba8991f4f477f | 0e8959a9591ba78173df4105063007b7db08c35e |
HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-04-28T16:28:37+00:00 |
Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
Envfile | | |
Clustername | rostam | rostam |
Datetime | 2024-03-18T09:19:53.062988-05:00 | 2024-04-28T11:41:54.962060-05:00 |
Comparison
BENCHMARK | FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR | PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR | SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
---|---|---|---|
Stream Benchmark - Add | (=) | (=) | (=) |
Stream Benchmark - Scale | (=) | (=) | (=) |
Stream Benchmark - Triad | = | (=) | (=) |
Stream Benchmark - Copy | (=) | (=) | (=) |
Info
Property | Before | After |
---|---|---|
HPX Commit | d27ac2efb17ed7a61d0417a1926ba8991f4f477f | 0e8959a9591ba78173df4105063007b7db08c35e |
HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-04-28T16:28:37+00:00 |
Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
Envfile | | |
Clustername | rostam | rostam |
Datetime | 2024-03-18T09:20:13.002391-05:00 | 2024-04-28T11:42:13.137113-05:00 |