
Enable testing of new scheduler in CI

Open hkaiser opened this issue 1 year ago • 14 comments

hkaiser avatar Jan 31 '24 16:01 hkaiser

Performance test report

HPX Performance

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR | PARALLEL_EXECUTOR | SCHEDULER_EXECUTOR |
|---|---|---|---|
| For Each | (=) | ?? | - |

Info

| Property | Before | After |
|---|---|---|
| HPX Datetime | 2023-05-10T12:07:53+00:00 | 2024-01-31T16:49:40+00:00 |
| HPX Commit | dcb541576898d370113946ba15fb58c20c8325b2 | b7a31347bef0566702c5f042e69175a0d691aaeb |
| Datetime | 2023-05-10T14:50:18.616050-05:00 | 2024-01-31T10:55:40.712102-06:00 |
| Envfile | | |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Clustername | rostam | rostam |

Comparison

| BENCHMARK | NO-EXECUTOR |
|---|---|
| Future Overhead - Create Thread Hierarchical - Latch | (=) |

Info

| Property | Before | After |
|---|---|---|
| HPX Datetime | 2023-05-10T12:07:53+00:00 | 2024-01-31T16:49:40+00:00 |
| HPX Commit | dcb541576898d370113946ba15fb58c20c8325b2 | b7a31347bef0566702c5f042e69175a0d691aaeb |
| Datetime | 2023-05-10T14:52:35.047119-05:00 | 2024-01-31T10:57:55.614863-06:00 |
| Envfile | | |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Clustername | rostam | rostam |

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR | PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR | SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
|---|---|---|---|
| Stream Benchmark - Add | (=) | (=) | (=) |
| Stream Benchmark - Scale | + | = | = |
| Stream Benchmark - Triad | (=) | (=) | (=) |
| Stream Benchmark - Copy | (=) | (=) | (=) |

Info

| Property | Before | After |
|---|---|---|
| HPX Datetime | 2023-05-10T12:07:53+00:00 | 2024-01-31T16:49:40+00:00 |
| HPX Commit | dcb541576898d370113946ba15fb58c20c8325b2 | b7a31347bef0566702c5f042e69175a0d691aaeb |
| Datetime | 2023-05-10T14:52:52.237641-05:00 | 2024-01-31T10:58:12.811457-06:00 |
| Envfile | | |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Clustername | rostam | rostam |

Explanation of Symbols

| Symbol | MEANING |
|---|---|
| = | No performance change (confidence interval within ±1%) |
| (=) | Probably no performance change (confidence interval within ±2%) |
| (+)/(-) | Very small performance improvement/degradation (≤1%) |
| +/- | Small performance improvement/degradation (≤5%) |
| ++/-- | Large performance improvement/degradation (≤10%) |
| +++/--- | Very large performance improvement/degradation (>10%) |
| ? | Probably no change, but quite large uncertainty (confidence interval within ±5%) |
| ?? | Unclear result, very large uncertainty (±10%) |
| ??? | Something unexpected… |

StellarBot avatar Jan 31 '24 16:01 StellarBot

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
|---|---|
| :white_check_mark: -0.09% | :white_check_mark: |
Coverage variation details
| | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor commit (9c53b99534f57cfa56336d835985faac18166fa1) | 206733 | 176197 | 85.23% |
| Head commit (78626f0e56d365c03dece638afe29fcd310b371c) | 190772 (-15961) | 162415 (-13782) | 85.14% (-0.09%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>
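Using the numbers in the table above: 85.14% − 85.23% = −0.09%, which matches the reported variation.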

Diff coverage details
| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#6428) | 0 | 0 | ∅ (not applicable) |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation

codacy-production[bot] avatar Jan 31 '24 21:01 codacy-production[bot]

@Pansysk75 it looks like the new scheduler doesn't work properly yet (as you suggested). The clang-12 and gcc-10 errors are most likely related.

hkaiser avatar Feb 05 '24 16:02 hkaiser

retest

Pansysk75 avatar Feb 07 '24 18:02 Pansysk75

@hkaiser FYI, I'm working on this one; I've reproduced some failing tests (stop_token_cb1 being one of them). So far, I see that some pending hpx thread never gets scheduled (either starvation or some bug). I'll let you know what I find out.

Pansysk75 avatar Feb 16 '24 19:02 Pansysk75

@hkaiser It seems like the moodycamel queue provides FIFO guarantees only among items placed by the same thread; there is no guarantee that items queued by different threads will be dequeued in FIFO order.

From the moodycamel blog:

All elements from a given producer thread will necessarily still be seen in that same order relative to each other when dequeued (since the sub-queue preserves that order), albeit with elements from other sub-queues possibly interleaved.

And from the discussion in the comments:

gasche: ... If thread A enqueues 1, then does a write observed by thread B (synchronized through the appropriate memory fence), then B enqueues 2, then I would expect 1 to come before 2 -- and this is not racy. It seems to me that your data-structure is thus providing a weaker specification than linearizable queues. ...

Cameron: Yes, you're right. If there's external synchronization, then the final order is not related to the total order formed by the happens-before relationships of the enqueue operations themselves. ...
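To make the quoted caveat concrete, here is a minimal standalone sketch (my own illustration, not HPX code, assuming the single-header concurrentqueue.h from the moodycamel library): even with external synchronization between the two enqueues, the dequeue order across producers is unspecified.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

#include "concurrentqueue.h"    // moodycamel::ConcurrentQueue

int main()
{
    moodycamel::ConcurrentQueue<int> q;
    std::atomic<bool> first_done{false};

    std::thread a([&] {
        q.enqueue(1);    // happens-before the enqueue in thread b ...
        first_done.store(true, std::memory_order_release);
    });
    std::thread b([&] {
        while (!first_done.load(std::memory_order_acquire)) {}
        q.enqueue(2);    // ... yet 2 may still be dequeued before 1
    });
    a.join();
    b.join();

    int item;
    while (q.try_dequeue(item))     // order across sub-queues is unspecified
        std::printf("%d\n", item);  // may print "2 1"
}
```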

This currently causes trouble (deadlocks), as we rely on the FIFO ordering to avoid starvation: https://github.com/STEllAR-GROUP/hpx/blob/bd9e3cbabe3d8402263e9bafec6c8f6d5de6048a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp#L325-L332

In the fail-case that I managed to reproduce, a perpetually yielding thread starves out a shared-resource-holding thread when the two happen to end up in the same moodycamel thread-queue. We cannot reliably insert the yielding thread at the "back" of the queue (no FIFO guarantee), and under certain circumstances the queue acts like a LIFO, never allowing the shared-resource-holding thread to be picked up by the scheduler and release the held resource.
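A hedged sketch of that failure pattern (a hypothetical repro of my own, not the actual failing test), meant to be run on a single worker thread, e.g. with --hpx:threads=1, so that the rescheduling order matters:

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>

#include <atomic>

int main()
{
    std::atomic<bool> resource_released{false};

    // Task holding a shared "resource": releases it as soon as it runs.
    hpx::future<void> holder = hpx::async([&] {
        resource_released.store(true, std::memory_order_release);
    });

    // Task waiting for the resource: yields until it is released. With FIFO
    // rescheduling the holder runs in between the yields; if the queue acts
    // like a LIFO, the spinner keeps getting re-selected and this may hang.
    hpx::future<void> spinner = hpx::async([&] {
        while (!resource_released.load(std::memory_order_acquire))
            hpx::this_thread::yield();
    });

    holder.get();
    spinner.get();
    return 0;
}
```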

Pansysk75 avatar Feb 17 '24 12:02 Pansysk75

Excellent detective work @Pansysk75! Would that still be true if we used the consumer/producer token API the queue provides?

hkaiser avatar Feb 17 '24 14:02 hkaiser

Excellent detective work @Pansysk75! Would that still be true if we used the consumer/producer token API the queue provides?

Thanks! Here's what the implementer has to say about this (TLDR it's not possible): https://www.github.com/cameron314/concurrentqueue/issues/6
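For reference, a minimal sketch of the explicit token API being asked about (my illustration, assuming the same concurrentqueue.h header); tokens mainly speed up access to the per-producer sub-queues and, per the linked issue, do not restore a global FIFO order across producers:

```cpp
#include <cstdio>

#include "concurrentqueue.h"    // moodycamel::ConcurrentQueue and tokens

int main()
{
    moodycamel::ConcurrentQueue<int> q;

    moodycamel::ProducerToken ptok(q);
    q.enqueue(ptok, 1);    // FIFO is guaranteed only relative to other enqueues via ptok
    q.enqueue(ptok, 2);

    moodycamel::ConsumerToken ctok(q);
    int item;
    while (q.try_dequeue(ctok, item))    // ctok only speeds up repeated dequeues
        std::printf("%d\n", item);       // prints "1 2" for this single producer
}
```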

Pansysk75 avatar Feb 17 '24 15:02 Pansysk75

Excellent detective work @Pansysk75! Would that still be true if we used the consumer/producer token API the queue provides?

Thanks! Here's what the implementer has to say about this (TLDR it's not possible): https://www.github.com/cameron314/concurrentqueue/issues/6

Would this patch mitigate the issue?

diff --git a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp b/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
index 1ca53941ee4..d2cc74fae4e 100644
--- a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
+++ b/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
@@ -315,13 +315,6 @@ namespace hpx::threads::detail {
                     if (HPX_UNLIKELY(
                             state_val == thread_schedule_state::pending))
                     {
-                        if (HPX_LIKELY(next_thrd == nullptr))
-                        {
-                            // schedule other work
-                            scheduler.wait_or_add_new(num_thread, running,
-                                idle_loop_count, enable_stealing_staged, added);
-                        }
-
                         // schedule this thread again, make sure it ends up at
                         // the end of the queue
                         scheduler.SchedulingPolicy::schedule_thread_last(
@@ -329,6 +322,14 @@ namespace hpx::threads::detail {
                             threads::thread_schedule_hint(
                                 static_cast<std::int16_t>(num_thread)),
                             true);
+
+                        if (HPX_LIKELY(next_thrd == nullptr))
+                        {
+                            // schedule other work
+                            scheduler.wait_or_add_new(num_thread, running,
+                                idle_loop_count, enable_stealing_staged, added);
+                        }
+
                         scheduler.SchedulingPolicy::do_some_work(num_thread);
                     }
                     else if (HPX_UNLIKELY(state_val ==

I think the problem is that, currently, after the pending thread has been put back onto the (top of the) queue, the scheduler immediately pulls it back and retries, without potentially waiting threads having a chance of being work-requested. For other schedulers those will get stolen eventually, thus avoiding the live-lock.

hkaiser avatar Feb 18 '24 17:02 hkaiser

@hkaiser Doesn't seem to help much... keep in mind that this issue doesn't show up on the workrequesting scheduler when using the default queue, so it's either the mc queue or the stealhalf functionality that is causing the issue (I also tried setting this to false https://github.com/STEllAR-GROUP/hpx/blob/bd9e3cbabe3d8402263e9bafec6c8f6d5de6048a/libs/core/schedulers/include/hpx/schedulers/lockfree_queue_backends.hpp#L119 to see if bulk dequeueing was the culprit, but it didn't fix things).

Also, I think this fix is needed (or sth equivalent):

                task_data thrds(d.num_thread_);
-                thrds.tasks_.reserve(max_num_to_steal);


#ifdef HPX_HAVE_THREAD_STEALING_COUNTS
+              thrds.tasks_.reserve(max_num_to_steal);
                thread_id_ref_type thrd;
                while (max_num_to_steal-- != 0 &&
                    d.queue_->get_next_thread(thrd, false, true))
                {
                    d.queue_->increment_num_stolen_from_pending();
                    thrds.tasks_.push_back(HPX_MOVE(thrd));
                    thrd = thread_id_ref_type{};
                }
#else
+              thrds.tasks_.resize(max_num_to_steal);
                d.queue_->get_next_threads(
                    thrds.tasks_.begin(), thrds.tasks_.size(), false, true);
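As a standalone illustration of why the change matters (not HPX code): reserve() only allocates capacity and leaves size() at zero, so an interface that fills the range begin()..begin()+size() sees an empty range, whereas resize() actually creates the elements to be overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

int main()
{
    std::size_t const max_num_to_steal = 8;    // hypothetical steal count

    std::vector<int> reserved;
    reserved.reserve(max_num_to_steal);
    assert(reserved.size() == 0);    // begin()..begin()+size() is an empty range

    std::vector<int> resized;
    resized.resize(max_num_to_steal);
    assert(resized.size() == 8);     // now there are elements to overwrite
}
```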

Pansysk75 avatar Feb 19 '24 12:02 Pansysk75

Also, I think this fix is needed (or sth equivalent):

                task_data thrds(d.num_thread_);
-                thrds.tasks_.reserve(max_num_to_steal);


#ifdef HPX_HAVE_THREAD_STEALING_COUNTS
+              thrds.tasks_.reserve(max_num_to_steal);
                thread_id_ref_type thrd;
                while (max_num_to_steal-- != 0 &&
                    d.queue_->get_next_thread(thrd, false, true))
                {
                    d.queue_->increment_num_stolen_from_pending();
                    thrds.tasks_.push_back(HPX_MOVE(thrd));
                    thrd = thread_id_ref_type{};
                }
#else
+              thrds.tasks_.resize(max_num_to_steal);
                d.queue_->get_next_threads(
                    thrds.tasks_.begin(), thrds.tasks_.size(), false, true);

Uhh, how did it ever work? ;-)

hkaiser avatar Feb 19 '24 15:02 hkaiser

Also, I think this fix is needed (or sth equivalent):

                task_data thrds(d.num_thread_);
-                thrds.tasks_.reserve(max_num_to_steal);


#ifdef HPX_HAVE_THREAD_STEALING_COUNTS
+              thrds.tasks_.reserve(max_num_to_steal);
                thread_id_ref_type thrd;
                while (max_num_to_steal-- != 0 &&
                    d.queue_->get_next_thread(thrd, false, true))
                {
                    d.queue_->increment_num_stolen_from_pending();
                    thrds.tasks_.push_back(HPX_MOVE(thrd));
                    thrd = thread_id_ref_type{};
                }
#else
+              thrds.tasks_.resize(max_num_to_steal);
                d.queue_->get_next_threads(
                    thrds.tasks_.begin(), thrds.tasks_.size(), false, true);

Uhh, how did it ever work? ;-)

That's fixed now. Thanks again!

hkaiser avatar Feb 19 '24 16:02 hkaiser

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
|---|---|
| :white_check_mark: +0.04% | :white_check_mark: 0.00% |
Coverage variation details
| | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor commit (e977ecc639ae967fe8f104fb384a5a7a37a6ff52) | 217975 | 185525 | 85.11% |
| Head commit (2e610e7a251bdf6955a85ee5ef9df03994e343c3) | 190892 (-27083) | 162547 (-22978) | 85.15% (+0.04%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#6428) | 1 | 0 | 0.00% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Codacy will stop sending the deprecated coverage status from June 5th, 2024. Learn more

codacy-production[bot] avatar Feb 19 '24 21:02 codacy-production[bot]

Performance test report

HPX Performance

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR | PARALLEL_EXECUTOR | SCHEDULER_EXECUTOR |
|---|---|---|---|
| For Each | (=) | - | - |

Info

| Property | Before | After |
|---|---|---|
| HPX Commit | d27ac2efb17ed7a61d0417a1926ba8991f4f477f | 0e8959a9591ba78173df4105063007b7db08c35e |
| HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-04-28T16:28:37+00:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Envfile | | |
| Clustername | rostam | rostam |
| Datetime | 2024-03-18T09:18:04.949759-05:00 | 2024-04-28T11:40:07.559807-05:00 |

Comparison

| BENCHMARK | NO-EXECUTOR |
|---|---|
| Future Overhead - Create Thread Hierarchical - Latch | (=) |

Info

| Property | Before | After |
|---|---|---|
| HPX Commit | d27ac2efb17ed7a61d0417a1926ba8991f4f477f | 0e8959a9591ba78173df4105063007b7db08c35e |
| HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-04-28T16:28:37+00:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Envfile | | |
| Clustername | rostam | rostam |
| Datetime | 2024-03-18T09:19:53.062988-05:00 | 2024-04-28T11:41:54.962060-05:00 |

Comparison

| BENCHMARK | FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR | PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR | SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
|---|---|---|---|
| Stream Benchmark - Add | (=) | (=) | (=) |
| Stream Benchmark - Scale | (=) | (=) | (=) |
| Stream Benchmark - Triad | = | (=) | (=) |
| Stream Benchmark - Copy | (=) | (=) | (=) |

Info

| Property | Before | After |
|---|---|---|
| HPX Commit | d27ac2efb17ed7a61d0417a1926ba8991f4f477f | 0e8959a9591ba78173df4105063007b7db08c35e |
| HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-04-28T16:28:37+00:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
| Envfile | | |
| Clustername | rostam | rostam |
| Datetime | 2024-03-18T09:20:13.002391-05:00 | 2024-04-28T11:42:13.137113-05:00 |

Explanation of Symbols

| Symbol | MEANING |
|---|---|
| = | No performance change (confidence interval within ±1%) |
| (=) | Probably no performance change (confidence interval within ±2%) |
| (+)/(-) | Very small performance improvement/degradation (≤1%) |
| +/- | Small performance improvement/degradation (≤5%) |
| ++/-- | Large performance improvement/degradation (≤10%) |
| +++/--- | Very large performance improvement/degradation (>10%) |
| ? | Probably no change, but quite large uncertainty (confidence interval within ±5%) |
| ?? | Unclear result, very large uncertainty (±10%) |
| ??? | Something unexpected… |

StellarBot avatar Apr 28 '24 16:04 StellarBot