Enable testing of new scheduler in CI
Performance test report
HPX Performance
Comparison
BENCHMARK | FORK_JOIN_EXECUTOR | PARALLEL_EXECUTOR | SCHEDULER_EXECUTOR |
---|---|---|---|
For Each | (=) | ?? | - |
Info
Property | Before | After |
---|---|---|
HPX Datetime | 2023-05-10T12:07:53+00:00 | 2024-01-31T16:49:40+00:00 |
HPX Commit | dcb541576898d370113946ba15fb58c20c8325b2 | b7a31347bef0566702c5f042e69175a0d691aaeb |
Datetime | 2023-05-10T14:50:18.616050-05:00 | 2024-01-31T10:55:40.712102-06:00 |
Envfile | | |
Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
Clustername | rostam | rostam |
Comparison
BENCHMARK | NO-EXECUTOR |
---|---|
Future Overhead - Create Thread Hierarchical - Latch | (=) |
Info
Property | Before | After |
---|---|---|
HPX Datetime | 2023-05-10T12:07:53+00:00 | 2024-01-31T16:49:40+00:00 |
HPX Commit | dcb541576898d370113946ba15fb58c20c8325b2 | b7a31347bef0566702c5f042e69175a0d691aaeb |
Datetime | 2023-05-10T14:52:35.047119-05:00 | 2024-01-31T10:57:55.614863-06:00 |
Envfile | | |
Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
Clustername | rostam | rostam |
Comparison
BENCHMARK | FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR | PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR | SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
---|---|---|---|
Stream Benchmark - Add | (=) | (=) | (=) |
Stream Benchmark - Scale | + | = | = |
Stream Benchmark - Triad | (=) | (=) | (=) |
Stream Benchmark - Copy | (=) | (=) | (=) |
Info
Property | Before | After |
---|---|---|
HPX Datetime | 2023-05-10T12:07:53+00:00 | 2024-01-31T16:49:40+00:00 |
HPX Commit | dcb541576898d370113946ba15fb58c20c8325b2 | b7a31347bef0566702c5f042e69175a0d691aaeb |
Datetime | 2023-05-10T14:52:52.237641-05:00 | 2024-01-31T10:58:12.811457-06:00 |
Envfile | | |
Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
Clustername | rostam | rostam |
Explanation of Symbols
Symbol | MEANING |
---|---|
= | No performance change (confidence interval within ±1%) |
(=) | Probably no performance change (confidence interval within ±2%) |
(+)/(-) | Very small performance improvement/degradation (≤1%) |
+/- | Small performance improvement/degradation (≤5%) |
++/-- | Large performance improvement/degradation (≤10%) |
+++/--- | Very large performance improvement/degradation (>10%) |
? | Probably no change, but quite large uncertainty (confidence interval within ±5%) |
?? | Unclear result, very large uncertainty (±10%) |
??? | Something unexpected… |
Coverage summary from Codacy
See diff coverage on Codacy
Coverage variation | Diff coverage |
---|---|
:white_check_mark: -0.09% | :white_check_mark: ∅ |
Coverage variation details
Coverable lines | Covered lines | Coverage | |
---|---|---|---|
Common ancestor commit (9c53b99534f57cfa56336d835985faac18166fa1) | 206733 | 176197 | 85.23% |
Head commit (78626f0e56d365c03dece638afe29fcd310b371c) | 190772 (-15961) | 162415 (-13782) | 85.14% (-0.09%) |
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>
Diff coverage details
Coverable lines | Covered lines | Diff coverage | |
---|---|---|---|
Pull request (#6428) | 0 | 0 | ∅ (not applicable) |
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
See your quality gate settings | Change summary preferences
You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation
@Pansysk75 It looks like the new scheduler doesn't work properly yet (as you suggested). The clang-12 and gcc-10 errors are most likely related.
retest
@hkaiser FYI, I'm working on this one; I've reproduced some failing tests (stop_token_cb1 being one of them). So far, I see that some pending HPX thread is never getting scheduled (either starvation or some bug). I'll let you know what I find out.
@hkaiser It seems like the moodycamel queue provides FIFO guarantees only among items enqueued by the same thread; there is no guarantee that items enqueued by different threads will be dequeued in FIFO order.
From the moodycamel blog:
> All elements from a given producer thread will necessarily still be seen in that same order relative to each other when dequeued (since the sub-queue preserves that order), albeit with elements from other sub-queues possibly interleaved.
And from the discussion in the comments:
> gasche: ... If thread A enqueues 1, then does a write observed by thread B (synchronized through the appropriate memory fence), then B enqueues 2, then I would expect 1 to come before 2 -- and this is not racy. It seems to me that your data-structure is thus providing a weaker specification than linearizable queues. ...
>
> Cameron: Yes, you're right. If there's external synchronization, then the final order is not related to the total order formed by the happens-before relationships of the enqueue operations themselves. ...
This currently causes trouble (deadlocks), as we rely on the FIFO ordering to avoid starvation: https://github.com/STEllAR-GROUP/hpx/blob/bd9e3cbabe3d8402263e9bafec6c8f6d5de6048a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp#L325-L332
In the fail-case that I managed to reproduce, a perpetually yielding thread starves out a shared-resource-holding thread when the two happen to land in the same moodycamel thread-queue. We cannot reliably insert the yielding thread at the "back" of the queue (no FIFO guarantee), and under certain circumstances the queue acts like a LIFO, so the shared-resource-holding thread is never picked up by the scheduler and never releases the held resource.
Excellent detective work @Pansysk75! Would that also be true if we used the consumer/producer token API the queue provides?
> Excellent detective work @Pansysk75! Would that also be true if we used the consumer/producer token API the queue provides?
Thanks! Here's what the implementer has to say about this (TL;DR: it's not possible): https://www.github.com/cameron314/concurrentqueue/issues/6
> > Excellent detective work @Pansysk75! Would that also be true if we used the consumer/producer token API the queue provides?
>
> Thanks! Here's what the implementer has to say about this (TL;DR: it's not possible): https://www.github.com/cameron314/concurrentqueue/issues/6
Would this patch mitigate the issue?
```diff
diff --git a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp b/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
index 1ca53941ee4..d2cc74fae4e 100644
--- a/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
+++ b/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp
@@ -315,13 +315,6 @@ namespace hpx::threads::detail {
                     if (HPX_UNLIKELY(
                             state_val == thread_schedule_state::pending))
                     {
-                        if (HPX_LIKELY(next_thrd == nullptr))
-                        {
-                            // schedule other work
-                            scheduler.wait_or_add_new(num_thread, running,
-                                idle_loop_count, enable_stealing_staged, added);
-                        }
-
                         // schedule this thread again, make sure it ends up at
                         // the end of the queue
                         scheduler.SchedulingPolicy::schedule_thread_last(
@@ -329,6 +322,14 @@ namespace hpx::threads::detail {
                             threads::thread_schedule_hint(
                                 static_cast<std::int16_t>(num_thread)),
                             true);
+
+                        if (HPX_LIKELY(next_thrd == nullptr))
+                        {
+                            // schedule other work
+                            scheduler.wait_or_add_new(num_thread, running,
+                                idle_loop_count, enable_stealing_staged, added);
+                        }
+
                         scheduler.SchedulingPolicy::do_some_work(num_thread);
                     }
                     else if (HPX_UNLIKELY(state_val ==
```
I think the problem is that currently, after the pending thread has been put back onto the (top of the) queue, the scheduler immediately pulls it back and retries without potentially waiting threads having a chance of being work-requested. For other schedulers those threads will get stolen eventually, thus avoiding the live-lock.
@hkaiser Doesn't seem to help much... Keep in mind that this issue doesn't show up on the work-requesting scheduler when using the default queue, so it's either the mc queue or the steal-half functionality that is causing the issue. (I also tried setting this flag to false to see whether bulk dequeueing was the culprit, but it didn't fix things: https://github.com/STEllAR-GROUP/hpx/blob/bd9e3cbabe3d8402263e9bafec6c8f6d5de6048a/libs/core/schedulers/include/hpx/schedulers/lockfree_queue_backends.hpp#L119)
Also, I think this fix is needed (or sth equivalent):
```diff
     task_data thrds(d.num_thread_);
-    thrds.tasks_.reserve(max_num_to_steal);
 #ifdef HPX_HAVE_THREAD_STEALING_COUNTS
+    thrds.tasks_.reserve(max_num_to_steal);
     thread_id_ref_type thrd;
     while (max_num_to_steal-- != 0 &&
         d.queue_->get_next_thread(thrd, false, true))
     {
         d.queue_->increment_num_stolen_from_pending();
         thrds.tasks_.push_back(HPX_MOVE(thrd));
         thrd = thread_id_ref_type{};
     }
 #else
+    thrds.tasks_.resize(max_num_to_steal);
     d.queue_->get_next_threads(
         thrds.tasks_.begin(), thrds.tasks_.size(), false, true);
```
> Also, I think this fix is needed (or sth equivalent): […]
Uhh, how did it ever work? ;-)
> > Also, I think this fix is needed (or sth equivalent): […]
>
> Uhh, how did it ever work? ;-)
That's fixed now. Thanks again!
Coverage summary from Codacy
See diff coverage on Codacy
Coverage variation | Diff coverage |
---|---|
:white_check_mark: +0.04% | :white_check_mark: 0.00% |
Coverage variation details
Coverable lines | Covered lines | Coverage | |
---|---|---|---|
Common ancestor commit (e977ecc639ae967fe8f104fb384a5a7a37a6ff52) | 217975 | 185525 | 85.11% |
Head commit (2e610e7a251bdf6955a85ee5ef9df03994e343c3) | 190892 (-27083) | 162547 (-22978) | 85.15% (+0.04%) |
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>
Diff coverage details
Coverable lines | Covered lines | Diff coverage | |
---|---|---|---|
Pull request (#6428) | 1 | 0 | 0.00% |
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
See your quality gate settings | Change summary preferences
Codacy will stop sending the deprecated coverage status from June 5th, 2024. Learn more
Performance test report
HPX Performance
Comparison
BENCHMARK | FORK_JOIN_EXECUTOR | PARALLEL_EXECUTOR | SCHEDULER_EXECUTOR |
---|---|---|---|
For Each | (=) | - | - |
Info
Property | Before | After |
---|---|---|
HPX Commit | d27ac2efb17ed7a61d0417a1926ba8991f4f477f | 0e8959a9591ba78173df4105063007b7db08c35e |
HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-04-28T16:28:37+00:00 |
Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
Envfile | | |
Clustername | rostam | rostam |
Datetime | 2024-03-18T09:18:04.949759-05:00 | 2024-04-28T11:40:07.559807-05:00 |
Comparison
BENCHMARK | NO-EXECUTOR |
---|---|
Future Overhead - Create Thread Hierarchical - Latch | (=) |
Info
Property | Before | After |
---|---|---|
HPX Commit | d27ac2efb17ed7a61d0417a1926ba8991f4f477f | 0e8959a9591ba78173df4105063007b7db08c35e |
HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-04-28T16:28:37+00:00 |
Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
Envfile | | |
Clustername | rostam | rostam |
Datetime | 2024-03-18T09:19:53.062988-05:00 | 2024-04-28T11:41:54.962060-05:00 |
Comparison
BENCHMARK | FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR | PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR | SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
---|---|---|---|
Stream Benchmark - Add | (=) | (=) | (=) |
Stream Benchmark - Scale | (=) | (=) | (=) |
Stream Benchmark - Triad | = | (=) | (=) |
Stream Benchmark - Copy | (=) | (=) | (=) |
Info
Property | Before | After |
---|---|---|
HPX Commit | d27ac2efb17ed7a61d0417a1926ba8991f4f477f | 0e8959a9591ba78173df4105063007b7db08c35e |
HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-04-28T16:28:37+00:00 |
Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 |
Envfile | | |
Clustername | rostam | rostam |
Datetime | 2024-03-18T09:20:13.002391-05:00 | 2024-04-28T11:42:13.137113-05:00 |