hpx icon indicating copy to clipboard operation
hpx copied to clipboard

Shared priority scheduler does not always create a new thread

Open msimberg opened this issue 4 years ago • 7 comments

While working on #5172 the shutdown_suspended_pus test started failing more regularly. The failure was this assertion firing: https://github.com/STEllAR-GROUP/hpx/blob/ab865a7781c24674581cf8c299b63f1bb3e9bdca/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp#L452. The scheduling loop expects run_now = true to mean that a thread object will definitely be created. The shared priority queue scheduler has a fallback where if select_active_pu does not return the same thread as the given hint it will not create a thread, but just a thread description. This can happen basically only when suspension is involved.

I think either the shared priority queue scheduler should respect the run_now argument, or there needs to be a separate flag to force a thread to be created. This may also explain some other failures with the shared_priority_queue_scheduler on the suspension tests (many of which are disabled because of failures).

@biddisco

msimberg avatar Feb 19 '21 17:02 msimberg

It depends on what you think the scheduling loop should be doing/requesting. I would like ro remove most of the functionality from the scheduling loop and instead allow schedulers to decide how they manage things. Currently the scheduling loop calls functions that steal, delete, create tasks and this forces a certain api onto the schedulers that I think is wrong. Evidence at hand is that for the fork join scheduler, all of this is bypassed and instead another scheduling loop is run. HPX_ASSERT(background_thread); is really another thing. The background task used to be run on the OS thread, but Thomas added support to instead run it on an hpx worker thread - this was partly because of problems inside the parcelport when adding verbs/libfabric support, but in fact it isn't necessary any more since we don't suspend or block inside the network processing. There is not really any need to do this, though it is an interesting feature (which might prove useful when considering continuations on mpi or gpu futures that are currently polled on the OS thread). The obvious thing to do is simply put the background processing thread into the same category as those other polling helpers.

Why does the shared priority scheduler cause problems with the suspension tests? I was not aware of this issue, perhaps I can help solve that...

biddisco avatar Feb 19 '21 18:02 biddisco

Why does the shared priority scheduler cause problems with the suspension tests? I was not aware of this issue, perhaps I can help solve that...

You tell me ;) This is one of the reasons, apparently. Let me know if my description of the problem wasn't clear enough. I thought there were more of them, but it looks like there's only one other place where the shared priority scheduler is disabled, in suspend_thread_timed: https://github.com/STEllAR-GROUP/hpx/blob/ab865a7781c24674581cf8c299b63f1bb3e9bdca/tests/unit/resource/suspend_thread_timed.cpp#L157-L158. I don't know if it's in any way related though.

It depends on what you think the scheduling loop should be doing/requesting. I would like ro remove most of the functionality from the scheduling loop and instead allow schedulers to decide how they manage things. Currently the scheduling loop calls functions that steal, delete, create tasks and this forces a certain api onto the schedulers that I think is wrong. Evidence at hand is that for the fork join scheduler, all of this is bypassed and instead another scheduling loop is run. HPX_ASSERT(background_thread); is really another thing. The background task used to be run on the OS thread, but Thomas added support to instead run it on an hpx worker thread - this was partly because of problems inside the parcelport when adding verbs/libfabric support, but in fact it isn't necessary any more since we don't suspend or block inside the network processing. There is not really any need to do this, though it is an interesting feature (which might prove useful when considering continuations on mpi or gpu futures that are currently polled on the OS thread). The obvious thing to do is simply put the background processing thread into the same category as those other polling helpers.

As for all this, I have no particular objections to all this but they are obviously bigger changes. I don't even personally consider the suspension tests particularly important, so please feel free to ignore this as well. I just wanted to make you aware of this problem at least.

msimberg avatar Feb 19 '21 19:02 msimberg

I meant to add: the fact that the shared priority scheduler sometimes ignores to return a thread object is related to suspension, but that doesn't only affect the background thread. Things like hpx::thread also rely on actually getting a thread id when spawning work.

msimberg avatar Feb 19 '21 19:02 msimberg

I'll take a look at this once I'm done with my current work.

biddisco avatar Feb 22 '21 13:02 biddisco

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 24 '21 13:08 stale[bot]

@biddisco I'm leaving this open in case you get a chance to look at it. Feel free to close if you're not planning on fixing this.

msimberg avatar Aug 25 '21 07:08 msimberg

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 16 '22 11:04 stale[bot]