ompi
ompi copied to clipboard
Update MCA mutexes to use the qthreads ULT backend
- This replaces the use of opal atomics with the use of qthread's atomics in the opal/thread MCA if ompi is configured with qthreads.
- This resolves the deadlock reported originating from expecting recursive locks to work (https://github.com/open-mpi/ompi/issues/10459).
- Requires Qthreads version >=1.18 as we added support for recursive locks in that version. I will issue a PR for the configury changes to version check.
Signed-off-by: Jan Ciesko [email protected]
Can one of the admins verify this patch?
ok to test
@janciesko Would you be able to rebase your branch on main
somewhere after
7dbfbeea - build: Use open-mpi/oac for oac submodule
commit? We're having an issue with the IBM CI when it tries to test a Pull Request that doesn't include that commit.
@janciesko And be sure to see https://www.mail-archive.com/[email protected]/msg21421.html
bot:ibm:retest
@gpaulsen, rebased.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
../../opal/mca/threads/pthreads/threads_pthreads_mutex.h: In function 'opal_thread_internal_mutex_lock':
../../opal/mca/threads/pthreads/threads_pthreads_mutex.h:113:9: error: implicit declaration of function 'opal_show_help' [-Werror=implicit-function-declaration]
opal_show_help("help-opal-threads.txt", "mutex lock failed", true);
You need to #include "opal/util/show_help.h"
.
I think we do have that: https://github.com/open-mpi/ompi/blob/65e6591df99620bd52a77445e42338bbcfdacb79/opal/mca/threads/argobots/threads_argobots_mutex.h#L43 ~Is the from the Mellanox CI?~ Missing for pthreads. Updating.
Added missing include.
bot:aws:retest
bot:aws:retest
The failure from the Pull Request Build Checker is real, but caused by a bad commit to the main branch. If you push a rebase on top of main, it should go away.
Rebased to today's main. Thanks @bwbarrett.
is this good to go or waiting on something?
LGTM