MPI_T Events
This is the work from PR8057 rebased after PR13086.
Feel free to comment with any suggestions. I will list the suggested changes from the old PR and ask for advice about their current relevance.
To-do list: unresolved comments from the previous PR8057:
@hppritcha @jsquyres @hjelmn @cchambreau I'm tagging all of you for feedback. Please ignore if this is not relevant for you.
Has this PR been rebased on main? The mpi4py failure looks like it needs rebasing.
I rebased onto main, which involved resolving a few conflicts. That may be the reason; I'm checking this.
It looks like the singleton test fails in the mpi4py test. The branch seems up to date with master, minus #13126 that was just merged.
here's the traceback:
#0 0x00007ffff71ec82d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffff71e5ad9 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2 0x00007ffff17c1c18 in opal_thread_internal_mutex_lock (p_mutex=0x7ffff1acd6e8 <mca_base_source_lock+40>) at ../../../opal/mca/threads/pthreads/threads_pthreads_mutex.h:82
#3 0x00007ffff17c1ccc in opal_mutex_lock (mutex=0x7ffff1acd6c0 <mca_base_source_lock>) at ../../../opal/mca/threads/mutex.h:122
#4 0x00007ffff17c2325 in mca_base_source_get_by_name (name=0x7cabc0 "mca_base_default_source", source_out=0x7fffffff3330) at mca_base_source.c:165
#5 0x00007ffff17c2473 in mca_base_source_register (project=0x7ffff1885685 "opal", framework=0x7ffff1885681 "mca", component=0x7ffff188567c "base", name=0x7ffff188566d "default_source", description=0x7ffff188564f "Default source for MCA events",
ordered=true, source_time=0x7ffff17c1dd1 <mca_base_source_default_time_source>, source_ticks=1) at mca_base_source.c:202
#6 0x00007ffff17c1f77 in mca_base_source_init () at mca_base_source.c:76
#7 0x00007ffff17c3438 in mca_base_var_init () at mca_base_var.c:282
#8 0x00007ffff17c5bfe in register_variable (project_name=0x7ffff1889c0a "opal", framework_name=0x0, component_name=0x7ffff1889c05 "base", variable_name=0x7ffff1889bf6 "help_aggregate",
description=0x7ffff1889a98 "If opal_base_help_aggregate is true, duplicate help messages will be aggregated rather than displayed individually. This can be helpful for parallel jobs that experience multiple identical failures; "...,
type=MCA_BASE_VAR_TYPE_BOOL, enumerator=0x0, bind=0, flags=(unknown: 0), info_lvl=OPAL_INFO_LVL_9, scope=MCA_BASE_VAR_SCOPE_LOCAL, synonym_for=-1, storage=0x7ffff1ace1e4 <opal_help_want_aggregate>) at mca_base_var.c:1376
#9 0x00007ffff17c65c5 in mca_base_var_register (project_name=0x7ffff1889c0a "opal", framework_name=0x0, component_name=0x7ffff1889c05 "base", variable_name=0x7ffff1889bf6 "help_aggregate",
description=0x7ffff1889a98 "If opal_base_help_aggregate is true, duplicate help messages will be aggregated rather than displayed individually. This can be helpful for parallel jobs that experience multiple identical failures; "...,
type=MCA_BASE_VAR_TYPE_BOOL, enumerator=0x0, bind=0, flags=(unknown: 0), info_lvl=OPAL_INFO_LVL_9, scope=MCA_BASE_VAR_SCOPE_LOCAL, storage=0x7ffff1ace1e4 <opal_help_want_aggregate>) at mca_base_var.c:1546
#10 0x00007ffff17e10af in opal_show_help_init () at show_help.c:76
#11 0x00007ffff17afe43 in opal_init_util (pargc=0x0, pargv=0x0) at runtime/opal_init_core.c:481
#12 0x00007ffff1b8ff3d in ompi_mpi_instance_retain () at instance/instance.c:258
#13 0x00007ffff1b90401 in ompi_mpi_instance_init_common (argc=0, argv=0x0) at instance/instance.c:376
#14 0x00007ffff1b91714 in ompi_mpi_instance_init (ts_level=3, info=0x7ffff21f37a0 <ompi_mpi_info_null>, errhandler=0x7ffff21eb0e0 <ompi_mpi_errors_are_fatal>, instance=0x7ffff21fc300 <ompi_mpi_instance_default>, argc=0, argv=0x0)
at instance/instance.c:837
#15 0x00007ffff1b856cd in ompi_mpi_init (argc=0, argv=0x0, requested=3, provided=0x7fffffff42c8, reinit_ok=false) at runtime/ompi_mpi_init.c:359
#16 0x00007ffff1be6831 in PMPI_Init_thread (argc=0x0, argv=0x0, required=3, provided=0x7fffffff42c8) at init_thread.c:78
Thank you. Yes, I'm checking if removing opal-locks during registration helps or not. Such locks aren't used for pvars.
Removing the OPAL_THREAD_[LOCK/UNLOCK](&mca_base_source_lock) calls in the following functions resolves this issue:
Similar functions such as mca_base_pvar_init() (link), mca_base_pvar_finalize() (link), and mca_base_pvar_register() (link) do not have such locks.
Could any of you suggest plausible effects I need to check for before removing them? They do not seem necessary to me.
The alternative is to unlock before calling mca_base_source_get_by_name(). I am not familiar enough with this code, so I would appreciate suggestions here.
Another option would be to use an opal_recursive_mutex_t mutex. Removing the locking (which does seem buggy) would probably cause MPI_T to not support thread multiple. I suspect this is the first time we're running our full GitHub Actions CI on this code.
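For illustration only, here is a minimal standalone sketch of the recursive-mutex idea using plain pthreads rather than the OPAL wrappers (the function names are hypothetical stand-ins for the register/lookup pair, not the actual OMPI code):

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-ins for the register/lookup pair.  With a recursive
 * mutex, the nested lock taken by the lookup no longer self-deadlocks. */
static pthread_mutex_t source_lock;

static void source_get_by_name(const char *name)
{
    pthread_mutex_lock(&source_lock);    /* second acquisition by the same thread */
    printf("looking up %s\n", name);
    pthread_mutex_unlock(&source_lock);
}

static void source_register(const char *name)
{
    pthread_mutex_lock(&source_lock);    /* first acquisition */
    source_get_by_name(name);            /* nested call relocks: fine for a recursive mutex */
    pthread_mutex_unlock(&source_lock);
}

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&source_lock, &attr);

    source_register("default_source");   /* would hang with a default (non-recursive) mutex */

    pthread_mutex_destroy(&source_lock);
    pthread_mutexattr_destroy(&attr);
    return 0;
}
```

This keeps registration safe under concurrent callers while tolerating the nested acquisition, at the cost of slightly heavier mutex operations.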
There seems to already be an OPAL_THREAD_UNLOCK in mca_base_source_init before calling mca_base_source_register so the mutex should already be unlocked.
There is a valid reason why pvars do not need to be protected: they can only be created during the init step, and the performance variables stay alive for the entire duration of the application, so as long as opal_util_init/fini are called in a protected way the pvars get the same protection.
I don't know what a source event is so I cannot comment on that.
The problem is that the lock is taken first in the call to mca_base_source_register, which then calls mca_base_source_get_by_name, which also tries to take the lock, as is clear from the traceback above.
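For reference, a minimal standalone reproduction of that pattern with plain pthreads (hypothetical names mirroring the call chain in the traceback, not the actual OMPI functions):

```c
#include <pthread.h>

/* Default (non-recursive) mutex, as in the traceback: the register path
 * takes the lock, then calls the lookup, which tries to take it again
 * and blocks forever in __lll_lock_wait. */
static pthread_mutex_t source_lock = PTHREAD_MUTEX_INITIALIZER;

static void source_get_by_name(void)
{
    pthread_mutex_lock(&source_lock);    /* hangs: the lock is already held by this thread */
    pthread_mutex_unlock(&source_lock);
}

static void source_register(void)
{
    pthread_mutex_lock(&source_lock);    /* first acquisition, as in mca_base_source_register */
    source_get_by_name();                /* nested lookup, as in mca_base_source_get_by_name */
    pthread_mutex_unlock(&source_lock);
}

int main(void)
{
    source_register();                   /* classic self-deadlock; never returns */
    return 0;
}
```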
It is possible the number of sources can increase during an application run (see section 15.3.8 of the MPI 4.1 standard), so there probably should be locking besides just in the source init and finalize methods (which was the only locking in the file in @hjelmn 's original PR).
I don't see a lock in mca_base_source_get_by_name, but that might be due to the last push. Anyway, there is an unlock left in mca_base_source_get_by_name, and that's definitely no good. Moreover, as mca_base_source_get_by_name is only called from mca_base_source_register with the lock held, there should be no reason for mca_base_source_get_by_name to take it again.
Section 15.3.8 of the MPI standard does indeed state that event sources can be added during application execution, but in OMPI event sources are added during module initialization, and that happens in a single sequential context. Unless we expect sessions to be created from multiple threads and allowed to load different sets of modules, I don't think we need any protection on the event-creation path.
Looking more carefully at this code I see other issues with locking. Let's look at mca_base_source_get, a read operation on an array from which entries (the source events) cannot be removed. It is nevertheless protected by the same mca_base_source_lock as the rest of the source-events code. However, all usages of mca_base_source_get are also protected by the MPI_T lock (ompi_mpit_lock / ompi_mpit_unlock), so that code must be doubly safe. We never know !!!
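To make the double-locking observation concrete, a minimal sketch of the pattern being described (hypothetical names; the outer lock stands in for ompi_mpit_lock/ompi_mpit_unlock and the inner one for mca_base_source_lock):

```c
#include <pthread.h>
#include <stdio.h>

/* Two independent locks end up protecting the same read-only lookup:
 * the MPI_T API lock taken by the caller and the source lock taken
 * again inside the getter.  For an array whose entries are never
 * removed, the inner lock only adds overhead. */
static pthread_mutex_t mpit_lock   = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t source_lock = PTHREAD_MUTEX_INITIALIZER;

static const char *sources[] = { "default_source" };

static const char *source_get(int index)
{
    pthread_mutex_lock(&source_lock);    /* inner lock: redundant for a pure read */
    const char *s = sources[index];
    pthread_mutex_unlock(&source_lock);
    return s;
}

static const char *mpi_t_source_get_info(int index)
{
    pthread_mutex_lock(&mpit_lock);      /* outer MPI_T lock already serializes the callers */
    const char *s = source_get(index);
    pthread_mutex_unlock(&mpit_lock);
    return s;
}

int main(void)
{
    printf("%s\n", mpi_t_source_get_info(0));
    return 0;
}
```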
At this point this PR looks very sketchy.
@bosilca
I don't see a lock in mca_base_source_get_by_name, but that might be due to the last push. Anyway, there is an unlock left in mca_base_source_get_by_name, and that's definitely no good. Moreover, as mca_base_source_get_by_name is only called from mca_base_source_register with the lock held, there should be no reason for mca_base_source_get_by_name to take it again.
Indeed. Thank you! I removed the one that was left. It is a static function and called just from another place. I noted that in the comment.
Looking more carefully at this code I see other issues with locking. Let's look at mca_base_source_get, a read operation on an array from which entries (the source events) cannot be removed. It is nevertheless protected by the same mca_base_source_lock as the rest of the source-events code. However, all usages of mca_base_source_get are also protected by the MPI_T lock (ompi_mpit_lock / ompi_mpit_unlock), so that code must be doubly safe. We never know !!!
This was introduced in c9b35e2 after this comment.
The function mca_base_source_get() is called from the following places:
- opal/mca/base/mca_base_source.c:129: mca_base_source_set_time_source()
- opal/mca/base/mca_base_event.c:260: mca_base_event_register()
- ompi/mpi/tool/source_get_info.c:35: MPI_T_source_get_info()
- ompi/mpi/tool/source_get_timestamp.c:38: MPI_T_source_get_timestamp()
The second one (mca_base_event_register()) is not under a lock as far as I can see. The first one might as well be, as it takes the lock immediately afterwards.
The third and the fourth are indeed under different locks.
Oddly enough, mca_base_source_get() calls opal_pointer_array_get_item(), which also has its own lock. So I think the locks in mca_base_source_get() should indeed be removed.
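If those locks are dropped, the getter could reduce to something like the following sketch (a hypothetical simplification inside the OPAL tree; registered_sources is a stand-in name for the actual pointer array, and it relies on opal_pointer_array_get_item() providing its own internal synchronization, as noted above):

```c
#include "opal/class/opal_pointer_array.h"

/* Stand-in name for the array that holds the registered sources. */
extern opal_pointer_array_t registered_sources;

/* Hypothetical simplified lookup: no OPAL_THREAD_LOCK around the read,
 * since sources are never removed and opal_pointer_array_get_item()
 * already serializes access to the underlying array. */
static void *source_get(int source_index)
{
    return opal_pointer_array_get_item(&registered_sources, source_index);
}
```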
Maybe @jsquyres also can comment here.
@kingshuk00 Any progress on this PR, perchance?
@jsquyres , @bosilca , @hppritcha ,
Two review comments from the original PR regarding datatype and the tools-events design are not addressed.
The respective discussions do not seem to have reached a consensus. Please comment.
If my understanding is correct, aiming for 6.0 would be too ambitious, and I am okay with this not getting into 6.0 if more redesign is needed. Thank you!
@kingshuk00 you are correct about the two unaddressed review comments. And I am of the opinion as well that this is too ambitious for the 6.0.x branch/release time frame. I am not sure continuing to work on this PR is the right approach; the basic design should be reconsidered.