UCC initialization fails at mca_coll_ucc_save_coll_handlers(), when under HAN
Hi, I'm seeing an initialization failure when combining coll/han with coll/ucc.
Environment:
Open MPI v5.0.x (#9605bed)
UCC master (#1d808c9)
Execution:
$(which mpirun) --host A:2,B:2 --mca coll libnbc,basic,ucc,han --mca pml ucx osu_allreduce
Leads to:
Error: coll_ucc_module.c:352 - mca_coll_ucc_module_enable() mca_coll_ucc_save_coll_handlers failed
I believe that the issue might stem from coll/ucc assuming that there will be a prior value/implementation present in comm->c_coll->coll_<primitive>, and errors out if one is not found.
I had also stumbled upon this question at some point and had more-or-less concluded that there are cases where a previous value won't be present -- Is this accurate? Or does the design say that there should always exist a previous value (eg. coll/base's?)?
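The failure mode described above can be illustrated with a minimal sketch. It is a toy model, not the coll/ucc source: every type and function name below (`toy_coll_table_t`, `toy_module_t`, `toy_save_coll_handlers`, `toy_dummy_coll`) is hypothetical, and only two collective slots are modeled. The assumption is that the module wants to keep the previously installed handlers as fallbacks and treats an empty slot as an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT the Open MPI types. */
typedef int (*toy_coll_fn_t)(void);

/* Toy version of the per-communicator table of collective function pointers. */
typedef struct {
    toy_coll_fn_t allreduce;
    toy_coll_fn_t iallreduce;
} toy_coll_table_t;

/* Toy module that remembers the handlers that were installed before it. */
typedef struct {
    toy_coll_fn_t prev_allreduce;
    toy_coll_fn_t prev_iallreduce;
} toy_module_t;

enum { TOY_SUCCESS = 0, TOY_ERROR = -1 };

/* Placeholder collective used to fill a slot in the toy table. */
int toy_dummy_coll(void)
{
    return 0;
}

/* Save the currently installed handlers so the module can fall back to them.
 * Fails if any slot is still empty, i.e. no earlier component has set it. */
int toy_save_coll_handlers(toy_module_t *module, const toy_coll_table_t *table)
{
    assert(NULL != module && NULL != table);

    if (NULL == table->allreduce || NULL == table->iallreduce) {
        return TOY_ERROR;
    }
    module->prev_allreduce  = table->allreduce;
    module->prev_iallreduce = table->iallreduce;
    return TOY_SUCCESS;
}
```

With a table where `iallreduce` is still `NULL`, `toy_save_coll_handlers()` returns `TOY_ERROR`; once every slot has been filled by an earlier component it succeeds.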
Hi @gkatev, I just tried to reproduce that on the same v5.0.x rev and couldn't. Could you please share the ompi build command line and the full run command line? (By default, I believe both han and ucc are disabled.)
Regarding the general logic of coll/ucc saving a backup handler: IIUC, the ompi coll_base logic loops through components from lowest priority to highest, and each time overwrites the c_coll module and algorithm pointers. It starts with basic so that ALL pointers are initialized. So it is not clear how the error above can happen unless the ucc component is selected even before basic and libnbc (those two together are supposed to set ALL coll pointers, right?).
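A toy model of that enable loop, as a sketch only: the names (`sel_component_t`, `sel_comm_select`, `sel_owner_is`) and the two-slot layout are hypothetical, and the split "basic provides blocking, libnbc provides nonblocking" is a simplification for illustration. Components are enabled in ascending priority and each one overwrites the slots it implements, so the highest-priority provider ends up owning each slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SEL_NUM_SLOTS 2  /* slot 0: allreduce, slot 1: iallreduce (toy layout) */

/* Hypothetical component description, not an Open MPI type. */
typedef struct {
    const char *name;
    int priority;
    int provides[SEL_NUM_SLOTS];  /* 1 if the component implements that slot */
} sel_component_t;

/* qsort comparator: ascending priority. qsort is not stable, so the relative
 * order of equal priorities is unspecified in this sketch. */
int sel_cmp_priority(const void *a, const void *b)
{
    const sel_component_t *ca = (const sel_component_t *) a;
    const sel_component_t *cb = (const sel_component_t *) b;
    return (ca->priority > cb->priority) - (ca->priority < cb->priority);
}

/* Enable components from lowest to highest priority; each one overwrites the
 * slots it implements. owner[s] ends up naming the last writer, or NULL. */
void sel_comm_select(sel_component_t *comps, size_t n,
                     const char *owner[SEL_NUM_SLOTS])
{
    assert(NULL != comps && NULL != owner);

    for (int s = 0; s < SEL_NUM_SLOTS; s++) {
        owner[s] = NULL;
    }
    qsort(comps, n, sizeof(comps[0]), sel_cmp_priority);
    for (size_t i = 0; i < n; i++) {
        for (int s = 0; s < SEL_NUM_SLOTS; s++) {
            if (comps[i].provides[s]) {
                owner[s] = comps[i].name;
            }
        }
    }
}

/* Convenience check: does the named component own the given slot? */
int sel_owner_is(const char *owner[SEL_NUM_SLOTS], int slot, const char *name)
{
    return NULL != owner[slot] && 0 == strcmp(owner[slot], name);
}
```

In this model a component that runs after both basic and libnbc always sees every slot filled. One that is sorted ahead of either of them can find an empty slot.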
Yes, sorry, some other parameters were hidden in env vars. Here is the full command:
UCX_TLS=tcp,self $(which mpirun) -x PATH -x LD_LIBRARY_PATH --host tie3:2,tie4:2 --mca coll libnbc,basic,han,ucc --mca coll_han_priority 60 --mca coll_ucc_priority 50 --mca coll_ucc_enable 1 --mca pml ucx --mca pml_ucx_tls any osu_allreduce
(also happens with pml/ob1)
OMPI build
git clone -b v5.0.x https://github.com/open-mpi/ompi ompi-5.0.x-tie-git
cd ompi-5.0.x-tie-git
git submodule update --init --recursive
./autogen.pl
./configure --disable-man-pages --enable-mca-dso=coll --prefix=$HOME/openmpi-5.0.x-tie --with-pmix=internal --with-libevent=internal --with-ucx=$HOME/ucx --with-ucc=$HOME/ucc --with-xpmem=$HOME/xpmem |& tee config.out && make -sj24 && make -s install
Let me know if you can now reproduce it, or I could also take a look into the selection code myself.
I dug a bit into this, and it looks like it's related to the ompi_comm_coll_preference info key (@bosilca). Apply this helpful patch to observe the effect:
diff --git a/ompi/mca/coll/base/coll_base_comm_select.c b/ompi/mca/coll/base/coll_base_comm_select.c
index fcdb8649eb..fe10e19141 100644
--- a/ompi/mca/coll/base/coll_base_comm_select.c
+++ b/ompi/mca/coll/base/coll_base_comm_select.c
@@ -134,7 +134,19 @@ int mca_coll_base_comm_select(ompi_communicator_t * comm)
NULL != item; item = opal_list_remove_first(selectable)) {
mca_coll_base_avail_coll_t *avail = (mca_coll_base_avail_coll_t *) item;
-
+
+ if(ompi_comm_rank(MPI_COMM_WORLD) == 0) {
+ opal_cstring_t *info_str;
+ int flag = 0;
+
+ if(comm->super.s_info)
+ opal_info_get(comm->super.s_info, "ompi_comm_coll_preference", &info_str, &flag);
+
+ printf("COMM '%s' (%s) Calling module_enable() of %s, iallreduce fn %p\n",
+ comm->c_name, (flag ? info_str->string : "-"),
+ avail->ac_component_name, comm->c_coll->coll_iallreduce);
+ }
+
/* initialize the module */
ret = avail->ac_module->coll_module_enable(avail->ac_module, comm);
@@ -446,10 +458,24 @@ static opal_list_t *check_components(opal_list_t * components,
}
return NULL;
}
-
+
+ if(ompi_comm_rank(MPI_COMM_WORLD) == 0) {
+ printf("Before sort, selectable: ");
+ OPAL_LIST_FOREACH_DECL(item, selectable, mca_coll_base_avail_coll_t)
+ printf("%s ", item->ac_component_name);
+ printf("\n");
+ }
+
/* Put this list in priority order */
opal_list_sort(selectable, avail_coll_compare);
-
+
+ if(ompi_comm_rank(MPI_COMM_WORLD) == 0) {
+ printf("After sort, selectable: ");
+ OPAL_LIST_FOREACH_DECL(item, selectable, mca_coll_base_avail_coll_t)
+ printf("%s ", item->ac_component_name);
+ printf("\n");
+ }
+
/* For all valid component reorder them not on their provided priorities but on
* the order requested in the info key. As at this point the coll_include is
* already ordered backward we can simply append the components.
@@ -465,7 +491,14 @@ static opal_list_t *check_components(opal_list_t * components,
}
}
}
-
+
+ if(ompi_comm_rank(MPI_COMM_WORLD) == 0) {
+ printf("After info insert, selectable: ");
+ OPAL_LIST_FOREACH_DECL(item, selectable, mca_coll_base_avail_coll_t)
+ printf("%s ", item->ac_component_name);
+ printf("\n");
+ }
+
opal_argv_free(coll_argv);
if( NULL != coll_exclude ) {
free(coll_exclude);
[prterun-tie2-17964@1,0]<stdout>: Before sort, selectable: basic ucc libnbc
[prterun-tie2-17964@1,0]<stdout>: After sort, selectable: basic libnbc ucc
[prterun-tie2-17964@1,0]<stdout>: After info insert, selectable: basic ucc libnbc
libnbc is included in the info key, and therefore is inserted last into selectable, so as to get chosen
[prterun-tie2-17964@1,0]<stdout>: COMM 'MPI COMMUNICATOR 5 SPLIT FROM 0' (libnbc,^han) Calling module_enable() of basic, iallreduce fn (nil)
[prterun-tie2-17964@1,0]<stdout>: COMM 'MPI COMMUNICATOR 5 SPLIT FROM 0' (libnbc,^han) Calling module_enable() of ucc, iallreduce fn (nil)
[prterun-tie2-17964@1,0]<stderr>: [tie2:17983] Error: coll_ucc_module.c:400 - mca_coll_ucc_module_enable() mca_coll_ucc_save_coll_handlers failed
[prterun-tie2-17964@1,0]<stdout>: COMM 'MPI COMMUNICATOR 5 SPLIT FROM 0' (libnbc,^han) Calling module_enable() of libnbc, iallreduce fn (nil)
UCC's module_enable is called before libnbc's, which is why mca_coll_ucc_save_coll_handlers() fails.
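The reordering seen in the log can be mimicked with a small sketch. It assumes only what the output above shows: after the priority sort, a component named in the info key is moved to the end of the list. The helpers `ord_move_to_end` and `ord_index_of` are hypothetical, and a plain array of names stands in for the real list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Move the entry called `preferred` to the end of names[0..n), keeping the
 * relative order of the other entries. Returns 1 if it was found, else 0. */
int ord_move_to_end(const char *names[], size_t n, const char *preferred)
{
    assert(NULL != names && NULL != preferred);

    for (size_t i = 0; i < n; i++) {
        if (0 == strcmp(names[i], preferred)) {
            const char *hit = names[i];
            /* Shift the tail left by one slot, then append the match. */
            memmove(&names[i], &names[i + 1], (n - i - 1) * sizeof(names[0]));
            names[n - 1] = hit;
            return 1;
        }
    }
    return 0;
}

/* Position of `name` in the enable order, or -1 if it is not present. */
int ord_index_of(const char *names[], size_t n, const char *name)
{
    assert(NULL != names && NULL != name);

    for (size_t i = 0; i < n; i++) {
        if (0 == strcmp(names[i], name)) {
            return (int) i;
        }
    }
    return -1;
}
```

Starting from the sorted order basic, libnbc, ucc and moving libnbc to the end gives basic, ucc, libnbc, so ucc is enabled before libnbc has installed anything.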
Edit: I now realize HAN's sub-comm component selection does not work exactly as I thought, and running with libnbc,basic,ucc,han with the default HAN MCA parameter values might be hacky. Not sure if/how such a scenario could be improved, either from ucc's or han's or base's side.
This can also occur without HAN, in the case that a component has lower priority than basic or libnbc (i.e. by default sm, adapt, han). E.g. with UCC, its default priority of 10 is the same as that of basic and libnbc, which results in a problematic module enable order of basic -> ucc -> libnbc (assuming _ucc_enable = 1). Of course, if a component wasn't going to be selected anyway, it's maybe (?) not that big a deal if its initialization fails...
FYI @open-mpi/ucx
I gave this issue some more thought. The issue stems from the fact that when the module_enable function of a coll-component is called, it is not guaranteed that all coll-function pointers in the comm have been initialized by a previous component. This applies to all components that attempt to save these pointers in module_enable (e.g. coll/han), and not just coll/ucc.
1. This is likely to happen if a component has lower priority than libnbc or basic. Example selection order: ucc (priority 0) -> libnbc (priority 10) -> basic (priority 10).
2. It can also happen if a component like libnbc or basic is moved to the end of selectable via the ompi_comm_coll_preference info key. Example order with ompi_comm_coll_preference=libnbc,^han: basic -> ucc -> libnbc.
Case (1) is not really a problem; a component with priority lower than basic probably wouldn't end up getting used anyway. Case (2) can happen under HAN, but in this case too the component in question probably wouldn't get used, as the preferred component will be utilized instead.
Highlight!
For the matter at hand, and with simplicity in mind, I suggest just changing the related UCC_ERROR("mca_coll_ucc_save_coll_handlers failed") to a UCC_VERBOSE, to fail silently and avoid confusing the user (read: me!). Though in scenarios other than (1) and (2), the more visible warning might still be desired...
Do go ahead and feel free to close this if the current state of things is OK, I believe that functionally this issue is not a problem.
Don't feel compelled to read below here :-)
For an actual solution, given that the pointers might not have been initialized yet, a straightforward choice within a single collectives component would be to save these pointers lazily: not in module_enable, but at some point after it. This would require a different approach to obtaining the pointers, and could be achieved by iterating comm->c_coll->module_list.
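A rough sketch of the lazy variant, under loud assumptions: a plain array stands in for the module list, all names (`lazy_module_t`, `lazy_fallback_t`, `lazy_get_fallback`, `lazy_identity_allreduce`) are hypothetical, and the rule "pick the most recently enabled other module that implements the collective" is just one possible policy. The fallback is resolved on first use, after every component has been enabled, and then cached.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*lazy_allreduce_fn_t)(int value);

/* Hypothetical module entry; a real module list would not be a plain array. */
typedef struct {
    const char *name;
    lazy_allreduce_fn_t allreduce;  /* NULL if the module does not implement it */
} lazy_module_t;

/* Per-module cache for the fallback, filled in on first use. */
typedef struct {
    const lazy_module_t *fallback;
    int resolved;
} lazy_fallback_t;

/* Placeholder collective so the toy modules have something to point at. */
int lazy_identity_allreduce(int value)
{
    return value;
}

/* Resolve the fallback on first call by walking the list from the most
 * recently enabled module backwards, skipping `self` and modules without an
 * allreduce. Later calls return the cached result. */
const lazy_module_t *lazy_get_fallback(lazy_fallback_t *cache,
                                       const lazy_module_t *list, size_t n,
                                       const lazy_module_t *self)
{
    assert(NULL != cache && NULL != list);

    if (!cache->resolved) {
        cache->fallback = NULL;
        for (size_t i = n; i-- > 0; ) {
            if (&list[i] != self && NULL != list[i].allreduce) {
                cache->fallback = &list[i];
                break;
            }
        }
        cache->resolved = 1;
    }
    return cache->fallback;
}
```

Because the lookup happens after selection has finished, the order in which module_enable was called no longer matters for finding a fallback in this model.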
At some point I also experimented with adding a prev_module pointer to struct mca_coll_base_module_2_4_0_t, and adding some macros in coll/coll.h to call or to retrieve a fallback module to use when needed. The idea was to provide the fallback functionality for all coll components to use, without problems like the one in this issue, and without the need for any book-keeping inside the component's code. It does have some quirks; let me know if you'd like me to elaborate.