ompi Info key not propagated during communicator creation

Hi, I'm seeing an issue when creating a communicator with MPI_Comm_dup_with_info. Even though I pass an info key, it is never delivered to my collective component (in the module_enable stage).

I inserted some code during communicator creation and it appears (to my unfamiliar-with-the-comm-creation-code eye) that the info is lost somewhere in the call to opal_infosubscribe_change_info.

To reproduce: (I'm on v5.0.0rc6)

#include <mpi.h>

int main(int argc, char **argv) {
	MPI_Init(NULL, NULL);
	
	MPI_Comm newcomm;
	MPI_Info info;
	
	MPI_Info_create(&info);
	MPI_Info_set(info, "ompi_comm_coll_preference", "^tuned");
	
	MPI_Comm_dup_with_info(MPI_COMM_WORLD, info, &newcomm);
	
	MPI_Info_free(&info);
	MPI_Comm_free(&newcomm);
	MPI_Finalize();

	return 0;
}

diff --git a/ompi/communicator/comm.c b/ompi/communicator/comm.c
index b49790d85f..af72c5f0df 100644
--- a/ompi/communicator/comm.c
+++ b/ompi/communicator/comm.c
@@ -1070,6 +1070,22 @@ int ompi_comm_dup_with_info ( ompi_communicator_t * comm, opal_info_t *info, omp
     if (info) {
         opal_infosubscribe_change_info(&newcomp->super, info);
     }
+    
+    {
+               opal_info_t *si = newcomp->super.s_info;
+               
+               opal_cstring_t *info_str;
+               int flag = 0;
+               
+               if(si) opal_info_get(si, "ompi_comm_coll_preference", &info_str, &flag);
+               else printf("Info no s_info\n");
+               
+               if(flag) {
+                       printf("Info val %s\n", info_str->string);
+                       OBJ_RELEASE(info_str);
+               } else
+                       printf("Info flag = 0\n");
+       }
 
     /* activate communicator and init coll-module */
     rc = ompi_comm_activate (&newcomp, comm, NULL, NULL, NULL, false, mode);
diff --git a/opal/util/info_subscriber.c b/opal/util/info_subscriber.c
index 68dc7ef087..92e531baf9 100644
--- a/opal/util/info_subscriber.c
+++ b/opal/util/info_subscriber.c
@@ -130,8 +130,13 @@ static const char *opal_infosubscribe_inform_subscribers(opal_infosubscriber_t *
      * Since multiple subscribers could set values, only the last setting is kept as the
      * returned value.
      */
+    
+    printf("table = %p\n", table);
+    
     if (table) {
         opal_hash_table_get_value_ptr(table, key, strlen(key), (void **) &list);
+        
+        printf("list = %p\n", list);
 
         if (list) {
             updated_value = new_value;
@@ -265,10 +270,15 @@ int opal_infosubscribe_change_info(opal_infosubscriber_t *object, opal_info_t *n
             OBJ_RETAIN(value_str);
             key_str = iterator->ie_key;
             OBJ_RETAIN(key_str);
+            
+            printf("info key str %s\n", key_str->string);
 
             updated_value = opal_infosubscribe_inform_subscribers(object, iterator->ie_key->string,
                                                                   iterator->ie_value->string,
                                                                   &found_callback);
+            
+            printf("updated = %p, found_callback = %d\n", updated_value, found_callback);
+            
             if (NULL != updated_value
                 && 0 != strncmp(updated_value, value_str->string, value_str->length)) {
                 err = opal_info_set(object->s_info, iterator->ie_key->string, updated_value);

The output of the snippet:

info key str ompi_comm_coll_preference
table = 0x560363c9e738
list = (nil)
updated = (nil), found_callback = 0
Info flag = 0

From what I understand, the Info flag here should be 1?

Edit: Maybe this change reflects the initially intended code?

diff --git a/ompi/communicator/comm.c b/ompi/communicator/comm.c
index b49790d85f..aff12bd490 100644
--- a/ompi/communicator/comm.c
+++ b/ompi/communicator/comm.c
@@ -963,6 +963,7 @@ int ompi_comm_split_type (ompi_communicator_t *comm, int split_type, int key,
         ompi_comm_assert_subscribe (newcomp, OMPI_COMM_ASSERT_LAZY_BARRIER);
         ompi_comm_assert_subscribe (newcomp, OMPI_COMM_ASSERT_ACTIVE_POLL);
         if (info) {
+            opal_info_dup(info, &newcomp->super.s_info);
             opal_infosubscribe_change_info(&newcomp->super, info);
         }
 
@@ -1068,6 +1069,7 @@ int ompi_comm_dup_with_info ( ompi_communicator_t * comm, opal_info_t *info, omp
     ompi_comm_assert_subscribe (newcomp, OMPI_COMM_ASSERT_LAZY_BARRIER);
     ompi_comm_assert_subscribe (newcomp, OMPI_COMM_ASSERT_ACTIVE_POLL);
     if (info) {
+        opal_info_dup(info, &newcomp->super.s_info);
         opal_infosubscribe_change_info(&newcomp->super, info);
     }

Apr 28 '22 16:04 gkatev

Too much smartness kill all smartness. The issue here is two folds: the MPI standard require all unknown keys to be removed, and our communicator creation and initialization is done in an asynchronous way. Thus, by the time we call opal_infosubscribe_change_info to notify all info generators that we want to add infos to the communicator, the communicator is not fully created (we have the shell but no content, not even a cid). As a result all info is ignored.

Now, if one would have hoped to reshape the communicator creation via the info keys, that would be broken too, because there is no info attached to the communicator during the initialization.

Apr 29 '22 14:04 bosilca

@gkatev Sorry for the late reply, this slipped under my radar. As @bosilca mentioned, MPI (§7.4.4) requires us to drop any info keys that we don't know about (to provide feedback to the user on whether the info key set was understood by the implementation). The removal is done in opal_info_remove_unreferenced at the end of ompi_comm_dup_with_info for all info keys that haven't been read at least once during communicator creation. There is opal_info_mark_referenced to mark a key as known even though it hasn't actually been used yet. I realize that this is not great because we have to retain a key aimed at collectives a code path that is not related to collectives. Will talk to @bosilca off-line about this and come back here.

May 25 '22 13:05 devreal

Hi @devreal, I'm aware of the removal of unreferenced info keys. I've dealt with it by making sure to read any keys in module activation. The scenario here takes place before opal_info_remove_unreferenced is called, in the path triggered by ompi_comm_activate. Trying to read the info in the module enable stage fails.

From what I understand from @bosilca's description, opal_infosubscribe_change_info is supposed to add the keys, but this is not properly timed? If it is only supposed to change the keys' values, then something might be missing to add the keys to the communicator in the first place?

My patch at the end of the original post is based on the assumption that opal_infosubscribe_change_info only changes values, and a opal_info_dup call was meant to be there before it. It works in the tests I've performed (which are by no means exhaustive!). Relevant diff: comm.c diff, line 1042. But I assume that the issue is more complex and there's more at play here.

May 25 '22 13:05 gkatev

I see, thanks for the clarification. opal_infosubscribe_change_info looks for subscribers for each key in the info object and invokes the subscriber callback. If there are no subscribers for a particular key, the info will be dropped. ompi_comm_assert_subscribe subscribes to selected communicator keys (two lines above the call to opal_infosubscribe_change_info). The call to opal_info_dup you added forces all info keys to be copied and makes the call to opal_infosubscribe_change_info unnecessary. It might be worth changing that, since unused keys are removed at the end of the call to ompi_comm_dup_with_info. Alternatively, subscribing to ompi_comm_coll_preference should work, too.

May 25 '22 15:05 devreal

These are bandaid solutions, tailored for a specific case. The real problem is the info keys are attached in the middle of the process, too late to be used during initialization and too late to be correctly validated on a well formed communicator. Thus, what we have right now need to be revisited in order to make a sensible process.

It looks like we need a 2 phase process:

we associate the original infos (inherited from the parent, and/or passed from the communicator creation) with the communicator upfront, such that this info is available during module query. We do not validate the info just yet.
For the info validation we also have two choices. Initially, I thought about validating the info in the communicator completion callback, once it is correctly formed and usable for communications. However, doing so will prevent any lazy initialization, a feature desirable for many frameworks, especially collectives. Thus, for frameworks with lazy initialization it would make sense to delays the info key validation until we really need to return the proper answer to the user, aka. the user ask for them.

May 25 '22 15:05 bosilca

So, perform the info validation lazily, at MPI_Info_get-time? But this could still happen before a component is lazily initialized.

My current solution is to utilize the info at coll-module-enabe-time, or cache the value to be used in the lazy initialization.

Another approach could be for a component to explicitely specify during comm creation which info keys it will/might require, so those won't be removed. Something like this could be realized as a list of keys in the component or module struct.

Evolving the idea, in the context of collectives (is all this also a problem with non-coll components?), the module that comm_query returns to coll/base could contain a list of info keys to retain, and base would end up retaining them if the module actually gets enabled.

May 25 '22 16:05 gkatev

We don't have to remove (yet) unused keys in MPI_Info_get, just hide them. That way, if they are referenced later they will appear later. Still not great, as two calls to MPI_Info_get might yield different outcomes...

May 25 '22 18:05 devreal

ompi ompi copied to clipboard

Info key not propagated during communicator creation

ompi
ompi copied to clipboard