ompi icon indicating copy to clipboard operation
ompi copied to clipboard

pmix: remove the MCA framework

Open ggouaillardet opened this issue 1 year ago • 8 comments

pmix is now a first class citizen and hence do not need to be part of a MCA framework anymore

Refs. open-mpi/ompi#12282

ggouaillardet avatar Feb 05 '24 14:02 ggouaillardet

I am not fully happy with this PR. Since there is no more pmix framework, the pmix MCA parameters do not appear any more in ompi_info.

@jsquyres @bosilca could you please advise on how to move forward?

ggouaillardet avatar Feb 05 '24 14:02 ggouaillardet

Which params are you concerned about? I think there were a couple of params in the pmix/base - are those the ones?

If so, what I did in PRRTE was to add a param registration function in your opal/pmix/pmix.c and then just add that to the ompi_rte function. You can also add it to ompi_info so they display.

If you mean the PMIx internal params, then that's been an ongoing discussion. I forget the OMPI issue number, but basically we had agreed on a method but nobody had time to implement it.

rhc54 avatar Feb 05 '24 14:02 rhc54

The pmix MCA param would not make sense, and you already moved pmix_base_async_modex, pmix_base_collect_data and pmix_base_exchange_timeout. While leaves a single, somewhat important MCA param out, the pmix_base_verbose. But as we already have opal_pmix_verbose_output this shall not an issue.

bosilca avatar Feb 05 '24 16:02 bosilca

After this PR, the following bits have disappeared from ompi_info --all

My concern is if we cannot fix that, that could be a deal breaker for v5. I will investigate Ralph's suggestion.

            MCA pmix base: ---------------------------------------------------
            MCA pmix base: parameter "pmix" (current value: "", data source: default, level: 2 user/detail, type: string)
                           Default selection set of components for the pmix framework (<none> means use all components that can be found)
            MCA pmix base: ---------------------------------------------------
            MCA pmix base: parameter "pmix_base_verbose" (current value: "error", data source: default, level: 8 dev/detail, type: int)
                           Verbosity level for the pmix framework (default: 0)
                           Valid values: -1|none, 0|error, 10|component, 20|warn, 40|info, 60|trace, 80|debug, 100|max, 0-100
            MCA pmix base: parameter "pmix_base_async_modex" (current value: "false", data source: default, level: 9 dev/all, type: bool)
                           Use asynchronous modex mode
                           Valid values: 0|f|false|disabled|no|n, 1|t|true|enabled|yes|y
            MCA pmix base: parameter "pmix_base_collect_data" (current value: "true", data source: default, level: 9 dev/all, type: bool)
                           Collect all data during modex
                           Valid values: 0|f|false|disabled|no|n, 1|t|true|enabled|yes|y
            MCA pmix base: parameter "pmix_base_exchange_timeout" (current value: "-1", data source: default, level: 3 user/all, type: int)
                           Time (in seconds) to wait for a data exchange to complete

ggouaillardet avatar Feb 06 '24 06:02 ggouaillardet

@ggouaillardet Would you like me to fix the param issue for you? Having done it before, it should only take a few minutes to port it here.

rhc54 avatar Feb 08 '24 00:02 rhc54

I came up with this (plus a missing trivial fix)

diff --git a/opal/mca/common/ofi/common_ofi.c b/opal/mca/common/ofi/common_ofi.c
index d39412f..ed63c95 100644
--- a/opal/mca/common/ofi/common_ofi.c
+++ b/opal/mca/common/ofi/common_ofi.c
@@ -34,7 +34,6 @@
 #include "opal/mca/base/mca_base_var.h"
 #include "opal/mca/hwloc/base/base.h"
 #include "opal/mca/memory/base/base.h"
-#include "opal/mca/pmix/base/base.h"
 #include "opal/util/argv.h"
 #include "opal/util/show_help.h"
 
diff --git a/opal/runtime/opal_info_support.c b/opal/runtime/opal_info_support.c
index e167451..437163f 100644
--- a/opal/runtime/opal_info_support.c
+++ b/opal/runtime/opal_info_support.c
@@ -340,6 +340,7 @@ void opal_info_register_types(opal_pointer_array_t *mca_types)
     for (i = 0; NULL != opal_frameworks[i]; i++) {
         opal_pointer_array_add(mca_types, opal_frameworks[i]->framework_name);
     }
+    opal_pointer_array_add(mca_types, "pmix");
 }
 
 int opal_info_register_framework_params(opal_pointer_array_t *component_map)

If there is a better way, feel free to suggest or push in this PR directly.

I still believe we should find a way to keep pmix_base_verbose param, at least for the v5 series.

ggouaillardet avatar Feb 09 '24 07:02 ggouaillardet

Looks like this does the trick

diff --git a/opal/mca/common/ofi/common_ofi.c b/opal/mca/common/ofi/common_ofi.c
index d39412f..ed63c95 100644
--- a/opal/mca/common/ofi/common_ofi.c
+++ b/opal/mca/common/ofi/common_ofi.c
@@ -34,7 +34,6 @@
 #include "opal/mca/base/mca_base_var.h"
 #include "opal/mca/hwloc/base/base.h"
 #include "opal/mca/memory/base/base.h"
-#include "opal/mca/pmix/base/base.h"
 #include "opal/util/argv.h"
 #include "opal/util/show_help.h"
 
diff --git a/opal/pmix/pmix-internal.h b/opal/pmix/pmix-internal.h
index 1a8eefe..46d6d18 100644
--- a/opal/pmix/pmix-internal.h
+++ b/opal/pmix/pmix-internal.h
@@ -66,6 +66,7 @@ typedef struct {
 OPAL_DECLSPEC extern bool opal_pmix_collect_all_data;
 OPAL_DECLSPEC extern bool opal_pmix_base_async_modex;
 OPAL_DECLSPEC extern int opal_pmix_verbose_output;
+OPAL_DECLSPEC extern int opal_pmix_verbose;
 
 /* define a caddy for pointing to pmix_info_t that
  * are to be included in an answer */
diff --git a/opal/pmix/pmix.c b/opal/pmix/pmix.c
index 34900ca..c29fdc6 100644
--- a/opal/pmix/pmix.c
+++ b/opal/pmix/pmix.c
@@ -714,6 +714,7 @@ OBJ_CLASS_INSTANCE(opal_proclist_t, opal_list_item_t, NULL, NULL);
 
 bool opal_pmix_collect_all_data = true;
 int opal_pmix_verbose_output = -1;
+int opal_pmix_verbose = 0;
 bool opal_pmix_base_async_modex = false;
 opal_pmix_base_t opal_pmix_base = {.evbase = NULL,
                                    .timeout = 0,
diff --git a/opal/runtime/opal_info_support.c b/opal/runtime/opal_info_support.c
index e167451..437163f 100644
--- a/opal/runtime/opal_info_support.c
+++ b/opal/runtime/opal_info_support.c
@@ -340,6 +340,7 @@ void opal_info_register_types(opal_pointer_array_t *mca_types)
     for (i = 0; NULL != opal_frameworks[i]; i++) {
         opal_pointer_array_add(mca_types, opal_frameworks[i]->framework_name);
     }
+    opal_pointer_array_add(mca_types, "pmix");
 }
 
 int opal_info_register_framework_params(opal_pointer_array_t *component_map)
diff --git a/opal/runtime/opal_params.c b/opal/runtime/opal_params.c
index 5a9e7e2..ac05809 100644
--- a/opal/runtime/opal_params.c
+++ b/opal/runtime/opal_params.c
@@ -109,6 +109,18 @@ int opal_register_params(void)
                                  MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, OPAL_INFO_LVL_3,
                                  MCA_BASE_VAR_SCOPE_READONLY, &opal_pmix_base.timeout);
 
+    opal_pmix_verbose = 0;
+    (void) mca_base_var_register("opal", "pmix", "base", "verbose",
+                                 "Verbosity level for the pmix framework (default: 0)",
+                                 MCA_BASE_VAR_TYPE_INT,
+                                 &mca_base_var_enum_verbose, 0,
+                                 MCA_BASE_VAR_FLAG_SETTABLE, OPAL_INFO_LVL_8,
+                                 MCA_BASE_VAR_SCOPE_LOCAL,
+                                 &opal_pmix_verbose);
+    if (0 < opal_pmix_verbose) {
+        opal_pmix_verbose_output = opal_output_open(NULL);
+    }
+
     opal_finalize_register_cleanup(opal_deregister_params);
 
     return OPAL_SUCCESS;

ggouaillardet avatar Feb 09 '24 08:02 ggouaillardet

Yep, that's all there is to it for the params. However, I don't think this patch is entirely correct. You don't want to declare this still to be a framework by calling MCA_BASE_FRAMEWORK_DECLARE. I just added a simple pmix_base_init function and have opal_init call it to setup the "base" object and register the params.

rhc54 avatar Feb 09 '24 09:02 rhc54