
Crash in MPI_Win_lock_all with Open MPI v4.1.x when used with libfabric < 1.12

Flamefire opened this issue 3 years ago • 32 comments

Background information

We were using Open MPI 4.0.5 with libfabric 1.11.0 for MPI one-sided communication. After upgrading to Open MPI 4.1, MPI_Win_lock_all crashes.

Using libfabric 1.12.x works. However, as OMPI 4.0 works with libfabric 1.11, this looks more like an OMPI bug, and upgrading libfabric may not be easily possible.

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

4.0.x (specifically 4.0.5) works; 4.1.0 and 4.1.1 crash

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Source

Please describe the system on which you are running

  • Operating system/version: RHEL 7.9
  • Computer hardware: Intel x86
  • Network type: Infiniband

Details of the problem

The following program crashes when compiled and run with mpirun over more than 1 node: test_mpi2.cpp.txt
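
For reference, a hedged sketch of how the reproducer can be built and launched (the compiler wrapper and the rank/mapping options are assumptions, not taken from the report):

mpic++ test_mpi2.cpp -o a.out
mpirun --map-by node -np 2 a.out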

Output:

[taurusi6584:18594:0:18594] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x40)
==== backtrace (tid:  18594) ====
 0 0x00000000000234f3 ucs_debug_print_backtrace()  /dev/shm/easybuild-build/UCX/1.9.0/GCCcore-10.2.0/ucx-1.9.0/src/ucs/debug/debug.c:656
 1 0x00000000000102e1 rxd_start_xfer.cold()  rxd_cq.c:0
 2 0x0000000000073a16 rxd_progress_tx_list()  crtstuff.c:0
 3 0x000000000007547b rxd_handle_recv_comp()  crtstuff.c:0
 4 0x00000000000781a5 rxd_ep_progress()  crtstuff.c:0
 5 0x000000000002ed3d ofi_cq_progress()  crtstuff.c:0
 6 0x000000000002e09e ofi_cq_readfrom()  crtstuff.c:0
 7 0x0000000000006da7 mca_btl_ofi_context_progress()  ???:0
 8 0x0000000000003e8e mca_btl_ofi_component_progress()  btl_ofi_component.c:0
 9 0x00000000000313ab opal_progress()  ???:0
10 0x0000000000018335 ompi_osc_rdma_lock_all_atomic()  ???:0
11 0x000000000009b203 MPI_Win_lock_all()  ???:0
12 0x00000000004017e1 main()  ???:0
13 0x0000000000022555 __libc_start_main()  ???:0
14 0x0000000000401529 _start()  ???:0
=================================
[taurusi6584:18594] *** Process received signal ***
[taurusi6584:18594] Signal: Segmentation fault (11)
[taurusi6584:18594] Signal code:  (-6)
[taurusi6584:18594] Failing at address: 0xf51cf000048a2
[taurusi6584:18594] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2b3dcca1c630]
[taurusi6584:18594] [ 1] /beegfs/global0/ws/s3248973-easybuild/openMPINew/software/libfabric/1.11.0-GCCcore-10.2.0/lib/libfabric.so.1(+0x102e1)[0x2b3dcdad92e1]
[taurusi6584:18594] [ 2] /beegfs/global0/ws/s3248973-easybuild/openMPINew/software/libfabric/1.11.0-GCCcore-10.2.0/lib/libfabric.so.1(+0x73a16)[0x2b3dcdb3ca16]
[taurusi6584:18594] [ 3] /beegfs/global0/ws/s3248973-easybuild/openMPINew/software/libfabric/1.11.0-GCCcore-10.2.0/lib/libfabric.so.1(+0x7547b)[0x2b3dcdb3e47b]
[taurusi6584:18594] [ 4] /beegfs/global0/ws/s3248973-easybuild/openMPINew/software/libfabric/1.11.0-GCCcore-10.2.0/lib/libfabric.so.1(+0x781a5)[0x2b3dcdb411a5]
[taurusi6584:18594] [ 5] /beegfs/global0/ws/s3248973-easybuild/openMPINew/software/libfabric/1.11.0-GCCcore-10.2.0/lib/libfabric.so.1(+0x2ed3d)[0x2b3dcdaf7d3d]
[taurusi6584:18594] [ 6] /beegfs/global0/ws/s3248973-easybuild/openMPINew/software/libfabric/1.11.0-GCCcore-10.2.0/lib/libfabric.so.1(+0x2e09e)[0x2b3dcdaf709e]
[taurusi6584:18594] [ 7] /beegfs/global0/ws/s3248973-easybuild/openMPINew/software/OpenMPI/4.1.1-GCC-10.2.0/lib/openmpi/mca_btl_ofi.so(mca_btl_ofi_context_progress+0x57)[0x2b3dcdabfda7]
[taurusi6584:18594] [ 8] /beegfs/global0/ws/s3248973-easybuild/openMPINew/software/OpenMPI/4.1.1-GCC-10.2.0/lib/openmpi/mca_btl_ofi.so(+0x3e8e)[0x2b3dcdabce8e]
[taurusi6584:18594] [ 9] /beegfs/global0/ws/s3248973-easybuild/openMPINew/software/OpenMPI/4.1.1-GCC-10.2.0/lib/libopen-pal.so.40(opal_progress+0x2b)[0x2b3dcc1693ab]
[taurusi6584:18594] [10] /beegfs/global0/ws/s3248973-easybuild/openMPINew/software/OpenMPI/4.1.1-GCC-10.2.0/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_lock_all_atomic+0x335)[0x2b3dcfd69335]
[taurusi6584:18594] [11] /beegfs/global0/ws/s3248973-easybuild/openMPINew/software/OpenMPI/4.1.1-GCC-10.2.0/lib/libmpi.so.40(PMPI_Win_lock_all+0xb3)[0x2b3dcb661203]
[taurusi6584:18594] [12] a2.out[0x4017e1]
[taurusi6584:18594] [13] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b3dcbcd0555]
[taurusi6584:18594] [14] a2.out[0x401529]
[taurusi6584:18594] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 18594 on node taurusi6584 exited on signal 11 (Segmentation fault).
  Configure command line: '--build=x86_64-pc-linux-gnu'
                          '--host=x86_64-pc-linux-gnu' '--with-slurm'
                          '--with-pmi=/usr' '--with-pmi-libdir=/usr/lib64'
                          '--with-knem=/opt/knem-1.1.3.90mlnx1'
                          '--enable-mpirun-prefix-by-default'
                          '--enable-shared' '--with-cuda=no'
                          '--with-hwloc=/sw/installed/hwloc/2.2.0-GCCcore-10.2.0'
                          '--with-libevent=/sw/installed/libevent/2.1.12-GCCcore-10.2.0'
                          '--with-ofi=/beegfs/global0/ws/s3248973-easybuild/openMPINew/software/libfabric/1.11.0-GCCcore-10.2.0'
                          '--with-pmix=/sw/installed/PMIx/3.1.5-GCCcore-10.2.0'
                          '--with-ucx=/sw/installed/UCX/1.9.0-GCCcore-10.2.0'
                          '--without-verbs'

Flamefire avatar Jul 06 '21 13:07 Flamefire

If you have an infiniband network, is there a reason you're not using the UCX PML?

jsquyres avatar Jul 06 '21 16:07 jsquyres

As you can see, I build with UCX and OFI, so OMPI can choose, I guess.

Flamefire avatar Jul 06 '21 16:07 Flamefire

@Flamefire seems like UCX is not selected by default. Can you please run with -mca pml_base_verbose 100 -mca pml_ucx_verbose 100 (args for mpirun) and upload the output? Also, can you try adding -mca pml ucx to force UCX selection?

yosefe avatar Jul 06 '21 16:07 yosefe

The reason we build with libfabric and UCX (as far as I remember) is that the compilation process should be universal so the built package can be used on different systems. In particular, this is built with EasyBuild via a "recipe" and run on different HPC systems worldwide. Hence "just use UCX" is not a solution, at best a workaround. So much for context.

The requested logs without and with -mca pml ucx (both crash): mpicrash.txt mpicrash_ucx.txt

Flamefire avatar Jul 07 '21 08:07 Flamefire

@Flamefire OFI (with osc/rdma layer) is being selected by default since it has higher priority (=101) than osc/ucx (=70). Can you please add also "-mca osc ucx" to the command line?

@jsquyres on a general note, IMO libfabric components should reduce their priority when running on Mellanox hardware, so that it will be lower than osc/ucx. This is the same as UCX components reducing their priority when NOT running on Mellanox hardware. WDYT?

yosefe avatar Jul 07 '21 16:07 yosefe

@yosefe If the OFI component is being selected incorrectly, then yes, I agree that that is a problem. I don't think it should look for NVIDIA hardware specifically and demote itself in that case, though -- that would be weird (i.e., look for and react to hardware that it wouldn't otherwise look for).

It's quite possible that we're not testing "full build" scenarios well enough (i.e., builds with as many communication libraries as possible, to include Libfabric, UCX, ... etc.):

  1. In situations with only OFI-enabled hardware
  2. In situations with only NVIDIA-enabled hardware
  3. In situations with neither OFI- nor NVIDIA-enabled hardware
  4. In situations with both OFI- and NVIDIA-enabled hardware

jsquyres avatar Jul 07 '21 17:07 jsquyres

I don't think it should look for NVIDIA hardware specifically and demote itself in that case, though

So should OFI check for OFI-specific hardware?

It's quite possible that we're not testing "full build" scenarios well enough (i.e., builds with as many communication libraries as possible, to include Libfabric, UCX, ... etc.):

For case (4), IMO the user should select explicitly (via -mca param) to get the right library to work. And for case (3), probably neither OFI nor UCX should be high priority.

yosefe avatar Jul 07 '21 18:07 yosefe

So should OFI check for OFI-specific hardware?

I'm not sure what the right answer is here. I agree that if there's only NVIDIA hardware and UCX is available, the UCX components should be auto-selected over the OFI components. I'm not sure of the exact mechanism.

FYI: @open-mpi/ofi this requires some discussion. See the original description, above, for version/environment information.

jsquyres avatar Jul 07 '21 18:07 jsquyres

FYI: @open-mpi/ofi this requires some discussion. See the original description, above, for version/environment information.

Sounds like a great topic for the proposed developer's meetings, yes? https://github.com/open-mpi/ompi/wiki/Meeting-2021-07

rhc54 avatar Jul 07 '21 18:07 rhc54

ucs_debug_print_backtrace() means that the crash occurred inside the UCX library itself. I would really dislike UCX being the default because it already mistakes OPA hardware for Mellanox hardware and generates warnings or errors on OPA fabrics if it isn't disabled.

mwheinz avatar Jul 07 '21 18:07 mwheinz

OMPI v4.1 added OFI BTL support (in v4.0 there was only the MTL). If for whatever reason you do not need the OFI BTL, adding this can disable it: --mca btl ^ofi
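
For example (a hedged sketch; ./my_mpi_app is a placeholder application name, and the environment variable form is handy for batch scripts or module files):

mpirun --mca btl ^ofi ./my_mpi_app

export OMPI_MCA_btl='^ofi'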

Please rerun and capture output with ofi logging enabled: -x FI_LOG_LEVEL=info. This appears to be a failure in rxd that is getting caught by a signal handler in UCX (I think).

Some ofi providers layer what are called util providers (ofi_rxd, ofi_rxm, etc.) on top of core providers. There is no easy way to disable this 'feature' (if you wanted) other than excluding all providers but the one you want:

# keep only MY_PROVIDER: build an exclusion list of every other provider reported by fi_info
MY_PROVIDER=psm3
FI_PROVIDER="^$(fi_info | sed -n 's/provider: //p' | sort -u | grep -v ^${MY_PROVIDER}$ | tr '\n' ',' | sed 's/,$//')"

mpirun ... -x FI_PROVIDER=^UDP,UDP;ofi_rxd,psm2,psm2;ofi_rxd,psm3;ofi_rxd,shm,sockets,tcp,tcp;ofi_rxm,verbs,verbs;ofi_rxd,verbs;ofi_rxm  ./my_mpi_app

Note: psm3 is an example as this is what I work on :). It was also added to libfabric v1.12.x

Notice that psm3;ofi_rxd is still listed above, but psm3 itself was the only provider not excluded.
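
Putting the two together, the computed list can be passed to mpirun directly instead of spelling it out by hand (a hedged sketch; ./my_mpi_app is a placeholder):

mpirun -x FI_PROVIDER="$FI_PROVIDER" ./my_mpi_app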

acgoldma avatar Jul 07 '21 19:07 acgoldma

ucs_debug_print_backtrace() means that the crash occurred inside the UCX library itself. I would really dislike UCX being the default because it already mistakes OPA hardware for Mellanox hardware and generates warnings or errors on OPA fabrics if it isn't disabled.

The error happened at 1 0x00000000000102e1 rxd_start_xfer.cold() rxd_cq.c:0; ucs_debug_print_backtrace is merely the signal handler. Anyway, the latest versions of UCX do not install a SIGSEGV handler by default. Also, we changed pml/ucx to reduce the UCX priority if no "mlnx" hardware is found, so it should not be selected on OPA fabrics. If you observe otherwise, pls LMK and we will fix it.

yosefe avatar Jul 07 '21 20:07 yosefe

ucs_debug_print_backtrace() means that the crash occurred inside the UCX library itself

I do not think that that is correct. The log says:

...
[taurusa4:03415] pml_ucx.c:182 Got proc 10 address, size 360
[taurusa4:03415] pml_ucx.c:411 connecting to proc. 10
[taurusa4:03419] pml_ucx.c:182 Got proc 14 address, size 360
[taurusa4:03419] pml_ucx.c:411 connecting to proc. 14
[taurusa4:03416] pml_ucx.c:182 Got proc 11 address, size 360
[taurusa4:03416] pml_ucx.c:411 connecting to proc. 11
[taurusa4:03417] pml_ucx.c:411 connecting to proc. 12
...
[taurusa4][[44612,1],7][btl_ofi_context.c:443:mca_btl_ofi_context_progress] fi_cq_readerr: (provider err_code = 0)

[taurusa4][[44612,1],7][btl_ofi_component.c:238:mca_btl_ofi_exit] BTL OFI will now abort.
[taurusa4][[44612,1],6][btl_ofi_context.c:443:mca_btl_ofi_context_progress] fi_cq_readerr: (provider err_code = 0)

[taurusa4][[44612,1],6][btl_ofi_component.c:238:mca_btl_ofi_exit] BTL OFI will now abort.
[taurusa4:3413 :0:3413] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x40)
[taurusa4][[44612,1],4][btl_ofi_context.c:443:mca_btl_ofi_context_progress] fi_cq_readerr: (provider err_code = 0)

[taurusa4][[44612,1],4][btl_ofi_component.c:238:mca_btl_ofi_exit] BTL OFI will now abort.
==== backtrace (tid:   3413) ====
 0 0x00000000000234f3 ucs_debug_print_backtrace()  /dev/shm/easybuild-build/UCX/1.9.0/GCCcore-10.2.0/ucx-1.9.0/src/ucs/debug/debug.c:656
 1 0x0000000000073678 rxd_start_xfer()  crtstuff.c:0
 2 0x0000000000073a16 rxd_progress_tx_list()  crtstuff.c:0
 3 0x000000000007547b rxd_handle_recv_comp()  crtstuff.c:0
 4 0x00000000000781a5 rxd_ep_progress()  crtstuff.c:0
 5 0x000000000002ed3d ofi_cq_progress()  crtstuff.c:0
 6 0x000000000002e09e ofi_cq_readfrom()  crtstuff.c:0
...

Meaning:

  1. It looks like the UCX PML was being used (or at least it was emitting a lot of output, implying that it was at least initializing).
  2. The OFI BTL said it was going to abort. So I assume that is the entity that actually aborted.

Given that ob1 was not used, it might be useful to run with --mca osc_base_verbose 100 to see if / why the OFI BTL was being used for OSC instead of UCX. That seems to be where the issue is occurring.
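
A hedged sketch of such a run, composing the flags already used earlier in this thread (a.out is the reproducer binary):

mpirun -mca pml ucx -mca osc_base_verbose 100 a.out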

jsquyres avatar Jul 07 '21 21:07 jsquyres

@Flamefire could you point us to a reproducer test?

hppritcha avatar Jul 07 '21 21:07 hppritcha

@Flamefire OFI (with osc/rdma layer) is being selected by default since it has higher priority (=101) than osc/ucx (=70). Can you please add also "-mca osc ucx" to the command line?

That runs successfully: mpirun -mca pml_base_verbose 100 -mca pml_ucx_verbose 100 -mca pml ucx -mca osc ucx a.out (log: mpirun_osc_ucx.log)

@hppritcha I thought I did, but GitHub doesn't allow .cpp as an extension, so I uploaded the wrong file. Updated the description. See https://github.com/open-mpi/ompi/files/6782369/test_mpi2.cpp.txt

@acgoldma

Please rerun and capture output with ofi logging enabled: -x FI_LOG_LEVEL=info

Crash from mpirun -mca pml_base_verbose 100 -mca pml_ucx_verbose 100 -x FI_LOG_LEVEL=info a.out: mpicrash_fi_log.log

This appears to be a failure in rxd that is getting caught by a signal handler in UCX (I think)

You are correct; this is also what GDB confirms. The real crash is in libfabric. IIRC from my GDB sessions, this line crashes: https://github.com/ofiwg/libfabric/blob/v1.11.2/prov/rxd/src/rxd_cq.c#L244 due to rxd_peer(ep, tx_entry->peer) returning NULL

Also: while I appreciate the discussion this has sparked, I don't think changing priorities is a real solution here (although it might help, e.g., for performance reasons). I mean: it does crash, and no matter which (valid) backend is chosen, it shouldn't ever crash, should it? In any case, our users can't be bothered with adding MCA params, especially as they usually just use SLURM (srun foo.out), so it should work (well) out of the box.

I'm in a good position to run more tests if required. I could do combinations of OMPI 4.0.5/4.1.0 with libfabric 1.11.2/1.12.0 and could build those with debug symbols included for better backtraces. However, regarding MPI I'm basically just a user, so if you need any logs, please tell me the exact parameters I should use.

Flamefire avatar Jul 08 '21 07:07 Flamefire

@Flamefire my 0.02 US$

  • per your report, Open MPI 4.1 works with libfabric 1.12, and Open MPI 4.0 works with libfabric 1.11. Based on this data point, my first intuition is that Open MPI 4.1 does things differently and hits a libfabric 1.11 bug that has been fixed in 1.12.
  • performance-wise, UCX should be used instead of libfabric, and I believe this is what should happen out of the box. Meanwhile, this behavior has to be forced manually.
  • MCA parameters can be passed via the mpirun command line, but also via the system-wide /.../etc/openmpi-mca-params.conf, the per-user ~/.openmpi/mca-params.conf, or the OMPI_MCA_* environment variables. While --mca ... is obviously not an option with srun, the other mechanisms work just fine (and some can even be implemented by the sysadmins without end users even being aware of it); see the examples right after this list.
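
A hedged sketch of those mechanisms (the parameter values mirror the -mca pml ucx / -mca osc ucx workaround from this thread):

echo "pml = ucx" >> ~/.openmpi/mca-params.conf
echo "osc = ucx" >> ~/.openmpi/mca-params.conf

export OMPI_MCA_pml=ucx
export OMPI_MCA_osc=ucx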

Sure, Open MPI 4.1 with the (not latest) libfabric 1.11 (1.12 works, but libfabric should not be used anyway for performance reasons) should not crash. That being said, is it worth spending time investigating this scenario? IMNSHO no.

To me, the real issue is instead with the default priorities that end up choosing libfabric instead of UCX (at least for one-sided communications). Since Open MPI 4.1 is now known to crash with libfabric 1.11, we could also bump the libfabric requirement to 1.12.

ggouaillardet avatar Jul 08 '21 08:07 ggouaillardet

@ggouaillardet So do I understand correctly that a workaround would be to either exclude libfabric from the build or add a line osc_rdma_priority=50 to /etc/openmpi-mca-params.conf?

Flamefire avatar Jul 08 '21 08:07 Flamefire

Based on the messages, I believe (disclaimer, I did not test) that these are two valid workarounds. The first one might not be a fit for EasyBuild though. If you are using module files, export OMPI_MCA_osc_rdma_priority=50 is also an option.
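
A minimal sketch of both variants (hedged; the exact location of the config file depends on the install prefix):

export OMPI_MCA_osc_rdma_priority=50

osc_rdma_priority = 50

The first line would go into the generated module file or a batch script; the second is one line in etc/openmpi-mca-params.conf under the Open MPI installation prefix.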

ggouaillardet avatar Jul 08 '21 08:07 ggouaillardet

If you are using module files, export OMPI_MCA_osc_rdma_priority=50 is also an option.

As EasyBuild creates module files, this might indeed be an option. To make sure this works across all existing (and future) OMPI versions: is there any reference documenting the default priorities and their changes?

Flamefire avatar Jul 08 '21 09:07 Flamefire

The defaults have not changed in some time. osc/rdma is supposed to be higher priority than osc/ucx unless pml/ucx was selected. I believe that logic should be in the 4.x series.

Thinking a bit more about this: if someone does select osc/rdma when using UCX, they should be using btl/uct, not libfabric. I had always intended to enable that BTL by default if that happens. I will try to find time to work on that logic. With UCX, though, osc/ucx should be the default.

hjelmn avatar Jul 08 '21 13:07 hjelmn

Okay - rebuilding with the tip of the 4.1.x series I'm not seeing UCX "force its way to the front" any more.

mwheinz avatar Jul 08 '21 14:07 mwheinz

I'm wondering if this didn't wind up wandering off-topic for a bit, so let me make a suggestion. The real issue here is one that has been plaguing us for a bit now - how to ensure the right selection for ucx vs libfabric vs BTL components out-of-the-box when a packager builds all of them? Perhaps the solution is to generalize this and be a little more direct about it?

Why not add a function to MPI_Init that:

  • gets the fabric vendor(s). PMIx knows what they are (part of our fabric support), so we can provide them for lookup if you like - or since we now load hwloc very early, you could scan the topology to extract the vendor names.
  • if there is more than one vendor, then check the directives to see if the user specified which one to use. Might be a new MCA param for "fabric_type". If not specified, then error out with a show_help msg
  • if the vendor is "Mellanox", then set a global param to indicate that all ucx components should be used - e.g., ompi_fabric_type=ucx
  • if the vendor is a libfabric member, then set the global param to indicate that only ofi components should be used - e.g., ompi_fabric_type=ofi
  • if the vendor fits neither category (e.g., vanilla TCP), then leave the param NULL
  • update the various pml, coll, and other components to check the param and disqualify themselves if the corresponding fabric type doesn't match. So pml/ucx would check NULL != ompi_fabric_type && 0 == strcmp(ompi_fabric_type, "ucx") as would coll/hcoll. The mtl/ofi would check NULL != ompi_fabric_type && 0 == strcmp(ompi_fabric_type, "ofi"). Etc.

Maybe you could get more detailed or come up with a better scheme, but something like this is likely required to finally resolve this mess.

rhc54 avatar Jul 11 '21 15:07 rhc54

Ralph, I was thinking along these lines myself: something along the lines of a tuning framework. The framework components could set their priority based on the detected hardware. The highest-priority component would then be allowed to change only the default parameters. Users can use standard component selection to override which components run or win.

hjelmn avatar Jul 12 '21 02:07 hjelmn

I like the idea of asking PMIx what the fabric vendor(s) are.

Could we simply use this information and set OMPI_MCA_ environment variables in order to disqualify or lower the priority of some components? Or is PMIx too late in the initialization process for this quite trivial method?

@hjelmn is this how you intended to implement the tuning framework/components?

ggouaillardet avatar Jul 12 '21 04:07 ggouaillardet

I wouldn't set MCA params personally as that would somewhat violate the OMPI approach - i.e., we traditionally just have each component check to see if it can run and go from there. In this case, I'd just retrieve the fabric vendors from PMIx into some global variable and let each component decide what that means to it. I think that would ultimately be easier to maintain.

I'm not sure how this "tuning" framework would work - we don't allow cross-component or cross-framework dependencies, so I don't see how a component in a "tuning" framework could tell someone like coll/hcoll or pml/ucx "you can't run".

rhc54 avatar Jul 12 '21 20:07 rhc54

I read what Nathan was suggesting as something akin to how the PML layers on top of the BTLs: not a cross-component call, but one through the framework interface. The framework would figure out which device should be used and allow other components to use that information in their selection priority. I had proposed creating a net framework last year, which was a riff on an idea Josh Hursey had. We've always thought this was something worth doing, but with the libfabric/UCX behaviors it seems like it's finally getting critical.

In Ralph's proposal to use PMIx, how are you determining vendor? Have some concerns there around EFA, but they might be completely unfounded...

bwbarrett avatar Jul 12 '21 21:07 bwbarrett

In Ralph's proposal to use PMIx, how are you determining vendor? Have some concerns there around EFA, but they might be completely unfounded...

Harvesting it from hwloc - we do so in order to provide the fabric coordinate info. If that isn't adequate for EFA, then I'll need to update it in PMIx anyway, so it would be good to learn how to detect that one.

rhc54 avatar Jul 12 '21 21:07 rhc54

Since the OMPI layer has access to the same information, I wonder why delegating the vendor discovery to another software component (PMIx in this instance) is a good approach?

bosilca avatar Jul 13 '21 12:07 bosilca

Instead of adding another decision layer, we can give the developers of the network support more flexibility/responsibility to select the set of components that make sense on their specific hardware. To give an example: since most network stacks (OFI, UCX, sm, ...) have a common set of support functions (in opal/mca/common/), there is a unified location that can decide (on the first call to the common framework) whether the hardware corresponding to a specific vendor/network stack is available and selected, and alter the priority and exclusivity of all corresponding components (BTL/PML/OSC/...).

bosilca avatar Jul 13 '21 12:07 bosilca

Since the OMPI layer has access to the same information, I wonder why delegating the vendor discovery to another software component (PMIx in this instance) is a good approach?

Not saying you need to do so - just offered to provide it because PMIx already has to do it. PMIx provides the fabric coordinates for each process along with endpoint assignments (where possible), switch colocation info, etc. As part of that process, we identify and provide the fabric vendor and other identification info.

So it is already available, if you want to access it. Or you can re-parse the hwloc topo tree yourselves to recreate it if you prefer.

rhc54 avatar Jul 13 '21 12:07 rhc54