How do I use Open MPI with AWS EFA?

Open chenshixinnb opened this issue 3 years ago • 38 comments

Running on a normal node works fine, but when I run on a node that supports EFA networking, it reports the following error:

chenshixinnb avatar Jan 11 '22 10:01 chenshixinnb

[1641461840.238519] [c-96-4-worker0001:3595 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.238244] [c-96-4-worker0001:3598 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[c-96-4-worker0001:03640] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03646] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03671] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03603] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03604] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03610] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03615] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03616] pml_ucx.c:291 Error: Failed to create UCP worker
[1641461840.237779] [c-96-4-worker0001:3605 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.239128] [c-96-4-worker0001:3606 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.239325] [c-96-4-worker0001:3610 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.237871] [c-96-4-worker0001:3619 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.238154] [c-96-4-worker0001:3616 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.237974] [c-96-4-worker0001:3637 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.237984] [c-96-4-worker0001:3639 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.239203] [c-96-4-worker0001:3647 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.238474] [c-96-4-worker0001:3664 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.238478] [c-96-4-worker0001:3660 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.238485] [c-96-4-worker0001:3661 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.240738] [c-96-4-worker0001:3676 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.242309] [c-96-4-worker0001:3680 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.238529] [c-96-4-worker0001:3674 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.241533] [c-96-4-worker0001:3678 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.241750] [c-96-4-worker0001:3704 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[c-96-4-worker0001:03593] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03595] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03597] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03600] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03612] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03613] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03619] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001][[52336,1],41][btl_tcp_endpoint.c:625:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[52336,1],29]
[c-96-4-worker0001][[52336,1],31][btl_tcp_endpoint.c:625:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[52336,1],28]
[c-96-4-worker0001][[52336,1],32][btl_tcp_endpoint.c:625:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[52336,1],0]

chenshixinnb avatar Jan 11 '22 10:01 chenshixinnb

Can you please post the commands you used to run your application?

brminich avatar Jan 11 '22 10:01 brminich

Following https://github.com/open-mpi/ompi/issues/6795, I added these parameters: --mca pml cm --mca mtl ofi --mca pml_base_verbose 10 --mca mtl_base_verbose 10. The following error occurred:

[c-96-4-worker0002:04711] mca: base: components_register: component cm register function successful
[c-96-4-worker0002:04711] mca: base: components_open: opening pml components
[c-96-4-worker0002:04711] mca: base: components_open: found loaded component cm
[c-96-4-worker0002:04710] mca: base: components_register: registering framework pml components
[c-96-4-worker0002:04710] mca: base: components_register: found loaded component cm
[c-96-4-worker0002:04710] mca: base: components_register: component cm register function successful
[c-96-4-worker0002:04710] mca: base: components_open: opening pml components
[c-96-4-worker0002:04710] mca: base: components_open: found loaded component cm
[c-96-4-worker0001:04704] mca: base: components_register: registering framework pml components
[c-96-4-worker0001:04704] mca: base: components_register: found loaded component cm
[c-96-4-worker0001:04704] mca: base: components_register: component cm register function successful
[c-96-4-worker0001:04704] mca: base: components_open: opening pml components
[c-96-4-worker0001:04704] mca: base: components_open: found loaded component cm
[c-96-4-worker0002:04708] mca: base: components_register: registering framework pml components
[c-96-4-worker0002:04716] mtl_ofi_component.c:315: mtl:ofi:provider_include = "(null)"
[c-96-4-worker0002:04716] mtl_ofi_component.c:318: mtl:ofi:provider_exclude = "shm,sockets,tcp,udp,rstream"
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list

chenshixinnb avatar Jan 11 '22 10:01 chenshixinnb

#!/bin/bash
module add GROMACS/2021-foss-2020b
mpirun -v gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu

chenshixinnb avatar Jan 11 '22 10:01 chenshixinnb

#!/bin/bash
module add GROMACS/2021-foss-2020b
mpirun -v gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu --mca pml cm --mca mtl ofi --mca pml_base_verbose 10

chenshixinnb avatar Jan 11 '22 10:01 chenshixinnb

What is the command that produces the original errors from pml_ucx? Can you please also post the whole log? Also, why do you specify the psm2 provider for running with EFA?

brminich avatar Jan 11 '22 10:01 brminich

Hi @chenshixinnb,

I have a few questions:

  1. What version of Open MPI are you using?
  2. What version of libfabric are you using (you can get the info by running fi_info)?
  3. Are you trying to use the GPU version of GROMACS?

wzamazon avatar Jan 11 '22 13:01 wzamazon

Thanks.

  1. Open MPI 4.0.5
  2. Output of fi_info:

[cloudam@c-96-4-worker0001 ~]$ fi_info -p efa
provider: efa
    fabric: EFA-fe80::4fc:a6ff:fe55:bb98
    domain: rdmap0s6-rdm
    version: 111.20
    type: FI_EP_RDM
    protocol: FI_PROTO_EFA
provider: efa
    fabric: EFA-fe80::4fc:a6ff:fe55:bb98
    domain: rdmap0s6-dgrm
    version: 111.20
    type: FI_EP_DGRAM
    protocol: FI_PROTO_EFA

  3. No, I use MPI parallelism. In -cpi nvt_gpu -deffnm nvt_gpu, "nvt_gpu" is the input file name.

chenshixinnb avatar Jan 11 '22 14:01 chenshixinnb

Sorry, I pasted the wrong parameters earlier. The ones I used are: --mca pml cm --mca mtl ofi --mca pml_base_verbose 10

chenshixinnb avatar Jan 11 '22 14:01 chenshixinnb

Those errors look like UCX errors, which wouldn't appear if the ofi mtl was properly selected. EFA is only supported with the ofi mtl. IIRC, specifying --mca pml cm doesn't fail if the cm pml cannot find a proper mtl. Try excluding the ucx pml with --mca pml ^ucx. My suspicion is that the ofi mtl is not properly selecting the EFA provider, which causes a fallback to ucx.

wckzhang avatar Jan 11 '22 20:01 wckzhang

EDIT: My recollection was wrong. Specifying --mca pml cm should force the cm pml to be selected, or an error will be thrown if it cannot select an MTL. I misread the thread and only later realized that you had added --mca pml cm.

wckzhang avatar Jan 11 '22 20:01 wckzhang

Can you add --mca mtl_base_verbose 100 to your mpirun command line and share the output?

wzamazon avatar Jan 11 '22 21:01 wzamazon

slurm2-out.txt

#!/bin/bash
module add GROMACS/2021-foss-2020b
mpirun -v gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu --mca pml cm --mca mtl ofi --mca pml_base_verbose 10 --mca mtl_base_verbose 100

chenshixinnb avatar Jan 12 '22 01:01 chenshixinnb

Looks like the initialization of ofi (libfabric) failed.

[c-96-4-worker0001:03645] select: init returned failure for component ofi
[c-96-4-worker0001:03645] select: no component selected
[c-96-4-worker0001:03645] select: init returned failure for component cm

Please add -x FI_LOG_LEVEL=warn to your mpirun command. This will make libfabric print more information.
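
A minimal sketch of how the options suggested so far in this thread could be combined. It assumes the usual `mpirun [options] <program> [args]` ordering, where anything after the program name is handed to the application rather than to mpirun, so the MCA and `-x` options are placed first. The function name `efa_mpirun_cmd` is made up for this illustration; it only prints the command line and does not launch anything.

```shell
#!/bin/bash
# efa_mpirun_cmd APP [ARGS...]   (hypothetical helper, not part of Open MPI)
# Prints an mpirun command line that:
#   - forces the cm PML with the ofi (libfabric) MTL,
#   - raises MTL verbosity to 100,
#   - exports FI_LOG_LEVEL=warn to the launched processes,
# with all mpirun options placed BEFORE the application.
efa_mpirun_cmd() {
    echo mpirun \
        --mca pml cm \
        --mca mtl ofi \
        --mca mtl_base_verbose 100 \
        -x FI_LOG_LEVEL=warn \
        "$@"
}

# Print the command for the GROMACS run from this thread.
efa_mpirun_cmd gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu
```

To actually start the job, the printed line can be pasted into the batch script in place of the existing mpirun line.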

wzamazon avatar Jan 12 '22 01:01 wzamazon

slurm3-out.txt

mpirun -v gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu --mca pml cm --mca mtl ofi --mca pml_base_verbose 10 --mca mtl_base_verbose 100 -x FI_LOG_LEVEL=warn

chenshixinnb avatar Jan 12 '22 02:01 chenshixinnb

I did not see any information from libfabric printed, which makes me wonder whether openmpi was compiled correctly with libfabric.

How did you obtain Open MPI?

Did you compile it yourself or get it from another source?

wzamazon avatar Jan 12 '22 05:01 wzamazon

[c-96-4-worker0002:03606] mtl_ofi_component.c:541: select_ofi_provider: no provider found
[c-96-4-worker0002:03606] select: init returned failure for component ofi
[c-96-4-worker0002:03606] select: no component selected

This is the important section. The fi_getinfo call in Open MPI did not return a provider (efa) and the other available providers are in the exclude list, thus no MTL was returned and the PML CM could not progress.
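
To make the filtering step concrete, here is a rough sketch of an exclude-list check. It is an illustration only, not the Open MPI implementation: the function name `filter_providers` is made up, and matching on the lower-cased part of the provider name before the first `;` is an assumption based on the `"tcp;ofi_rxm" in exclude list` log lines. The exclude list and provider names are the ones that appear in this thread.

```shell
#!/bin/bash
# filter_providers EXCLUDE_LIST PROVIDER...   (hypothetical helper)
# Prints every provider whose base name (text before the first ';',
# lower-cased) is NOT in the comma-separated exclude list.
# The lower-casing is an assumption made for this sketch.
filter_providers() {
    local exclude=",$1,"
    shift
    local prov base
    for prov in "$@"; do
        base=$(printf '%s' "${prov%%;*}" | tr '[:upper:]' '[:lower:]')
        case "$exclude" in
            *",$base,"*) ;;                 # excluded: drop it
            *) printf '%s\n' "$prov" ;;     # not excluded: keep it
        esac
    done
}

# Exclude list from the log, and the providers reported in this thread.
exclude_list="shm,sockets,tcp,udp,rstream"
remaining=$(filter_providers "$exclude_list" "tcp;ofi_rxm" "UDP;ofi_rxd" "shm")
if [ -z "$remaining" ]; then
    echo "no provider left after filtering"
fi
```

With an `efa` entry in the input, that entry would survive the filter, which is why the missing EFA provider in the fi_getinfo result is the part to investigate.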

I feel like there's something missing here if fi_info returns an EFA provider but fi_getinfo doesn't. Have you verified that all nodes in your slurm cluster return a provider when fi_info -p efa is called?
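
One possible way to script that check, as a sketch: the helper below only inspects text in the format of the fi_info output posted earlier in this thread. The name `has_efa_provider` is made up, and how the check is fanned out to every node (for example through the batch system) is left open because it depends on the cluster setup.

```shell
#!/bin/bash
# has_efa_provider   (hypothetical helper)
# Reads text on stdin, expected to be the output of `fi_info -p efa`,
# and succeeds if it contains at least one "provider: efa" line.
has_efa_provider() {
    grep -q '^[[:space:]]*provider:[[:space:]]*efa[[:space:]]*$'
}

# Check the local node; run the same lines on every compute node.
if fi_info -p efa 2>/dev/null | has_efa_provider; then
    echo "$HOSTNAME: EFA provider found"
else
    echo "$HOSTNAME: no EFA provider"
fi
```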

I briefly took a look at the hints that were being provided, but unless MTL_OFI_PROG_AUTO is set, I don't think there's an issue with the hints. If you want to dig deeper, the relevant section of the code is the function ompi_mtl_ofi_component_init, where the call to fi_getinfo returns no efa provider. (select_ofi_provider only checks and filters based on the include/exclude list; the problem is that the fi_getinfo call doesn't return the efa provider.)

wckzhang avatar Jan 12 '22 17:01 wckzhang

With just FI_LOG_LEVEL=warn, I don't think a lack of logs is indicative. I'm pretty sure libfabric is responding, as it looks like the fi_getinfo call is returning the tcp;ofi_rxm, UDP;ofi_rxd, and shm providers. The problem is that the EFA provider isn't among the providers returned by fi_getinfo.

wckzhang avatar Jan 12 '22 19:01 wckzhang

Thanks. I built it automatically with EasyBuild, using the default Open MPI toolchain.

chenshixinnb avatar Jan 13 '22 01:01 chenshixinnb

It is possible that Open MPI was not configured or compiled with libfabric correctly.

Because you have libfabric, I assume you used the EFA installer to install it. Can you try the Open MPI that comes with the EFA installer? It is under /opt/amazon/openmpi.

wzamazon avatar Jan 13 '22 03:01 wzamazon

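For business reasons, I need to install Open MPI on a shared disk, and I need to compile and install software using the Open MPI on that shared disk.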

chenshixinnb avatar Jan 13 '22 03:01 chenshixinnb

I see. Can you run the ompi_info command that is part of the Open MPI you are using, and paste the result?

wzamazon avatar Jan 13 '22 03:01 wzamazon

ompi_info.txt

chenshixinnb avatar Jan 13 '22 03:01 chenshixinnb

thanks

chenshixinnb avatar Jan 13 '22 03:01 chenshixinnb

Hi, I noticed that the Open MPI you are using is not configured with the libfabric under /opt/amazon; its --with-ofi option points at the EasyBuild libfabric 1.11.0 instead:

 Configure command line: '--prefix=/public/software/.local/easybuild/software/OpenMPI/4.0.5-GCC-10.2.0' '--build=x86_64-pc-linux-gnu' '--host=x86_64-pc-linux-gnu' '--enable-mpirun-prefix-by-default' '--enable-shared' '--with-cuda=no' '--with-hwloc=/public/software/.local/easybuild/software/hwloc/2.2.0-GCCcore-10.2.0' '--with-libevent=/public/software/.local/easybuild/software/libevent/2.1.12-GCCcore-10.2.0' '--with-ofi=/public/software/.local/easybuild/software/libfabric/1.11.0-GCCcore-10.2.0' '--with-pmix=/public/software/.local/easybuild/software/PMIx/3.1.5-GCCcore-10.2.0' '--with-ucx=/public/software/.local/easybuild/software/UCX/1.9.0-GCCcore-10.2.0' '--without-verbs'

You will need to configure Open MPI with libfabric. For example, when you compile Open MPI, add --with-ofi=/opt/amazon/libfabric to the configure command, and remove --with-ucx.
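
A small sketch for checking which libfabric a given Open MPI build was configured against. It assumes the ompi_info output contains a "Configure command line" entry with single-quoted options, as in the excerpt above; the helper name `ofi_prefix` is made up for this example.

```shell
#!/bin/bash
# ofi_prefix   (hypothetical helper)
# Reads ompi_info output on stdin and prints the path given to the first
# --with-ofi=... configure option, if one is present.
ofi_prefix() {
    grep -o -- "--with-ofi=[^' ]*" | head -n 1 | cut -d= -f2-
}

# Only query ompi_info if it is on PATH for the Open MPI being tested.
if command -v ompi_info >/dev/null 2>&1; then
    echo "Open MPI was configured with libfabric at: $(ompi_info | ofi_prefix)"
fi
```

Running this after loading the module or setting PATH shows whether the build points at the intended libfabric installation or at a different one.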

wzamazon avatar Jan 13 '22 04:01 wzamazon

curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz

Thanks. Can I install EFA to a shared disk by changing the script path?

chenshixinnb avatar Jan 13 '22 05:01 chenshixinnb

No, it always installs Open MPI to /opt/amazon.

Note that to use EFA you will need to run the EFA installer on each compute node anyway, because using EFA requires rdma-core and the EFA kernel module, which are shipped as part of the EFA installer and usually cannot be installed to a shared disk.

wzamazon avatar Jan 13 '22 05:01 wzamazon

slurm5-out.txt

#!/bin/bash
module add GROMACS/2021-foss-2020b
export PATH=/home/cloudam/OpenMpi-4.0.5/bin:$PATH
export LD_LIBRARY_PATH=/home/cloudam/OpenMpi-4.0.5/lib:$LD_LIBRARY_PATH
mpirun -v gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu --mca pml cm --mca mtl ofi --mca pml_base_verbose 10 --mca mtl_base_verbose 100 -x FI_LOG_LEVEL=warn

chenshixinnb avatar Jan 21 '22 11:01 chenshixinnb

I recompiled Open MPI 4.0.5, but there are still problems.

Configure command line: '--prefix=/home/cloudam/OpenMpi-4.0.5' '--build=x86_64-pc-linux-gnu' '--host=x86_64-pc-linux-gnu' '--enable-mpirun-prefix-by-default' '--enable-shared' '--with-cuda=no' '--with-hwloc=/public/software/.local/easybuild/software/hwloc/2.2.0-GCCcore-10.2.0' '--with-libevent=/public/software/.local/easybuild/software/libevent/2.1.12-GCCcore-10.2.0' '--with-pmix=/public/software/.local/easybuild/software/PMIx/3.1.5-GCCcore-10.2.0' '--with-ofi=/opt/amazon/efa' '--without-verbs'

chenshixinnb avatar Jan 21 '22 11:01 chenshixinnb

Hi, does each compute node (such as c-96-4-worker0002) have the EFA installer installed on it?

wzamazon avatar Jan 21 '22 13:01 wzamazon