How do I use Open MPI with AWS EFA?
It works fine on a normal node, but on a node that supports EFA networking it reports the following error:
[1641461840.238519] [c-96-4-worker0001:3595 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.238244] [c-96-4-worker0001:3598 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[c-96-4-worker0001:03640] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03646] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03671] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03603] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03604] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03610] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03615] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03616] pml_ucx.c:291 Error: Failed to create UCP worker
[1641461840.237779] [c-96-4-worker0001:3605 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.239128] [c-96-4-worker0001:3606 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.239325] [c-96-4-worker0001:3610 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.237871] [c-96-4-worker0001:3619 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.238154] [c-96-4-worker0001:3616 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.237974] [c-96-4-worker0001:3637 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.237984] [c-96-4-worker0001:3639 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.239203] [c-96-4-worker0001:3647 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.238474] [c-96-4-worker0001:3664 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.238478] [c-96-4-worker0001:3660 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.238485] [c-96-4-worker0001:3661 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.240738] [c-96-4-worker0001:3676 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.242309] [c-96-4-worker0001:3680 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.238529] [c-96-4-worker0001:3674 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.241533] [c-96-4-worker0001:3678 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[1641461840.241750] [c-96-4-worker0001:3704 :0] rc_iface.c:492 UCX ERROR ibv_create_srq() failed: Operation not supported
[c-96-4-worker0001:03593] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03595] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03597] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03600] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03612] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03613] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001:03619] pml_ucx.c:291 Error: Failed to create UCP worker
[c-96-4-worker0001][[52336,1],41][btl_tcp_endpoint.c:625:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[52336,1],29]
[c-96-4-worker0001][[52336,1],31][btl_tcp_endpoint.c:625:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[52336,1],28]
[c-96-4-worker0001][[52336,1],32][btl_tcp_endpoint.c:625:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[52336,1],0]
Can you please post the commands you used to run your application?
I added the following parameters, following https://github.com/open-mpi/ompi/issues/6795:
--mca pml cm --mca mtl ofi --mca pml_base_verbose 10 --mca mtl_base_verbose 10
The following error occurred:
[c-96-4-worker0002:04711] mca: base: components_register: component cm register function successful
[c-96-4-worker0002:04711] mca: base: components_open: opening pml components
[c-96-4-worker0002:04711] mca: base: components_open: found loaded component cm
[c-96-4-worker0002:04710] mca: base: components_register: registering framework pml components
[c-96-4-worker0002:04710] mca: base: components_register: found loaded component cm
[c-96-4-worker0002:04710] mca: base: components_register: component cm register function successful
[c-96-4-worker0002:04710] mca: base: components_open: opening pml components
[c-96-4-worker0002:04710] mca: base: components_open: found loaded component cm
[c-96-4-worker0001:04704] mca: base: components_register: registering framework pml components
[c-96-4-worker0001:04704] mca: base: components_register: found loaded component cm
[c-96-4-worker0001:04704] mca: base: components_register: component cm register function successful
[c-96-4-worker0001:04704] mca: base: components_open: opening pml components
[c-96-4-worker0001:04704] mca: base: components_open: found loaded component cm
[c-96-4-worker0002:04708] mca: base: components_register: registering framework pml components
[c-96-4-worker0002:04716] mtl_ofi_component.c:315: mtl:ofi:provider_include = "(null)"
[c-96-4-worker0002:04716] mtl_ofi_component.c:318: mtl:ofi:provider_exclude = "shm,sockets,tcp,udp,rstream"
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
[c-96-4-worker0002:04716] mtl_ofi_component.c:336: mtl:ofi: "tcp;ofi_rxm" in exclude list
Can you please post the commands you used to run your application?
#!/bin/bash
module add GROMACS/2021-foss-2020b
mpirun -v gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu
#!/bin/bash
module add GROMACS/2021-foss-2020b
mpirun -v gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu --mca pml cm --mca mtl ofi --mca pml_base_verbose 10
What is the command that produces the original errors from pml_ucx? Can you please also post the whole log? Also, why do you specify the psm2 provider for running with EFA?
Hi @chenshixinnb,
I have a few questions:
- What version of Open MPI are you using?
- What version of libfabric are you using? (You can get this by running fi_info.)
- Are you trying to use the GPU version of GROMACS?
Thanks.
1. Open MPI 4.0.5
2. [cloudam@c-96-4-worker0001 ~]$ fi_info -p efa
provider: efa
fabric: EFA-fe80::4fc:a6ff:fe55:bb98
domain: rdmap0s6-rdm
version: 111.20
type: FI_EP_RDM
protocol: FI_PROTO_EFA
provider: efa
fabric: EFA-fe80::4fc:a6ff:fe55:bb98
domain: rdmap0s6-dgrm
version: 111.20
type: FI_EP_DGRAM
protocol: FI_PROTO_EFA
3. No, I use MPI parallelism. In -cpi nvt_gpu -deffnm nvt_gpu, "nvt_gpu" is the input file name.
what is the command which produces the original errors from pml_ucx? Can you please also post the whole log? Also why do you specify psm2 provider for running with efa?
Sorry, I pasted it wrong. I used: --mca pml cm --mca mtl ofi --mca pml_base_verbose 10
Those errors look like UCX errors, which wouldn't appear if the ofi MTL was properly selected. EFA is only supported with the ofi MTL. IIRC, specifying --mca pml cm doesn't fail if the cm PML cannot find a proper MTL. Try excluding the ucx PML with --mca pml ^ucx. My suspicion is that the ofi MTL is not properly selecting the EFA provider, causing a fallback to UCX.
EDIT: My recollection was wrong, specifying --mca pml cm should force the cm pml to be selected or an error will be thrown if it cannot select an MTL. I misread and realized you added --mca pml cm later.
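The suggestion above might be sketched like this (the flags come from this thread; the gmx_mpi invocation and file names are the reporter's):

```shell
# Sketch: exclude the ucx PML so Open MPI cannot fall back to it,
# and keep the ofi MTL verbose to see provider selection.
mpirun --mca pml ^ucx --mca mtl ofi \
       --mca mtl_base_verbose 100 \
       gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu
```

Note that MCA flags must appear before the executable on the mpirun command line; otherwise they are passed to the application instead of mpirun.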
Can you add --mca mtl_base_verbose 100 to your mpirun command line and share the output?
slurm2-out.txt
#!/bin/bash
module add GROMACS/2021-foss-2020b
mpirun -v gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu --mca pml cm --mca mtl ofi --mca pml_base_verbose 10 --mca mtl_base_verbose 100
Looks like the initialization of ofi (libfabric) failed.
[c-96-4-worker0001:03645] select: init returned failure for component ofi
[c-96-4-worker0001:03645] select: no component selected
[c-96-4-worker0001:03645] select: init returned failure for component cm
Please add -x FI_LOG_LEVEL=warn to your mpirun command. This will make libfabric print more information.
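Concretely, the suggested run might look like this (a sketch using the command and file names from earlier in the thread):

```shell
# Sketch: -x forwards the FI_LOG_LEVEL environment variable to every
# rank so libfabric logs warnings on all nodes.
mpirun -x FI_LOG_LEVEL=warn \
       --mca pml cm --mca mtl ofi --mca mtl_base_verbose 100 \
       gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu
```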
slurm3-out.txt
mpirun -v gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu --mca pml cm --mca mtl ofi --mca pml_base_verbose 10 --mca mtl_base_verbose 100 -x FI_LOG_LEVEL=warn
I did not see any information printed from libfabric, which makes me wonder whether Open MPI was compiled correctly with libfabric.
How did you obtain Open MPI? Did you compile it yourself or get it from another source?
[c-96-4-worker0002:03606] mtl_ofi_component.c:541: select_ofi_provider: no provider found
[c-96-4-worker0002:03606] select: init returned failure for component ofi
[c-96-4-worker0002:03606] select: no component selected
This is the important section. The fi_getinfo call in Open MPI did not return a provider (efa), and the other available providers are in the exclude list, so no MTL was returned and the cm PML could not proceed.
Something is missing here if fi_info returns an EFA provider but fi_getinfo does not. Have you verified that all nodes in your Slurm cluster return a provider when fi_info -p efa is called?
I briefly looked at the hints being provided, but unless MTL_OFI_PROG_AUTO is set, I don't think there's an issue with the hints. If you want to dig deeper, the relevant code is the function ompi_mtl_ofi_component_init, where the call to fi_getinfo returns no efa provider. (select_ofi_provider only checks and filters based on the include/exclude list; the problem is that the fi_getinfo call doesn't return the efa provider.)
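A quick way to check every node, as suggested above, might look like this (a sketch; it assumes a Slurm allocation, that fi_info is on PATH on every node, and uses the node names seen in the logs as placeholders):

```shell
# Sketch: run fi_info on each node and flag any node where the efa
# provider is missing. Node names below are from this thread's logs.
for node in c-96-4-worker0001 c-96-4-worker0002; do
    if srun -w "$node" -N1 -n1 fi_info -p efa >/dev/null 2>&1; then
        echo "$node: efa provider found"
    else
        echo "$node: efa provider MISSING"
    fi
done
```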
With just FI_LOG_LEVEL=warn, I don't think the lack of logs is indicative. I'm fairly sure libfabric is responding, as the fi_getinfo call appears to return the tcp;ofi_rxm, udp;ofi_rxd, and shm providers. The problem is that the EFA provider isn't among the fi_getinfo results.
Thanks. I used the EasyBuild OpenMPI toolchain's default automatic build.
It is possible that Open MPI was not configured or compiled with libfabric correctly.
Because you have libfabric, I assume you used the EFA installer to install it. Can you try the Open MPI that comes with the EFA installer? It is under /opt/amazon/openmpi.
For business reasons, I need to install Open MPI on a shared disk, and I need to compile and install software with the Open MPI on that shared disk.
I see. Can you run the ompi_info command that is part of the Open MPI you are using, and paste the result?
thanks
Hi, I noticed that the open mpi you are using is not configured with libfabric.
Configure command line: '--prefix=/public/software/.local/easybuild/software/OpenMPI/4.0.5-GCC-10.2.0' '--build=x86_64-pc-linux-gnu' '--host=x86_64-pc-linux-gnu' '--enable-mpirun-prefix-by-default' '--enable-shared' '--with-cuda=no' '--with-hwloc=/public/software/.local/easybuild/software/hwloc/2.2.0-GCCcore-10.2.0' '--with-libevent=/public/software/.local/easybuild/software/libevent/2.1.12-GCCcore-10.2.0' '--with-ofi=/public/software/.local/easybuild/software/libfabric/1.11.0-GCCcore-10.2.0' '--with-pmix=/public/software/.local/easybuild/software/PMIx/3.1.5-GCCcore-10.2.0' '--with-ucx=/public/software/.local/easybuild/software/UCX/1.9.0-GCCcore-10.2.0' '--without-verbs'
You will need to configure Open MPI with libfabric; when you compile Open MPI, add --with-ofi=/opt/amazon/libfabric to the configure command, and also remove --with-ucx.
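The suggested rebuild might be sketched as follows (the install prefix is the reporter's path from later in the thread, and /opt/amazon/libfabric assumes the EFA installer's default location):

```shell
# Sketch: rebuild Open MPI against the EFA installer's libfabric,
# omitting --with-ucx so UCX cannot be selected.
./configure --prefix=/home/cloudam/OpenMpi-4.0.5 \
            --with-ofi=/opt/amazon/libfabric \
            --without-verbs
make -j "$(nproc)" && make install
```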
curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz
Thanks. Can I install EFA to a shared disk by changing the script path?
No, it always installs Open MPI to /opt/amazon.
Note that to use EFA you will need to run the EFA installer on each compute node anyway, because using EFA requires you to install rdma-core and the EFA kernel module, which are shipped as part of the EFA installer and usually cannot be installed to a shared disk.
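A per-node installation might be sketched like this (it uses the installer tarball URL from earlier in the thread; the -y flag follows AWS's documented installer usage):

```shell
# Sketch: the EFA installer must run on each compute node, since it
# installs rdma-core and the EFA kernel module locally.
curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz
tar -xf aws-efa-installer-latest.tar.gz
cd aws-efa-installer
sudo ./efa_installer.sh -y   # requires root on the node
```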
slurm5-out.txt
#!/bin/bash
module add GROMACS/2021-foss-2020b
export PATH=/home/cloudam/OpenMpi-4.0.5/bin:$PATH
export LD_LIBRARY_PATH=/home/cloudam/OpenMpi-4.0.5/lib:$LD_LIBRARY_PATH
mpirun -v gmx_mpi mdrun -v -cpi nvt_gpu -deffnm nvt_gpu --mca pml cm --mca mtl ofi --mca pml_base_verbose 10 --mca mtl_base_verbose 100 -x FI_LOG_LEVEL=warn
I recompiled Open MPI 4.0.5, but there are still problems.
Configure command line: '--prefix=/home/cloudam/OpenMpi-4.0.5' '--build=x86_64-pc-linux-gnu' '--host=x86_64-pc-linux-gnu' '--enable-mpirun-prefix-by-default' '--enable-shared' '--with-cuda=no' '--with-hwloc=/public/software/.local/easybuild/software/hwloc/2.2.0-GCCcore-10.2.0' '--with-libevent=/public/software/.local/easybuild/software/libevent/2.1.12-GCCcore-10.2.0' '--with-pmix=/public/software/.local/easybuild/software/PMIx/3.1.5-GCCcore-10.2.0' '--with-ofi=/opt/amazon/efa' '--without-verbs'
Hi, does the compute node (such as c-96-4-worker0002) have the EFA installer installed on it?