Problems when running the hello_c example
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v4.1.2
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
from a source/distribution tarball
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
- Operating system/version: CentOS Linux release 7.6.1810 (AltArch) Linux version 4.14.0-115.el7a.0.1.aarch64 ([email protected])
- Computer hardware:
[nscc-gz@centos203 examples]$ lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 4
Model: 0
BogoMIPS: 200.00
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 65536K
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
NUMA node2 CPU(s): 64-95
NUMA node3 CPU(s): 96-127
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop
- Network type:
Details of the problem
Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.
Hi, when I run hello_c, I get the following output:
[nscc-gz@centos203 examples]$ mpirun -np 4 --mca orte_base_help_aggregate 0
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: centos203
Local device: mlx5_0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: centos203
Local device: mlx5_0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: centos203
Local device: mlx5_0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: centos203
Local device: mlx5_0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
Hello, world, I am 0 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 1 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 2 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 3 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
And the ibstat output:
[nscc-gz@centos203 examples]$ ibstat
CA 'mlx5_0'
CA type: MT4117
Number of ports: 1
Firmware version: 14.20.1820
Hardware version: 0
Node GUID:
System image GUID:
Port 1:
State: Active
Physical state: LinkUp
Rate: 25
Base lid: 0
LMC: 0
SM lid: 0
Capability mask:
Port GUID:
Link layer: Ethernet
CA 'mlx5_1'
CA type: MT4117
Number of ports: 1
Firmware version: 14.20.1820
Hardware version: 0
Node GUID:
System image GUID:
Port 1:
State: Down
Physical state: Disabled
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask:
Port GUID:
Link layer: Ethernet
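One detail worth flagging in the ibstat output above: both adapters report `Link layer: Ethernet`, i.e. the cards are running RoCE rather than native InfiniBand. The udcm connection scheme only works on an InfiniBand link layer, which is consistent with the `CPCs attempted: rdmacm, udcm` failures in the warnings. A minimal sketch over the lines pasted above (running `ibstat` itself of course requires the hardware):

```shell
# Count ports whose link layer is Ethernet (RoCE) in an ibstat-style dump.
# The sample lines are condensed from the output above; on the real node
# you would pipe `ibstat` itself instead of this here-string.
ibstat_sample='CA mlx5_0 Link layer: Ethernet
CA mlx5_1 Link layer: Ethernet'
n_eth=$(echo "$ibstat_sample" | grep -c 'Link layer: Ethernet')
echo "$n_eth"   # both CAs are RoCE here
```

If all ports are RoCE, only rdmacm can possibly succeed, which narrows the search to rdmacm/GID configuration.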
If I use this command:
[nscc-gz@centos203 examples]$ mpirun --mca btl openib,self,vader --mca btl_openib_cpc_include rdmacm -np 4 hello_c
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: centos203
Local device:
Local port: 1
CPCs attempted: rdmacm
--------------------------------------------------------------------------
Hello, world, I am 0 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 1 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 2 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 3 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
[centos203:10977] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[centos203:10977] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
And if I specify the IB device:
[nscc-gz@centos203 examples]$ mpirun -np 4 ./hello_c --mca btl_openib_if_exclude mlx5_0
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.
Local host: centos203
--------------------------------------------------------------------------
Hello, world, I am 0 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 1 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 2 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 3 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
[centos203:14896] 3 more processes have sent help message help-mpi-btl-openib.txt / no active ports found
[centos203:14896] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[nscc-gz@centos203 examples]$
The ifconfig output and IB port mapping:
[nscc-gz@centos203 examples]$ ibdev2netdev
mlx5_0 port 1 ==> enp1s0f0 (Up)
mlx5_1 port 1 ==> enp1s0f1 (Down)
[nscc-gz@centos203 examples]$ ifconfig
enp125s0f0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether txqueuelen 1000 (Ethernet)
RX packets 1951751285 bytes 2352472322729 (2.1 TiB)
RX errors 0 dropped 11718888 overruns 0 frame 0
TX packets 822856179 bytes 1385364963277 (1.2 TiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp125s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.29.130 netmask 255.255.255.0 broadcast 172.16.29.255
inet6 prefixlen 64 scopeid 0x20<link>
ether txqueuelen 1000 (Ethernet)
RX packets 19347918 bytes 7289410117 (6.7 GiB)
RX errors 0 dropped 2958451 overruns 0 frame 0
TX packets 12963627 bytes 48203399135 (44.8 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp125s0f2: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp125s0f3: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp1s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.40.1.203 netmask 255.255.255.0 broadcast 10.40.1.255
inet6 prefixlen 64 scopeid 0x20<link>
ether txqueuelen 1000 (Ethernet)
RX packets 382158355530 bytes 544487865896139 (495.2 TiB)
RX errors 208 dropped 3083040 overruns 0 frame 208
TX packets 379357423669 bytes 545429402206655 (496.0 TiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp1s0f1: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 32048729965 bytes 809795856471103 (736.5 TiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 32048729965 bytes 809795856471103 (736.5 TiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[nscc-gz@centos203 examples]$
Could you tell me how I can use the IB devices correctly? Thanks!
FYI @open-mpi/ucx team
@shiwch Is running the UCX PML an option? If so, what happens if you run with -mca pml ucx?
@janjust I get the same warning if I run with -mca pml ucx
[nscc-gz@centos203 examples]$ mpirun -np 4 -mca pml ucx -mca btl_openib_if_exclude mlx5_0 ./hello_c
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.
Local host: centos203
--------------------------------------------------------------------------
Hello, world, I am 1 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 2 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 0 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 3 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
[centos203:111249] 3 more processes have sent help message help-mpi-btl-openib.txt / no active ports found
[centos203:111249] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[nscc-gz@centos203 examples]$
@shiwch OK, one more try: -mca btl ^openib
@janjust Thanks! This command runs correctly. But I have another question: does that matter? I mean, is there an unknown performance penalty without openib support? Communication itself seems to work?
[nscc-gz@centos203 examples]$ mpirun -np 4 -mca btl ^openib -mca pml ucx -mca btl_openib_if_exclude mlx5_0 ./hello_c
Hello, world, I am 1 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 3 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 0 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 2 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
[nscc-gz@centos203 examples]$
@shiwch How was Open MPI installed? What was the configure command?
I'm guessing that in your case, because openib cannot be selected, you're falling back to UCX.
@janjust I installed Open MPI with these commands:
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.2.tar.gz
tar -xvf openmpi-4.1.2.tar.gz
cd openmpi-4.1.2 && mkdir build && cd build
../configure --prefix=/home/nscc-gz/shi/mpi/openmpi-4.1.2
make -j 16 && make install
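For reference, a hedged variant of the recipe above: the v4.1.x series only builds the UCX PML if configure finds a UCX installation, so pointing configure at it explicitly avoids a silent fallback. The `/usr` prefix below is an assumption, not taken from this machine; substitute wherever your MLNX_OFED installed UCX (`ucx_info -v` prints the version if UCX is on the PATH). This is a build sketch, not a verified recipe for this system:

```shell
# Assumed UCX prefix (/usr); adjust --with-ucx to your actual install path.
../configure --prefix=/home/nscc-gz/shi/mpi/openmpi-4.1.2 --with-ucx=/usr
make -j 16 && make install
# Afterwards, `ompi_info | grep ucx` should list the ucx PML if it was found
# at configure time.
```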
@shiwch Which MOFED version? $ ofed_info -s
@janjust
[nscc-gz@centos203 examples]$ ofed_info -s
MLNX_OFED_LINUX-4.7-3.2.9.0:
[nscc-gz@centos203 examples]$
@shiwch One more command: $ show_gids
@janjust
[nscc-gz@centos203 examples]$ show_gids
DEV PORT INDEX GID IPv4 VER DEV
--- ---- ----- --- ------------ --- ---
mlx5_0 1 0 fe80:0000:0000:0000:526b:4bff:fe43:a96e v1 enp1s0f0
mlx5_0 1 1 fe80:0000:0000:0000:526b:4bff:fe43:a96e v2 enp1s0f0
mlx5_0 1 2 fe80:0000:0000:0000:882e:96dd:e5e7:0477 v1 enp1s0f0
mlx5_0 1 3 fe80:0000:0000:0000:882e:96dd:e5e7:0477 v2 enp1s0f0
mlx5_0 1 4 0000:0000:0000:0000:0000:ffff:0a28:01cb 10.40.1.203 v1 enp1s0f0
mlx5_0 1 5 0000:0000:0000:0000:0000:ffff:0a28:01cb 10.40.1.203 v2 enp1s0f0
mlx5_1 1 0 fe80:0000:0000:0000:526b:4bff:fe43:a96f v1 enp1s0f1
mlx5_1 1 1 fe80:0000:0000:0000:526b:4bff:fe43:a96f v2 enp1s0f1
n_gids_found=8
[nscc-gz@centos203 examples]$
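As a side note, the show_gids table above can be read mechanically: for RoCE v2 traffic one usually wants the GID row that is both `v2` and carries the node's IPv4 address. A minimal sketch over the two relevant rows pasted above (the awk field positions assume show_gids' whitespace-separated columns, and only rows with an IPv4 column line up this way):

```shell
# Two data rows copied from the show_gids output above.
gids='mlx5_0 1 4 0000:0000:0000:0000:0000:ffff:0a28:01cb 10.40.1.203 v1 enp1s0f0
mlx5_0 1 5 0000:0000:0000:0000:0000:ffff:0a28:01cb 10.40.1.203 v2 enp1s0f0'
# Keep rows that have an IPv4 address in column 5 and are RoCE v2 in
# column 6, then print the GID index (column 3).
roce_v2_gid=$(echo "$gids" | awk '$5 ~ /^[0-9.]+$/ && $6 == "v2" { print $3 }')
echo "$roce_v2_gid"   # GID index 5 for mlx5_0 port 1 in this table
```

When UCX carries the traffic instead of openib, device selection is normally done with something like `UCX_NET_DEVICES=mlx5_0:1` rather than openib's GID parameters.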
@shiwch A shot in the dark, but please try mpirun -np 4 -mca btl_openib_warn_default_gid_prefix 4 -mca btl_openib_if_include mlx5_0:1 ./hello_c; if not 4, try GID 5. For some reason openib is not getting selected because it cannot find the correct device/port to use. Looks like a configuration issue to me.
But irrespective of that, I would also try running with verbose output, because I'm guessing UCX is selected by default, so it wouldn't matter if you disabled openib to get rid of the warning message.
@janjust Okay, thanks for your help! By the way, I tried GIDs 4 and 5, but got the same warning as before.
[nscc-gz@centos203 examples]$ mpirun -np 4 -mca btl_openib_warn_default_gid_prefix 4 -mca btl_openib_if_include mlx5_0:1 ./hello_c
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: centos203
Local device: mlx5_0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
Hello, world, I am 0 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 1 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 2 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 3 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
[centos203:02086] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[centos203:02086] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[nscc-gz@centos203 examples]$ mpirun -np 4 -mca btl_openib_warn_default_gid_prefix 5 -mca btl_openib_if_include mlx5_0:1 ./hello_c
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: centos203
Local device: mlx5_0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
Hello, world, I am 2 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 1 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 3 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 0 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
[centos203:02141] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[centos203:02141] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[nscc-gz@centos203 examples]$
How did you solve this problem? I'm having the same issue.