ucx
No match to transport list in OpenMPI 4.1.4
Describe the bug
I can successfully test the rc and ud transports on mlx4_0:1 using ucx_perftest. However, OpenMPI reports that none of the UCX transports match its transport list:
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:333 self/self: did not match transport list
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:333 rc/mlx4_0:1: did not match transport list
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:333 ud/mlx4_0:1: did not match transport list
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:333 mm/sysv: did not match transport list
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:333 mm/posix: did not match transport list
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:333 cma/cma: did not match transport list
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:337 support level is none
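Each log line names a UCX resource as transport/device and gives a verdict, followed by an overall support level. The sketch below models that kind of allow-list check, assuming transport names are compared exactly against a list and device names against shell-style patterns. It is not Open MPI's common_ucx.c code. The allow-list contents (rc_verbs, ud_verbs, mlx*) and the return labels other than "did not match transport list" are hypothetical placeholders. Under those assumptions it shows how an exact-name comparison can reject every entry, including rc and ud, even though those transports pass ucx_perftest.

```python
from fnmatch import fnmatch

# HYPOTHETICAL allow-lists for illustration only: they are not read from
# Open MPI and are not claimed to be its real defaults.
ALLOWED_TLS = ["rc_verbs", "ud_verbs"]
ALLOWED_DEVICES = ["mlx*"]


def check(resource, tls=ALLOWED_TLS, devices=ALLOWED_DEVICES):
    """Classify one 'transport/device' entry such as 'rc/mlx4_0:1'."""
    transport, device = resource.split("/", 1)
    if transport not in tls:  # transport names are compared exactly
        return "did not match transport list"
    if not any(fnmatch(device, pattern) for pattern in devices):
        return "transport matched, device did not"
    return "match"


def support_level(resources, tls=ALLOWED_TLS, devices=ALLOWED_DEVICES):
    """Return 'none' unless at least one entry matches both lists."""
    if any(check(r, tls, devices) == "match" for r in resources):
        return "supported"
    return "none"


# The transport/device entries printed in the log above.
LOGGED = ["self/self", "rc/mlx4_0:1", "ud/mlx4_0:1",
          "mm/sysv", "mm/posix", "cma/cma"]

if __name__ == "__main__":
    for entry in LOGGED:
        print(f"{entry}: {check(entry)}")
    print("support level is", support_level(LOGGED))
```

In this model, calling support_level(LOGGED, tls=["rc", "ud"]), i.e. using the transport names this UCX 1.4 build reports, returns "supported".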
Steps to Reproduce
- Command line
mpirun --mca pml ucx --mca btl ^uct -x UCX_NET_DEVICES=mlx4_0:1 -x UCX_TLS=all --mca pml_base_verbose 100 --mca btl_base_verbose 100 -mca pml_ucx_verbose 100 $CODE_NAME
- UCX version used
# UCT version=1.4.0 revision 0000000
# configured with: --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --disable-static --without-avx --docdir=/usr/share/doc/packages/openucx
Setup and versions
- OS version (e.g. Linux distro) + CPU architecture (x86_64/aarch64/ppc64le/...)
- SLES12-SP5; x86_64
- For RDMA/IB/RoCE related issues:
- Driver version: rdma-core-22.5-4.3.1.x86_64; libibverbs1-22.5-4.3.1.x86_64
- HW information from the ibv_devinfo -vv command:
transport: InfiniBand (0)
fw_ver: 2.9.1530
node_guid: 0002:c903:000f:2ae2
sys_image_guid: 0002:c903:000f:2ae5
vendor_id: 0x02c9
vendor_part_id: 26428
hw_ver: 0xB0
board_id: HP_0160000009
phys_port_cnt: 2
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffe00
max_qp: 131000
max_qp_wr: 16351
device_cap_flags: 0x047c9c76
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
UD_AV_PORT_ENFORCE
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
UD_IP_CSUM
XRC
MEM_MGT_EXTENSIONS
RAW_IP_CSUM
Unknown flags: 0x488000
max_sge: 32
max_sge_rd: 30
max_cq: 65408
max_cqe: 4194303
max_mr: 524272
max_pd: 32764
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 2096000
max_qp_init_rd_atom: 128
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 0
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 8192
max_mcast_qp_attach: 248
max_total_mcast_qp_attach: 2031616
max_ah: 2147483647
max_fmr: 0
max_srq: 65472
max_srq_wr: 16383
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 15
general_odp_caps:
rc_odp_caps:
NO SUPPORT
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
NO SUPPORT
completion timestamp_mask: 0x0000ffffffffffff
hca_core_clock: 251000kHZ
device_cap_flags_ex: 0x47C9C76
tso_caps:
max_tso: 0
rss_caps:
max_rwq_indirection_tables: 0
max_rwq_indirection_table_size: 0
rx_hash_function: 0x0
rx_hash_fields_mask: 0x0
max_wq_type_rq: 0
packet_pacing_caps:
qp_rate_limit_min: 0kbps
qp_rate_limit_max: 0kbps
tag matching not supported
cq moderation caps:
max_cq_count: 65535
max_cq_period: 65535 us
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 2
port_lid: 71
port_lmc: 0x00
link_layer: InfiniBand
max_msg_sz: 0x40000000
port_cap_flags: 0x02510868
port_cap_flags2: 0x0000
max_vl_num: 4 (3)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 128
gid_tbl_len: 128
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 10.0 Gbps (4)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:0002:c903:000f:2ae3
port: 2
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: InfiniBand
max_msg_sz: 0x40000000
port_cap_flags: 0x02510868
port_cap_flags2: 0x0000
max_vl_num: 4 (3)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 128
gid_tbl_len: 128
subnet_timeout: 0
init_type_reply: 0
active_width: 4X (2)
active_speed: 2.5 Gbps (1)
phys_state: POLLING (2)
GID[ 0]: fe80:0000:0000:0000:0002:c903:000f:2ae4
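Only port 1 is PORT_ACTIVE; port 2 is PORT_DOWN with phys_state POLLING, which is consistent with selecting mlx4_0:1 in UCX_NET_DEVICES. For longer listings, the per-port state can be pulled out with a small parser. This is a minimal sketch assuming the plain-text layout shown above (a "port: N" line followed by a "state:" line); the helper names are hypothetical and not part of rdma-core, and the HCA name mlx4_0 is taken from the ucx_info -d output because the hca_id line is not in this excerpt.

```python
import re
from typing import Dict, List


def port_states(devinfo_text: str) -> Dict[int, str]:
    """Map port number -> state name from plain `ibv_devinfo -vv` text."""
    states: Dict[int, str] = {}
    port = None
    for raw in devinfo_text.splitlines():
        line = raw.strip()
        m = re.match(r"port:\s*(\d+)$", line)  # e.g. "port: 1"
        if m:
            port = int(m.group(1))
            continue
        # re.match anchors at the start, so "phys_state:" is not picked up.
        m = re.match(r"state:\s*(\w+)", line)
        if m and port is not None:
            states[port] = m.group(1)
    return states


def active_ucx_devices(hca: str, devinfo_text: str) -> List[str]:
    """Return '<hca>:<port>' names (the form ucx_info prints) for active ports."""
    return [f"{hca}:{p}" for p, s in sorted(port_states(devinfo_text).items())
            if s == "PORT_ACTIVE"]


# Excerpt of the port sections of the listing above.
SAMPLE = """\
port: 1
state: PORT_ACTIVE (4)
phys_state: LINK_UP (5)
port: 2
state: PORT_DOWN (1)
phys_state: POLLING (2)
"""

if __name__ == "__main__":
    print(port_states(SAMPLE))
    print(active_ucx_devices("mlx4_0", SAMPLE))
```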
Additional information (depending on the issue)
- OpenMPI version: 4.1.4
- Output of ucx_info -d, showing the transports and devices recognized by UCX:
#
# Memory domain: self
# component: self
# register: unlimited, cost: 0 nsec
# remote key: 8 bytes
#
# Transport: self
#
# Device: self
#
# capabilities:
# bandwidth: 6911.00 MB/sec
# latency: 0 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 8k
# am_bcopy: <= 8k
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# priority: 0
# device address: 0 bytes
# iface address: 8 bytes
# error handling: none
#
#
# Memory domain: tcp
# component: tcp
#
# Transport: tcp
#
# Device: eth0
#
# capabilities:
# bandwidth: 113.16 MB/sec
# latency: 5776 nsec
# overhead: 50000 nsec
# am_bcopy: <= 8k
# connection: to iface
# priority: 1
# device address: 4 bytes
# iface address: 2 bytes
# error handling: none
#
# Device: ib0
#
# capabilities:
# bandwidth: 4758.20 MB/sec
# latency: 5214 nsec
# overhead: 50000 nsec
# am_bcopy: <= 8k
# connection: to iface
# priority: 1
# device address: 4 bytes
# iface address: 2 bytes
# error handling: none
#
#
# Memory domain: ib/mlx4_0
# component: ib
# register: unlimited, cost: 90 nsec
# remote key: 16 bytes
# local memory handle is required for zcopy
#
# Transport: rc
#
# Device: mlx4_0:1
#
# capabilities:
# bandwidth: 3774.15 MB/sec
# latency: 1300 nsec + 1 * N
# overhead: 75 nsec
# put_short: <= 88
# put_bcopy: <= 8k
# put_zcopy: <= 1g, up to 6 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4k
# get_bcopy: <= 8k
# get_zcopy: 33..1g, up to 6 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4k
# am_short: <= 87
# am_bcopy: <= 8191
# am_zcopy: <= 8191, up to 5 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4k
# am header: <= 127
# domain: device
# connection: to ep
# priority: 0
# device address: 3 bytes
# ep address: 4 bytes
# error handling: peer failure
#
#
# Transport: ud
#
# Device: mlx4_0:1
#
# capabilities:
# bandwidth: 3774.15 MB/sec
# latency: 1310 nsec
# overhead: 105 nsec
# am_short: <= 172
# am_bcopy: <= 4088
# am_zcopy: <= 4088, up to 7 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4k
# am header: <= 172
# connection: to ep, to iface
# priority: 0
# device address: 3 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure
#
#
# Memory domain: rdmacm
# component: rdmacm
# supports client-server connection establishment via sockaddr
# < no supported devices found >
#
# Memory domain: sysv
# component: sysv
# allocate: unlimited
# remote key: 32 bytes
#
# Transport: mm
#
# Device: sysv
#
# capabilities:
# bandwidth: 6911.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 92
# am_bcopy: <= 8k
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# priority: 0
# device address: 8 bytes
# iface address: 16 bytes
# error handling: none
#
#
# Memory domain: posix
# component: posix
# allocate: unlimited
# remote key: 37 bytes
#
# Transport: mm
#
# Device: posix
#
# capabilities:
# bandwidth: 6911.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 92
# am_bcopy: <= 8k
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# priority: 0
# device address: 8 bytes
# iface address: 16 bytes
# error handling: none
#
#
# Memory domain: cma
# component: cma
# register: unlimited, cost: 9 nsec
#
# Transport: cma
#
# Device: cma
#
# capabilities:
# bandwidth: 11145.00 MB/sec
# latency: 80 nsec
# overhead: 400 nsec
# put_zcopy: unlimited, up to 16 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 1
# get_zcopy: unlimited, up to 16 iov
# get_opt_zcopy_align: <= 1
# get_align_mtu: <= 1
# connection: to iface
# priority: 0
# device address: 8 bytes
# iface address: 4 bytes
# error handling: none
#
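Reducing this listing to the same transport/device form that the OpenMPI log uses makes the two easy to compare. A minimal sketch, assuming the "# Memory domain:" / "# Transport:" / "# Device:" layout of this UCX 1.4 output (other UCX versions may format it differently); the function name and the condensed SAMPLE are illustrative only. Against the log, every pair appears except tcp/eth0 and tcp/ib0, likely because UCX_NET_DEVICES=mlx4_0:1 restricts the network devices UCX uses.

```python
from typing import List


def transport_device_pairs(ucx_info_text: str) -> List[str]:
    """Collect 'transport/device' pairs from `ucx_info -d` text."""
    pairs: List[str] = []
    transport = None
    for raw in ucx_info_text.splitlines():
        body = raw.lstrip("#").strip()
        if body.startswith("Memory domain:"):
            transport = None  # a new domain starts; forget the old transport
        elif body.startswith("Transport:"):
            transport = body.split(":", 1)[1].strip()
        elif body.startswith("Device:") and transport is not None:
            # split once so names like "mlx4_0:1" keep their port suffix
            pairs.append(f"{transport}/{body.split(':', 1)[1].strip()}")
    return pairs


# Condensed excerpt of the listing above (capability lines mostly omitted).
SAMPLE = """\
# Memory domain: self
# Transport: self
# Device: self
# Memory domain: tcp
# Transport: tcp
# Device: eth0
# Device: ib0
# Memory domain: ib/mlx4_0
# Transport: rc
# Device: mlx4_0:1
# device address: 3 bytes
# Transport: ud
# Device: mlx4_0:1
# Memory domain: rdmacm
# < no supported devices found >
# Memory domain: sysv
# Transport: mm
# Device: sysv
# Memory domain: posix
# Transport: mm
# Device: posix
# Memory domain: cma
# Transport: cma
# Device: cma
"""

# Entries reported by OpenMPI in the log at the top of this report.
LOGGED = {"self/self", "rc/mlx4_0:1", "ud/mlx4_0:1",
          "mm/sysv", "mm/posix", "cma/cma"}

if __name__ == "__main__":
    pairs = transport_device_pairs(SAMPLE)
    print(pairs)
    print("not in log:", [p for p in pairs if p not in LOGGED])
```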