
no matches to transport list in OpenMPI 4.1.4

Open gregfi opened this issue 2 years ago • 0 comments

Describe the bug

I can successfully test rc and ud transports on mlx4_0:1 using ucx_perftest. However, OpenMPI reports:

[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:333 self/self: did not match transport list
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:333 rc/mlx4_0:1: did not match transport list
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:333 ud/mlx4_0:1: did not match transport list
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:333 mm/sysv: did not match transport list
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:333 mm/posix: did not match transport list
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:333 cma/cma: did not match transport list
[bl3404:06734] ../../../../../openmpi-4.1.4/opal/mca/common/ucx/common_ucx.c:337 support level is none
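
To compare the rejected entries against other runs, the verbose output above can be reduced to a list of transport/device pairs plus the final support level. The following is a minimal sketch, assuming the log lines keep exactly the shape shown in this report; the function name `summarize_ucx_log` is hypothetical and not part of Open MPI or UCX.

```python
import re

# Patterns for the two message shapes seen in the log above.
# The transport name is everything up to the first "/" after the source
# location; the device name may itself contain ":" (e.g. mlx4_0:1).
_NO_MATCH = re.compile(
    r"common_ucx\.c:\d+\s+([^/\s]+)/(\S+): did not match transport list"
)
_SUPPORT = re.compile(r"support level is (\w+)")


def summarize_ucx_log(text):
    """Return (rejected_pairs, support_level) from pml_ucx verbose output.

    rejected_pairs is a list of (transport, device) tuples in log order;
    support_level is None if no "support level is ..." line was seen.
    """
    rejected = []
    support_level = None
    for line in text.splitlines():
        m = _NO_MATCH.search(line)
        if m:
            rejected.append((m.group(1), m.group(2)))
            continue
        m = _SUPPORT.search(line)
        if m:
            support_level = m.group(1)
    return rejected, support_level
```

Passing the captured stderr of the mpirun command to `summarize_ucx_log` gives one tuple per "did not match transport list" line, which is easier to compare against the `ucx_info -d` listing than the raw log.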

Steps to Reproduce

  • Command line: mpirun --mca pml ucx --mca btl ^uct -x UCX_NET_DEVICES=mlx4_0:1 -x UCX_TLS=all --mca pml_base_verbose 100 --mca btl_base_verbose 100 -mca pml_ucx_verbose 100 $CODE_NAME
  • UCX version used
# UCT version=1.4.0 revision 0000000
# configured with: --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --disable-static --without-avx --docdir=/usr/share/doc/packages/openucx

Setup and versions

  • OS version (e.g Linux distro) + CPU architecture (x86_64/aarch64/ppc64le/...)
    • SLES12-SP5; x86_64
  • For RDMA/IB/RoCE related issues:
    • Driver version: rdma-core-22.5-4.3.1.x86_64; libibverbs1-22.5-4.3.1.x86_64
    • HW information from ibv_devinfo -vv command
transport:			InfiniBand (0)
fw_ver:				2.9.1530
node_guid:			0002:c903:000f:2ae2
sys_image_guid:			0002:c903:000f:2ae5
vendor_id:			0x02c9
vendor_part_id:			26428
hw_ver:				0xB0
board_id:			HP_0160000009
phys_port_cnt:			2
max_mr_size:			0xffffffffffffffff
page_size_cap:			0xfffffe00
max_qp:				131000
max_qp_wr:			16351
device_cap_flags:		0x047c9c76
				BAD_PKEY_CNTR
				BAD_QKEY_CNTR
				AUTO_PATH_MIG
				CHANGE_PHY_PORT
				UD_AV_PORT_ENFORCE
				PORT_ACTIVE_EVENT
				SYS_IMAGE_GUID
				RC_RNR_NAK_GEN
				UD_IP_CSUM
				XRC
				MEM_MGT_EXTENSIONS
				RAW_IP_CSUM
				Unknown flags: 0x488000
max_sge:			32
max_sge_rd:			30
max_cq:				65408
max_cqe:			4194303
max_mr:				524272
max_pd:				32764
max_qp_rd_atom:			16
max_ee_rd_atom:			0
max_res_rd_atom:		2096000
max_qp_init_rd_atom:		128
max_ee_init_rd_atom:		0
atomic_cap:			ATOMIC_HCA (1)
max_ee:				0
max_rdd:			0
max_mw:				0
max_raw_ipv6_qp:		0
max_raw_ethy_qp:		0
max_mcast_grp:			8192
max_mcast_qp_attach:		248
max_total_mcast_qp_attach:	2031616
max_ah:				2147483647
max_fmr:			0
max_srq:			65472
max_srq_wr:			16383
max_srq_sge:			31
max_pkeys:			128
local_ca_ack_delay:		15
general_odp_caps:
rc_odp_caps:
				NO SUPPORT
uc_odp_caps:
				NO SUPPORT
ud_odp_caps:
				NO SUPPORT
completion timestamp_mask:			0x0000ffffffffffff
hca_core_clock:			251000kHZ
device_cap_flags_ex:		0x47C9C76
tso_caps:
max_tso:			0
rss_caps:
	max_rwq_indirection_tables:			0
	max_rwq_indirection_table_size:			0
	rx_hash_function:				0x0
	rx_hash_fields_mask:				0x0
max_wq_type_rq:			0
packet_pacing_caps:
	qp_rate_limit_min:	0kbps
	qp_rate_limit_max:	0kbps
tag matching not supported

cq moderation caps:
	max_cq_count:	65535
	max_cq_period:	65535 us

	port:	1
		state:			PORT_ACTIVE (4)
		max_mtu:		4096 (5)
		active_mtu:		4096 (5)
		sm_lid:			2
		port_lid:		71
		port_lmc:		0x00
		link_layer:		InfiniBand
		max_msg_sz:		0x40000000
		port_cap_flags:		0x02510868
		port_cap_flags2:	0x0000
		max_vl_num:		4 (3)
		bad_pkey_cntr:		0x0
		qkey_viol_cntr:		0x0
		sm_sl:			0
		pkey_tbl_len:		128
		gid_tbl_len:		128
		subnet_timeout:		18
		init_type_reply:	0
		active_width:		4X (2)
		active_speed:		10.0 Gbps (4)
		phys_state:		LINK_UP (5)
		GID[  0]:		fe80:0000:0000:0000:0002:c903:000f:2ae3

	port:	2
		state:			PORT_DOWN (1)
		max_mtu:		4096 (5)
		active_mtu:		4096 (5)
		sm_lid:			0
		port_lid:		0
		port_lmc:		0x00
		link_layer:		InfiniBand
		max_msg_sz:		0x40000000
		port_cap_flags:		0x02510868
		port_cap_flags2:	0x0000
		max_vl_num:		4 (3)
		bad_pkey_cntr:		0x0
		qkey_viol_cntr:		0x0
		sm_sl:			0
		pkey_tbl_len:		128
		gid_tbl_len:		128
		subnet_timeout:		0
		init_type_reply:	0
		active_width:		4X (2)
		active_speed:		2.5 Gbps (1)
		phys_state:		POLLING (2)
		GID[  0]:		fe80:0000:0000:0000:0002:c903:000f:2ae4

Additional information (depending on the issue)

  • OpenMPI version: 4.1.4
  • Output of ucx_info -d to show transports and devices recognized by UCX
#
# Memory domain: self
#            component: self
#             register: unlimited, cost: 0 nsec
#           remote key: 8 bytes
#
#   Transport: self
#
#   Device: self
#
#      capabilities:
#            bandwidth: 6911.00 MB/sec
#              latency: 0 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 8k
#             am_bcopy: <= 8k
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#             priority: 0
#       device address: 0 bytes
#        iface address: 8 bytes
#       error handling: none
#
#
# Memory domain: tcp
#            component: tcp
#
#   Transport: tcp
#
#   Device: eth0
#
#      capabilities:
#            bandwidth: 113.16 MB/sec
#              latency: 5776 nsec
#             overhead: 50000 nsec
#             am_bcopy: <= 8k
#           connection: to iface
#             priority: 1
#       device address: 4 bytes
#        iface address: 2 bytes
#       error handling: none
#
#   Device: ib0
#
#      capabilities:
#            bandwidth: 4758.20 MB/sec
#              latency: 5214 nsec
#             overhead: 50000 nsec
#             am_bcopy: <= 8k
#           connection: to iface
#             priority: 1
#       device address: 4 bytes
#        iface address: 2 bytes
#       error handling: none
#
#
# Memory domain: ib/mlx4_0
#            component: ib
#             register: unlimited, cost: 90 nsec
#           remote key: 16 bytes
#           local memory handle is required for zcopy
#
#   Transport: rc
#
#   Device: mlx4_0:1
#
#      capabilities:
#            bandwidth: 3774.15 MB/sec
#              latency: 1300 nsec + 1 * N
#             overhead: 75 nsec
#            put_short: <= 88
#            put_bcopy: <= 8k
#            put_zcopy: <= 1g, up to 6 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4k
#            get_bcopy: <= 8k
#            get_zcopy: 33..1g, up to 6 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4k
#             am_short: <= 87
#             am_bcopy: <= 8191
#             am_zcopy: <= 8191, up to 5 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4k
#            am header: <= 127
#               domain: device
#           connection: to ep
#             priority: 0
#       device address: 3 bytes
#           ep address: 4 bytes
#       error handling: peer failure
#
#
#   Transport: ud
#
#   Device: mlx4_0:1
#
#      capabilities:
#            bandwidth: 3774.15 MB/sec
#              latency: 1310 nsec
#             overhead: 105 nsec
#             am_short: <= 172
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 7 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4k
#            am header: <= 172
#           connection: to ep, to iface
#             priority: 0
#       device address: 3 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure
#
#
# Memory domain: rdmacm
#            component: rdmacm
#           supports client-server connection establishment via sockaddr
#   < no supported devices found >
#
# Memory domain: sysv
#            component: sysv
#             allocate: unlimited
#           remote key: 32 bytes
#
#   Transport: mm
#
#   Device: sysv
#
#      capabilities:
#            bandwidth: 6911.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 92
#             am_bcopy: <= 8k
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#             priority: 0
#       device address: 8 bytes
#        iface address: 16 bytes
#       error handling: none
#
#
# Memory domain: posix
#            component: posix
#             allocate: unlimited
#           remote key: 37 bytes
#
#   Transport: mm
#
#   Device: posix
#
#      capabilities:
#            bandwidth: 6911.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 92
#             am_bcopy: <= 8k
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#             priority: 0
#       device address: 8 bytes
#        iface address: 16 bytes
#       error handling: none
#
#
# Memory domain: cma
#            component: cma
#             register: unlimited, cost: 9 nsec
#
#   Transport: cma
#
#   Device: cma
#
#      capabilities:
#            bandwidth: 11145.00 MB/sec
#              latency: 80 nsec
#             overhead: 400 nsec
#            put_zcopy: unlimited, up to 16 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 1
#            get_zcopy: unlimited, up to 16 iov
#  get_opt_zcopy_align: <= 1
#        get_align_mtu: <= 1
#           connection: to iface
#             priority: 0
#       device address: 8 bytes
#        iface address: 4 bytes
#       error handling: none
#
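
The transport and device names that UCX itself reports can be pulled out of the `ucx_info -d` dump above for a side-by-side comparison with the names in the Open MPI log. This is a rough sketch, assuming the `# Memory domain:`, `#   Transport:` and `#   Device:` line layout of the UCX 1.4.0 output shown here; the helper name `list_transport_devices` is hypothetical.

```python
import re

_DOMAIN = re.compile(r"^#\s*Memory domain:")
_TRANSPORT = re.compile(r"^#\s*Transport:\s*(\S+)")
# Case-sensitive on purpose: "device address:" lines must not match.
_DEVICE = re.compile(r"^#\s*Device:\s*(\S+)")


def list_transport_devices(ucx_info_output):
    """Return (transport, device) pairs in the order they appear."""
    pairs = []
    transport = None
    for line in ucx_info_output.splitlines():
        if _DOMAIN.match(line):
            # A new memory domain starts; forget the previous transport.
            transport = None
            continue
        m = _TRANSPORT.match(line)
        if m:
            transport = m.group(1)
            continue
        m = _DEVICE.match(line)
        if m and transport is not None:
            # One transport may list several devices (tcp: eth0, ib0).
            pairs.append((transport, m.group(1)))
    return pairs
```

The function only restates what the dump already contains; it makes no assumption about which of these names Open MPI accepts.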

gregfi, Sep 20 '22 17:09