UCX hangs when compiled with NVHPC at any optimization level above -O1; assertion triggered
Describe the bug
When UCX is compiled with the NVHPC 22.3 compilers and run over mlx5_0:1, ucx_perftest hangs, and when I interrupt it with ^C, the server triggers an assertion. The exact same setup built with GCC works fine. NVHPC also works with --enable-compiler-opt=1, but not with --enable-compiler-opt=2 or above.
The assertion is:
[1650460671.381413] [nid020048:12945:0] perftest.c:129 UCX ERROR recv() failed: Connection reset by peer
[nid020048:12945:0:12945] perftest.c:273 Assertion `size <= max' failed
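Read together, the two log lines suggest that the server receives a size header over its out-of-band socket and then checks it against the destination buffer (`size <= max`). If the `recv()` failure is not propagated before that check, the assertion would be testing a value that was never received, and such a value could plausibly differ between -O1 and -O2 builds. This is an assumption: the perftest.c source is not quoted in this report. Below is a minimal sketch of a length-prefixed receive that only trusts the header after the read has succeeded. `recv_all` and `recv_msg` are hypothetical names, not UCX functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Receive exactly `size` bytes from `sock`.
 * Returns 0 on success, -1 if the peer closed the connection or recv()
 * failed (for example with ECONNRESET, "Connection reset by peer"). */
static int recv_all(int sock, void *data, size_t size)
{
    size_t total = 0;

    while (total < size) {
        ssize_t ret = recv(sock, (char *)data + total, size - total, 0);
        if (ret == 0) {
            return -1;              /* orderly shutdown by the peer */
        }
        if (ret < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted by a signal: retry */
            }
            return -1;              /* hard error, e.g. ECONNRESET */
        }
        total += (size_t)ret;
    }
    return 0;
}

/* Hypothetical length-prefixed receive: a size_t header, then `size`
 * payload bytes. The header is compared with `max` only after it has
 * actually been received, and an oversized header is reported as an
 * error instead of aborting the process. */
static int recv_msg(int sock, void *buffer, size_t max, size_t *out_size)
{
    size_t size = 0;                /* never checked while uninitialized */

    assert(out_size != NULL);       /* caller bug, not a runtime condition */

    if (recv_all(sock, &size, sizeof(size)) != 0) {
        return -1;                  /* peer gone: do not look at `size` */
    }
    if (size > max) {
        return -1;                  /* header larger than the buffer */
    }
    if (recv_all(sock, buffer, size) != 0) {
        return -1;
    }
    *out_size = size;
    return 0;
}
```

With a `socketpair()`, writing a `size_t` header followed by the payload makes `recv_msg` return 0; closing the sending end first (the equivalent of killing the client with ^C) makes it return -1 instead of asserting on a stale size.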
Steps to Reproduce
I built UCX with Spack, using the following environment:
Spack environment
spack:
  concretization: separately
  specs:
  # gcc version
  - osu-micro-benchmarks%gcc +cuda ^[email protected]:4 +cuda +cxx schedulers=slurm fabrics=ucx
    ^ucx +rdmacm +cma +verbs +xpmem +ib-hw-tm +mlx5-dv +dc +ud +rc +dm +optimizations
    +gdrcopy ~assertions ~debug ^cuda@:11.0
  # nvhpc version
  - osu-micro-benchmarks%nvhpc +cuda ^[email protected]:4%nvhpc +cuda +cxx schedulers=slurm
    fabrics=ucx ^ucx%nvhpc +rdmacm +cma +verbs +xpmem +ib-hw-tm +mlx5-dv +dc +ud +rc
    +dm +optimizations +gdrcopy +assertions ~debug
  view: false
  config:
    install_tree:
      root: /spack
  packages:
    openssl:
      variants: [certs=mozilla]
    libtool:
      externals:
      - spec: [email protected]
        prefix: /usr
    m4:
      externals:
      - spec: [email protected]
        prefix: /usr
    autoconf:
      externals:
      - spec: [email protected]
        prefix: /usr
    automake:
      externals:
      - spec: [email protected]
        prefix: /usr
    perl:
      externals:
      - spec: [email protected]~cpanm+shared+threads
        prefix: /usr
    slurm:
      externals:
      - spec: slurm@20-11-8-1
        prefix: /usr
    rdma-core:
      externals:
      - spec: [email protected]
        prefix: /usr
    xpmem:
      externals:
      - spec: [email protected]
        prefix: /opt/cray/xpmem/2.2.40-2.1_2.56__g3cf3325.shasta
  compilers:
  - compiler:
      spec: gcc@9.4.0
      paths:
        cc: /spack/linux-sles15-zen/gcc-7.5.0/gcc-9.4.0-fl2gp6kxlqfoydt3jogtr5pcus5loyx7/bin/gcc
        cxx: /spack/linux-sles15-zen/gcc-7.5.0/gcc-9.4.0-fl2gp6kxlqfoydt3jogtr5pcus5loyx7/bin/g++
        f77: /spack/linux-sles15-zen/gcc-7.5.0/gcc-9.4.0-fl2gp6kxlqfoydt3jogtr5pcus5loyx7/bin/gfortran
        fc: /spack/linux-sles15-zen/gcc-7.5.0/gcc-9.4.0-fl2gp6kxlqfoydt3jogtr5pcus5loyx7/bin/gfortran
      flags: {}
      operating_system: sles15
      target: x86_64
      modules: []
      environment: {}
      extra_rpaths: []
  - compiler:
      spec: nvhpc@22.3
      paths:
        cc: /spack/linux-sles15-zen2/gcc-9.4.0/nvhpc-22.3-m43i2j7uke6pwzxfkoytue7gordmtatg/Linux_x86_64/22.3/compilers/bin/nvc
        cxx: /spack/linux-sles15-zen2/gcc-9.4.0/nvhpc-22.3-m43i2j7uke6pwzxfkoytue7gordmtatg/Linux_x86_64/22.3/compilers/bin/nvc++
        f77: /spack/linux-sles15-zen2/gcc-9.4.0/nvhpc-22.3-m43i2j7uke6pwzxfkoytue7gordmtatg/Linux_x86_64/22.3/compilers/bin/nvfortran
        fc: /spack/linux-sles15-zen2/gcc-9.4.0/nvhpc-22.3-m43i2j7uke6pwzxfkoytue7gordmtatg/Linux_x86_64/22.3/compilers/bin/nvfortran
      flags: {}
      operating_system: sles15
      target: x86_64
      modules: []
      environment: {}
      extra_rpaths: []
Concretized environment:
Input spec
--------------------------------
osu-micro-benchmarks%nvhpc+cuda
^[email protected]:4%nvhpc+cuda+cxx fabrics=ucx schedulers=slurm
^ucx%nvhpc+assertions+cma+dc~debug+dm+gdrcopy+ib-hw-tm+mlx5-dv+optimizations+rc+rdmacm+ud+verbs+xpmem
Concretized
--------------------------------
[email protected]%[email protected]+cuda arch=linux-sles15-zen2
^[email protected]%[email protected]~allow-unsupported-compilers~dev arch=linux-sles15-zen2
^[email protected]%[email protected]~python patches=05ff238,10a88ad arch=linux-sles15-zen2
^[email protected]%[email protected] libs=shared,static arch=linux-sles15-zen2
^[email protected]%[email protected] arch=linux-sles15-zen2
^[email protected]%[email protected]~pic libs=shared,static arch=linux-sles15-zen2
^[email protected]%[email protected]+optimize+pic+shared patches=0d38234 arch=linux-sles15-zen2
^[email protected]%[email protected]~atomics+cuda+cxx~cxx_exceptions~gpfs~internal-hwloc~java~legacylaunchers~lustre~memchecker+pmi~pmix+romio~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath cuda_arch=none fabrics=ucx patches=fba0d3a schedulers=slurm arch=linux-sles15-zen2
^[email protected]%[email protected]~cairo+cuda~gl~libudev+libxml2~netloc~nvml~opencl+pci~rocm+shared arch=linux-sles15-zen2
^[email protected]%[email protected] patches=6e08dc4 arch=linux-sles15-zen2
^[email protected]%[email protected] arch=linux-sles15-zen2
^[email protected]%[email protected] arch=linux-sles15-zen2
^[email protected]%[email protected]~symlinks+termlib abi=none patches=933af9e arch=linux-sles15-zen2
^[email protected]%[email protected]+openssl arch=linux-sles15-zen2
^[email protected]%[email protected]~docs~shared certs=mozilla arch=linux-sles15-zen2
^ca-certificates-mozilla@2022-03-29%[email protected] arch=linux-sles15-zen2
^[email protected]%[email protected]~cpanm+shared+threads patches=0eac10e,8cf4302 arch=linux-sles15-zen2
^[email protected]%[email protected] patches=4e1d78c,62fc8a8,ff37630 arch=linux-sles15-zen2
^[email protected]%[email protected] patches=7793209 arch=linux-sles15-zen2
^[email protected]%[email protected] arch=linux-sles15-zen2
^[email protected]%[email protected]+sigsegv patches=3877ab5,5746cf5,fc9b616 arch=linux-sles15-zen2
^[email protected]%[email protected] arch=linux-sles15-zen2
^[email protected]%[email protected] arch=linux-sles15-zen2
^slurm@20-11-8-1%[email protected]~gtk~hdf5~hwloc~mariadb~pmix+readline~restd sysconfdir=PREFIX/etc arch=linux-sles15-zen2
^[email protected]%[email protected]+assertions~backtrace-detail+cma+cuda+dc~debug+dm+doc+examples+gdrcopy+ib-hw-tm~java~knem~logging+mlx5-dv+openmp+optimizations~parameter_checking+pic+rc+rdmacm~rocm+shared~static+thread_multiple~ucg+ud+verbs+xpmem cuda_arch=none opt=3 simd=auto arch=linux-sles15-zen2
^[email protected]%[email protected] patches=c5efec1 arch=linux-sles15-zen2
^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-sles15-zen2
^[email protected]%[email protected]+kernel-module arch=linux-sles15-zen2
- spack -e [env] install
- server:
UCX_NET_DEVICES=mlx5_0:1 srun -N1 -n1 --pty /spack/linux-sles15-zen2/nvhpc-22.3/ucx-1.12.1-4n2fet6aun5ilyfy4rxt2c247e4rajku/bin/ucx_perftest
- client:
UCX_NET_DEVICES=mlx5_0:1 srun -N1 -n1 --pty /spack/linux-sles15-zen2/nvhpc-22.3/ucx-1.12.1-4n2fet6aun5ilyfy4rxt2c247e4rajku/bin/ucx_perftest nid020048 -t ucp_put_lat
- UCX configure flags:
/spack/linux-sles15-zen2/nvhpc-22.3/ucx-1.12.1-4n2fet6aun5ilyfy4rxt2c247e4rajku/bin/ucx_info -v
# UCT version=1.12.1 revision dc92435
# configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --prefix=/spack/linux-sles15-zen2/nvhpc-22.3/ucx-1.12.1-4n2fet6aun5ilyfy4rxt2c247e4rajku --enable-mt --enable-cma --disable-params-check --enable-optimizations --enable-compiler-opt=3 --enable-assertions --disable-logging --disable-backtrace-detail --with-pic --with-rc --with-ud --with-dc --with-mlx5-dv --with-ib-hw-tm --with-dm --without-rocm --without-java --with-cuda=/apps/manali/UES/store/linux-sles15-zen2/nvhpc-22.3/cuda-11.6.2-a3layevrzuvgkl2anzqg3qpyrrevcrtz --with-gdrcopy=/spack/linux-sles15-zen2/nvhpc-22.3/gdrcopy-2.3-bm5yhjui33p3bpovoahya3f4dmzbuwh7 --without-knem --with-xpmem=/opt/cray/xpmem/2.2.40-2.1_2.56__g3cf3325.shasta --with-rdmacm=/usr --disable-static --enable-shared --disable-static --with-openmp --with-avx --with-verbs=/usr
Setup and versions
$ cat /etc/os-release
NAME="SLES"
VERSION="15-SP2"
VERSION_ID="15.2"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP2"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp2"
$ uname -a
Linux nid020000 5.3.18-24.75_10.0.189-cray_shasta_c #1 SMP Sun Sep 26 14:27:04 UTC 2021 (0388af5) x86_64 x86_64 x86_64 GNU/Linux
$ rpm -q libibverbs
libibverbs-51mlnx1-1.51258.060.x86_64
$ ofed_info -s
OFED-internal-5.1-2.5.8.0.60:
$ ibv_devinfo -vv
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 16.28.2006
node_guid: 0040:a684:abf3:0000
sys_image_guid: 0040:a684:abf3:0000
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: CRAY000000001
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffffffffff000
max_qp: 262144
max_qp_wr: 32768
device_cap_flags: 0xed721c36
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
MEM_WINDOW
XRC
MEM_MGT_EXTENSIONS
MEM_WINDOW_TYPE_2B
RAW_IP_CSUM
MANAGED_FLOW_STEERING
Unknown flags: 0xC8400000
max_sge: 30
max_sge_rd: 30
max_cq: 16777216
max_cqe: 4194303
max_mr: 16777216
max_pd: 16777216
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 4194304
max_qp_init_rd_atom: 16
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 16777216
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 2097152
max_mcast_qp_attach: 240
max_total_mcast_qp_attach: 503316480
max_ah: 2147483647
max_fmr: 0
max_srq: 8388608
max_srq_wr: 32767
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 16
general_odp_caps:
ODP_SUPPORT
ODP_SUPPORT_IMPLICIT
rc_odp_caps:
SUPPORT_SEND
SUPPORT_RECV
SUPPORT_WRITE
SUPPORT_READ
SUPPORT_SRQ
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
SUPPORT_SEND
xrc_odp_caps:
SUPPORT_SEND
SUPPORT_WRITE
SUPPORT_READ
SUPPORT_SRQ
completion timestamp_mask: 0x7fffffffffffffff
hca_core_clock: 156250kHZ
raw packet caps:
C-VLAN stripping offload
Scatter FCS offload
IP csum offload
Delay drop
device_cap_flags_ex: 0x30000055ED721C36
RAW_SCATTER_FCS
PCI_WRITE_END_PADDING
Unknown flags: 0x3000004100000000
tso_caps:
max_tso: 262144
supported_qp:
SUPPORT_RAW_PACKET
rss_caps:
max_rwq_indirection_tables: 65536
max_rwq_indirection_table_size: 2048
rx_hash_function: 0x1
rx_hash_fields_mask: 0x800000FF
supported_qp:
SUPPORT_RAW_PACKET
max_wq_type_rq: 8388608
packet_pacing_caps:
qp_rate_limit_min: 1kbps
qp_rate_limit_max: 100000000kbps
supported_qp:
SUPPORT_RAW_PACKET
tag matching not supported
cq moderation caps:
max_cq_count: 65535
max_cq_period: 4095 us
maximum available device memory: 262144Bytes
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
max_msg_sz: 0x40000000
port_cap_flags: 0x04010000
port_cap_flags2: 0x0000
max_vl_num: invalid value (0)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 1
gid_tbl_len: 256
subnet_timeout: 0
init_type_reply: 0
active_width: 4X (2)
active_speed: 25.0 Gbps (32)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:0000:00ff:fe00:30b3, RoCE v1
GID[ 1]: fe80::ff:fe00:30b3, RoCE v2
GID[ 2]: 0000:0000:0000:0000:0000:ffff:94bb:7454, RoCE v1
GID[ 3]: ::ffff:148.187.116.84, RoCE v2
hca_id: mlx5_1
transport: InfiniBand (0)
fw_ver: 16.28.2006
node_guid: 0040:a684:abe1:0000
sys_image_guid: 0040:a684:abe1:0000
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: CRAY000000001
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffffffffff000
max_qp: 262144
max_qp_wr: 32768
device_cap_flags: 0xed721c36
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
MEM_WINDOW
XRC
MEM_MGT_EXTENSIONS
MEM_WINDOW_TYPE_2B
RAW_IP_CSUM
MANAGED_FLOW_STEERING
Unknown flags: 0xC8400000
max_sge: 30
max_sge_rd: 30
max_cq: 16777216
max_cqe: 4194303
max_mr: 16777216
max_pd: 16777216
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 4194304
max_qp_init_rd_atom: 16
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 16777216
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 2097152
max_mcast_qp_attach: 240
max_total_mcast_qp_attach: 503316480
max_ah: 2147483647
max_fmr: 0
max_srq: 8388608
max_srq_wr: 32767
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 16
general_odp_caps:
ODP_SUPPORT
ODP_SUPPORT_IMPLICIT
rc_odp_caps:
SUPPORT_SEND
SUPPORT_RECV
SUPPORT_WRITE
SUPPORT_READ
SUPPORT_SRQ
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
SUPPORT_SEND
xrc_odp_caps:
SUPPORT_SEND
SUPPORT_WRITE
SUPPORT_READ
SUPPORT_SRQ
completion timestamp_mask: 0x7fffffffffffffff
hca_core_clock: 156250kHZ
raw packet caps:
C-VLAN stripping offload
Scatter FCS offload
IP csum offload
Delay drop
device_cap_flags_ex: 0x30000055ED721C36
RAW_SCATTER_FCS
PCI_WRITE_END_PADDING
Unknown flags: 0x3000004100000000
tso_caps:
max_tso: 262144
supported_qp:
SUPPORT_RAW_PACKET
rss_caps:
max_rwq_indirection_tables: 65536
max_rwq_indirection_table_size: 2048
rx_hash_function: 0x1
rx_hash_fields_mask: 0x800000FF
supported_qp:
SUPPORT_RAW_PACKET
max_wq_type_rq: 8388608
packet_pacing_caps:
qp_rate_limit_min: 1kbps
qp_rate_limit_max: 100000000kbps
supported_qp:
SUPPORT_RAW_PACKET
tag matching not supported
cq moderation caps:
max_cq_count: 65535
max_cq_period: 4095 us
maximum available device memory: 262144Bytes
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
max_msg_sz: 0x40000000
port_cap_flags: 0x04010000
port_cap_flags2: 0x0000
max_vl_num: invalid value (0)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 1
gid_tbl_len: 256
subnet_timeout: 0
init_type_reply: 0
active_width: 4X (2)
active_speed: 25.0 Gbps (32)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:0000:00ff:fe00:3033, RoCE v1
GID[ 1]: fe80::ff:fe00:3033, RoCE v2
GID[ 2]: 0000:0000:0000:0000:0000:ffff:94bb:7434, RoCE v1
GID[ 3]: ::ffff:148.187.116.52, RoCE v2
hca_id: mlx5_2
transport: InfiniBand (0)
fw_ver: 16.28.2006
node_guid: 0040:a684:abe2:0000
sys_image_guid: 0040:a684:abe2:0000
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: CRAY000000001
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffffffffff000
max_qp: 262144
max_qp_wr: 32768
device_cap_flags: 0xed721c36
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
MEM_WINDOW
XRC
MEM_MGT_EXTENSIONS
MEM_WINDOW_TYPE_2B
RAW_IP_CSUM
MANAGED_FLOW_STEERING
Unknown flags: 0xC8400000
max_sge: 30
max_sge_rd: 30
max_cq: 16777216
max_cqe: 4194303
max_mr: 16777216
max_pd: 16777216
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 4194304
max_qp_init_rd_atom: 16
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 16777216
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 2097152
max_mcast_qp_attach: 240
max_total_mcast_qp_attach: 503316480
max_ah: 2147483647
max_fmr: 0
max_srq: 8388608
max_srq_wr: 32767
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 16
general_odp_caps:
ODP_SUPPORT
ODP_SUPPORT_IMPLICIT
rc_odp_caps:
SUPPORT_SEND
SUPPORT_RECV
SUPPORT_WRITE
SUPPORT_READ
SUPPORT_SRQ
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
SUPPORT_SEND
xrc_odp_caps:
SUPPORT_SEND
SUPPORT_WRITE
SUPPORT_READ
SUPPORT_SRQ
completion timestamp_mask: 0x7fffffffffffffff
hca_core_clock: 156250kHZ
raw packet caps:
C-VLAN stripping offload
Scatter FCS offload
IP csum offload
Delay drop
device_cap_flags_ex: 0x30000055ED721C36
RAW_SCATTER_FCS
PCI_WRITE_END_PADDING
Unknown flags: 0x3000004100000000
tso_caps:
max_tso: 262144
supported_qp:
SUPPORT_RAW_PACKET
rss_caps:
max_rwq_indirection_tables: 65536
max_rwq_indirection_table_size: 2048
rx_hash_function: 0x1
rx_hash_fields_mask: 0x800000FF
supported_qp:
SUPPORT_RAW_PACKET
max_wq_type_rq: 8388608
packet_pacing_caps:
qp_rate_limit_min: 1kbps
qp_rate_limit_max: 100000000kbps
supported_qp:
SUPPORT_RAW_PACKET
tag matching not supported
cq moderation caps:
max_cq_count: 65535
max_cq_period: 4095 us
maximum available device memory: 262144Bytes
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
max_msg_sz: 0x40000000
port_cap_flags: 0x04010000
port_cap_flags2: 0x0000
max_vl_num: invalid value (0)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 1
gid_tbl_len: 256
subnet_timeout: 0
init_type_reply: 0
active_width: 4X (2)
active_speed: 25.0 Gbps (32)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:0000:00ff:fe00:3032, RoCE v1
GID[ 1]: fe80::ff:fe00:3032, RoCE v2
GID[ 2]: 0000:0000:0000:0000:0000:ffff:94bb:7433, RoCE v1
GID[ 3]: ::ffff:148.187.116.51, RoCE v2
hca_id: mlx5_3
transport: InfiniBand (0)
fw_ver: 16.28.2006
node_guid: 0040:a684:abf4:0000
sys_image_guid: 0040:a684:abf4:0000
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: CRAY000000001
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffffffffff000
max_qp: 262144
max_qp_wr: 32768
device_cap_flags: 0xed721c36
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
MEM_WINDOW
XRC
MEM_MGT_EXTENSIONS
MEM_WINDOW_TYPE_2B
RAW_IP_CSUM
MANAGED_FLOW_STEERING
Unknown flags: 0xC8400000
max_sge: 30
max_sge_rd: 30
max_cq: 16777216
max_cqe: 4194303
max_mr: 16777216
max_pd: 16777216
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 4194304
max_qp_init_rd_atom: 16
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 16777216
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 2097152
max_mcast_qp_attach: 240
max_total_mcast_qp_attach: 503316480
max_ah: 2147483647
max_fmr: 0
max_srq: 8388608
max_srq_wr: 32767
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 16
general_odp_caps:
ODP_SUPPORT
ODP_SUPPORT_IMPLICIT
rc_odp_caps:
SUPPORT_SEND
SUPPORT_RECV
SUPPORT_WRITE
SUPPORT_READ
SUPPORT_SRQ
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
SUPPORT_SEND
xrc_odp_caps:
SUPPORT_SEND
SUPPORT_WRITE
SUPPORT_READ
SUPPORT_SRQ
completion timestamp_mask: 0x7fffffffffffffff
hca_core_clock: 156250kHZ
raw packet caps:
C-VLAN stripping offload
Scatter FCS offload
IP csum offload
Delay drop
device_cap_flags_ex: 0x30000055ED721C36
RAW_SCATTER_FCS
PCI_WRITE_END_PADDING
Unknown flags: 0x3000004100000000
tso_caps:
max_tso: 262144
supported_qp:
SUPPORT_RAW_PACKET
rss_caps:
max_rwq_indirection_tables: 65536
max_rwq_indirection_table_size: 2048
rx_hash_function: 0x1
rx_hash_fields_mask: 0x800000FF
supported_qp:
SUPPORT_RAW_PACKET
max_wq_type_rq: 8388608
packet_pacing_caps:
qp_rate_limit_min: 1kbps
qp_rate_limit_max: 100000000kbps
supported_qp:
SUPPORT_RAW_PACKET
tag matching not supported
cq moderation caps:
max_cq_count: 65535
max_cq_period: 4095 us
maximum available device memory: 262144Bytes
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
max_msg_sz: 0x40000000
port_cap_flags: 0x04010000
port_cap_flags2: 0x0000
max_vl_num: invalid value (0)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 1
gid_tbl_len: 256
subnet_timeout: 0
init_type_reply: 0
active_width: 4X (2)
active_speed: 25.0 Gbps (32)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:0000:00ff:fe00:30b2, RoCE v1
GID[ 1]: fe80::ff:fe00:30b2, RoCE v2
GID[ 2]: 0000:0000:0000:0000:0000:ffff:94bb:7453, RoCE v1
GID[ 3]: ::ffff:148.187.116.83, RoCE v2
To complicate things, it does not hang or trigger the assertion on every run :( Could this be undefined behavior, or a compiler bug?
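One generic C pattern that produces exactly this kind of symptom (fine at a low optimization level, stuck at a higher one, not reproducible on every run) is a polling loop on a flag that is changed asynchronously but not declared `volatile sig_atomic_t` (for signal handlers) or `_Atomic` (for threads). Without the qualifier, the compiler may assume the flag cannot change inside the loop and hoist the load out of it. The sketch below is a generic illustration, not code from UCX; `stop_requested`, `on_signal`, `install_stop_handler` and `wait_for_alarm` are hypothetical names, and whether anything similar exists in the UCX progress path is an open assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Flag written by a signal handler and polled by the main flow.
 * `volatile sig_atomic_t` is the type ISO C sanctions for this; with a
 * plain `int` the optimizer may read it once and never again. */
static volatile sig_atomic_t stop_requested = 0;

static void on_signal(int signo)
{
    (void)signo;
    stop_requested = 1;             /* only async-signal-safe work here */
}

static void install_stop_handler(int signo)
{
    void (*prev)(int) = signal(signo, on_signal);
    assert(prev != SIG_ERR);        /* sketch: treat failure as a bug */
    (void)prev;                     /* keep NDEBUG builds warning-free */
}

/* Arm a one-shot SIGALRM and poll until the handler fires.
 * Returns the number of loop iterations spent waiting. */
static unsigned long wait_for_alarm(unsigned seconds)
{
    unsigned long spins = 0;

    stop_requested = 0;
    install_stop_handler(SIGALRM);
    alarm(seconds);
    while (!stop_requested) {       /* volatile: reloaded every iteration */
        spins++;
    }
    return spins;
}
```

Replacing `volatile sig_atomic_t` with a plain `int` makes the loop's behavior depend on the optimizer, which is the kind of -O1/-O2 difference described in this report. In real code the remedy is the correct qualifier or an atomic, not a lower optimization level.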