Edo Frederix
UCX-1.8.0, and I also tried with the latest master. Thanks for the suggestion. I applied your patch. Doesn't change anything though, I get the same error. Not sure how to...
Very similar error with UCX-1.4.0, by the way:
```
[qelr_create_cq:258]create cq: failed with rc = 22
[1588966463.076804] [hostname:2107996:0] ib_iface.c:472 UCX ERROR ibv_create_cq(cqe=4096) failed: Invalid argument
```
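For anyone reading along: the `22` in the driver message and the `Invalid argument` in the UCX message are most likely the same errno (`EINVAL`). Below is a minimal sketch for decoding such codes and splitting a UCX log line into fields. The regular expression is an assumption based only on the single log line quoted above, and the helper names (`parse_ucx_line`, `describe_errno`) and the field name `idx` are hypothetical, not part of UCX.

```python
import errno
import os
import re

# Assumed layout, inferred from the one sample line in this thread:
# [timestamp] [host:pid:idx] file:line UCX LEVEL message
UCX_LINE = re.compile(
    r"\[(?P<ts>\d+\.\d+)\] "
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+):(?P<idx>\d+)\]\s+"
    r"(?P<file>\S+):(?P<line>\d+)\s+UCX\s+(?P<level>\w+)\s+(?P<msg>.*)"
)


def parse_ucx_line(text):
    """Return a dict of fields for a UCX log line, or None if it does not match."""
    match = UCX_LINE.match(text.strip())
    return match.groupdict() if match else None


def describe_errno(code):
    """Map a numeric errno to its symbolic name and the platform's message."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    sample = (
        "[1588966463.076804] [hostname:2107996:0] ib_iface.c:472 "
        "UCX ERROR ibv_create_cq(cqe=4096) failed: Invalid argument"
    )
    print(parse_ucx_line(sample))
    print(describe_errno(22))
```

Decoding the number only confirms that the kernel driver rejected the request parameters; it does not say which parameter was rejected.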
Same thing ``` # printenv | grep UCX UCX_ROOT=/software/ucx/1.8.0 UCX_DIR=/software/ucx/1.8.0 UCX_HOME=/software/ucx/1.8.0 UCX_UD_RX_INLINE=0 ``` is my UCX-related environment. Still I get: ``` [qelr_create_qp:679]create qp: failed on ibv_cmd_create_qp with 22 [1588969380.594241] [hostname:17141:0]...
``` hca_id: qedr0 transport: InfiniBand (0) fw_ver: 8.37.7.0 node_guid: f6e9:d4ff:fe61:b108 sys_image_guid: f6e9:d4ff:fe61:b108 vendor_id: 0x1077 vendor_part_id: 32880 hw_ver: 0x0 phys_port_cnt: 1 max_mr_size: 0x10000000000 page_size_cap: 0xfffff000 max_qp: 8568 max_qp_wr: 32767 device_cap_flags: 0x00209080...
FYI, the qedr0 device is connected at 25 Gbps via a router, and qedr1 is connected at 10 Gbps directly to a second, similar node that I'm testing against.
@yosefe thanks. I get the same error with that environment variable set. In dmesg I do see: `hugetlbfs: ucx_info (4790): Using mlock ulimits for SHM_HUGETLB is deprecated`. Not sure if that's related?
@yosefe no it turns to `inl:0`, i.e., ``` [qelr_create_qp:679]create qp: failed on ibv_cmd_create_qp with 22 [1589222241.869252] [vinci115:12717:0] ib_iface.c:623 UCX ERROR iface=0x1931bc0: failed to create UD QP TX wr:256 sge:2 inl:0...
@yosefe here's the output on both ends: ``` # ib_send_lat -c UD -d qedr1 -x 3 localhost [qelr_create_qp:679]create qp: failed on ibv_cmd_create_qp with 22 Unable to create QP. Failed to...
@yosefe, thanks a lot for your support so far. Assuming that this device indeed lacks UD support, would you expect OpenMPI performance to slow down or break, even without the...
No. Ended up ditching the QLogics in favor of ConnectX-4s. With those it's working very well.