perftest
ib_write_bw works normally but ib_write_bw -R fails
This is the output of 'ib_write_bw -a -d mlx5_0 --report_gbits node1', which seems to work fine:
[root@node3 bin]# ib_write_bw -a -d mlx5_0 --report_gbits node1
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 100
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x02 QPN 0x0032 PSN 0x5f2841 RKey 0x002440 VAddr 0x007f2e5f64b000
remote address: LID 0x01 QPN 0x003a PSN 0xc0fd7e RKey 0x002442 VAddr 0x007f443b2f8000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
2 5000 0.042350 0.042185 2.636533
4 5000 0.084507 0.084457 2.639276
8 5000 0.17 0.17 2.641399
16 5000 0.34 0.34 2.640952
32 5000 0.68 0.68 2.638957
64 5000 1.35 1.35 2.638629
128 5000 2.71 2.71 2.643606
256 5000 5.42 5.42 2.644112
512 5000 10.78 10.77 2.629186
1024 5000 21.38 21.37 2.608802
2048 5000 42.13 42.09 2.568967
4096 5000 83.97 83.91 2.560721
8192 5000 186.89 149.84 2.286319
16384 5000 195.18 169.98 1.296822
32768 5000 196.21 185.39 0.707209
65536 5000 196.25 190.26 0.362886
131072 5000 196.33 193.93 0.184945
262144 5000 195.49 195.03 0.092996
524288 5000 196.25 196.25 0.046789
1048576 5000 196.48 196.48 0.023422
2097152 5000 196.62 196.59 0.011718
4194304 5000 196.67 196.63 0.005860
8388608 5000 196.63 196.58 0.002929
---------------------------------------------------------------------------------------
But it fails if I add '-R':
[root@node3 bin]# ib_write_bw -a -d mlx5_0 --report_gbits -R node1
Received 10 times ADDR_ERROR
Unable to perform rdma_client function
Unable to init the socket connection
I read the source code and found that the failure is caused by RDMA_CM_EVENT_ADDR_ERROR, but I don't know why.
This is the output of 'lspci -vvv':
[root@node3 bin]# lspci -vvv | grep Mellanox -A 65
41:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
Subsystem: Mellanox Technologies Device 0007
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 1125
NUMA node: 0
Region 0: Memory at 2807e000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at b4400000 [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported, Exit Latency L0s unlimited, L1 <4us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [48] Vital Product Data
Product Name: ConnectX-6 VPI adapter card, HDR IB (200Gb/s) and 200GbE, single-port QSFP56
Read-only fields:
[PN] Part number: MCX653105A-HDAT
[EC] Engineering changes: AE
[V2] Vendor specific: MCX653105A-HDAT
[SN] Serial number: MT2130T07644
[V3] Vendor specific: 92a87ffbcbeaeb118000b8cef6f7f1c0
[VA] Vendor specific: MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=CX653105A
[V0] Vendor specific: PCIeGen4 x16
[VU] Vendor specific: MT2130T07644MLNXS0D0F0
[RV] Reserved: checksum good, 1 byte(s) reserved
End
Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
AERCap: First Error Pointer: 04, GenCap+ CGenEn+ ChkCap+ ChkEn+
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [1c0 v1] #19
Capabilities: [320 v1] #27
Capabilities: [370 v1] #26
Capabilities: [420 v1] #25
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
42:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Beta Rock Controller] (prog-if 02 [NVM Express])
Subsystem: Intel Corporation Device 8008
Any clue about what happened? I look forward to your reply, thanks!
Hi, please make sure to use the interface IP instead of the host name when using the -R option.
Thanks! I had tried the interface IP before your reply, but it still failed. Your reply reminded me that I need to use the address of the IB network card, not the TCP/IP one... Thanks a lot for your reply! Wish you good health and every success!
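For anyone landing here with the same ADDR_ERROR, here is a rough sketch of the fix described above: look up the address bound to the IB interface and pass it to -R instead of the hostname. The helper name list_ipoib_addrs is made up for illustration, and it assumes the IPoIB interface name starts with "ib" (for example ib0), which may not hold on every system.

```shell
# Hypothetical helper: reads `ip -o -4 addr show` output on stdin and prints
# "<ifname> <ipv4>" for each interface whose name starts with "ib".
# Assumption: IPoIB interfaces follow the ib* naming scheme on this system.
list_ipoib_addrs() {
    awk '$2 ~ /^ib/ { split($4, a, "/"); print $2, a[1] }'
}

# On the server (node1), find the address of the IB interface:
#   ip -o -4 addr show | list_ipoib_addrs
#
# On the client (node3), pass that address instead of the hostname.
# The server side is assumed to have been started with -R as well.
#   ib_write_bw -a -d mlx5_0 --report_gbits -R <IPoIB address of node1>
```

If the helper prints nothing, the IB port probably has no IPv4 address configured, and rdma_cm would have nothing to resolve the destination against.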
Hi sjc2870, thanks! Wish you the same. Does it still reproduce? Did you solve the issue?
I failed at this step and I don't know what happened:
Failed to modify QP to RTS
Unable to Connect the HCA's through the link
Please try using the interface IP rather than the hostname when running with rdma_cm.
I used the interface IP.
Server error:
ethernet_read_keys: Couldn't read remote address
Unable to read to socket/rdma_cm
Failed to exchange data between server and clients
Client error:
Failed to modify QP to RTS
Unable to Connect the HCA's through the link
Can you please share the setup info (OS, cards, etc.) so I can try to reproduce the issue?
Sorry, my OS is '6.2.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC'. The network card is an Intel Corporation Ethernet Connection X722.