
`rdma_rxe` `ibv_post_send` soft lockup

Open • Nugine opened this issue • 2 comments

Fixed by https://github.com/Nugine/rdma/compare/d2b3190aecc84d64d4616ae6a9f9f1b20ee2f052...cff469d8d3f0cf51490389666dbf7eae6188eca8. The real cause lies in the `rdma_rxe` kernel module, but I don't know why it happens.

To reproduce:

```shell
git checkout d2b3190aecc84d64d4616ae6a9f9f1b20ee2f052
# Run the RC ping-pong benchmark 100 times in a row.
for i in $(seq 1 100)
do
    just bench-pingpong-rc
done
```

Nugine • May 30 '22 15:05

OS: Ubuntu 20.04

```
$ uname -srv
Linux 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022
$ pkg-config --modversion libibverbs
1.14.41.0
$ pkg-config --modversion librdmacm
1.3.41.0
```

rdma-core commit `3639589614c387669e0d66e0cdf956634a050bcc`

Nugine • May 31 '22 01:05

https://www.rdmamojo.com/2013/01/26/ibv_post_send/

> If this is an RC QP, verify that the `rnr_retry` value that was configured in `ibv_modify_qp()` isn't 7 since this may lead to retry infinite time in case of RNR flow

Nugine • Jun 02 '22 13:06