
Coyote v2 RDMA fails under certain benchmark

Open zhenhaohe opened this issue 1 year ago • 1 comments

I was testing Coyote v2 with the RDMA perf HW design and the RDMA services SW application.

  1. The RDMA read benchmark is unstable and fails with the default number of repetitions specified in the SW. The experiment below does not return.

./bin/test -d 0 -i 0 -t 10.1.212.177 -x 2048
Queue pair:
Local : QPN 0x000000, PSN 0x22b267, VADDR 00007fe912200000, SIZE 00010000, IP 0x0afd4a60
Remote: QPN 0x000000, PSN 0x30c5c7, VADDR 00007feefbc00000, SIZE 00010000, IP 0x0afd4a5c
Client registered
Sent payload

RDMA BENCHMARK
1024 [bytes], thoughput: 19.94 [MB/s], latency: 33100.42 [ns]
2048 [bytes], thoughput: 2124.81 [MB/s], latency: 8167.80 [ns]

  2. The RDMA write benchmark does not scale beyond a 4K message size:

./bin/test -d 0 -i 0 -t 10.1.212.175 -x 1024 -r 10 -l 10 -w 1
Queue pair:
Local : QPN 0x000000, PSN 0x9bd652, VADDR 00007fbc23e00000, SIZE 00010000, IP 0x0afd4a58
Remote: QPN 0x000000, PSN 0xa03ec3, VADDR 00007fe9b5400000, SIZE 00010000, IP 0x0afd4a54
Client registered
Sent payload

RDMA BENCHMARK
1024 [bytes], thoughput: 870.19 [MB/s], latency: 5824.05 [ns]
2048 [bytes], thoughput: 1976.83 [MB/s], latency: 6007.90 [ns]
4096 [bytes], thoughput: 3813.60 [MB/s], latency: 6559.50 [ns]
^Cterminate called after throwing an instance of 'std::runtime_error'
what(): Stalled, SIGINT caught
Aborted

zhenhaohe avatar Aug 06 '24 14:08 zhenhaohe

Yes, this is a known issue with RDMA at the moment. @maximilianheer is working on a fix that is hopefully coming soon.

JonasDann avatar Sep 16 '24 14:09 JonasDann

The 8K bug has been fixed in #86. The issues with reads were due to Coyote background processes being terminated before the test finished. @maximilianheer has debugged this and created a branch; the changes will be part of a larger SW clean-up.

Closing this issue now.

bo3z avatar Nov 27 '24 08:11 bo3z
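
For readers unfamiliar with the failure class named in the closing comment (background work being terminated before the test finishes), here is a minimal, generic sketch in Python. It is not Coyote code and does not reflect the actual fix. `run_transfers`, `n_ops`, and `wait_for_completion` are hypothetical names used only for illustration, and the worker's `time.sleep` merely stands in for an outstanding transfer.

```python
import threading
import time


def run_transfers(n_ops, wait_for_completion):
    """Start a hypothetical background worker and report how many
    operations had completed by the time the caller moves on."""
    completed = []

    def worker():
        for i in range(n_ops):
            time.sleep(0.001)  # stand-in for one outstanding transfer
            completed.append(i)

    # daemon=True: the interpreter will not wait for this thread at exit,
    # so unfinished work is silently dropped unless we join explicitly.
    t = threading.Thread(target=worker, daemon=True)
    t.start()

    if wait_for_completion:
        t.join()  # block until every operation has finished

    return len(completed)


if __name__ == "__main__":
    print("with join:   ", run_transfers(20, wait_for_completion=True))
    print("without join:", run_transfers(20, wait_for_completion=False))
```

With the join, the caller observes all operations; without it, the caller proceeds (and a real program could exit) while work is still outstanding, which is the general shape of a test that never sees its completions.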