perftest
perftest copied to clipboard
SRD CQ notifications
Add support for SRD CQ notifications. The first patch fixes a bug where the same completion channel is used for both TX and RX and messes up the test.
Hi,
Can you please add more details what issue the first patch actually resolves?
Thanks
IIRC, the test uses the same completion channel and does:
- Send packet
- Remote side receives the packet and send another packet
- Receive packet from remote
And assumes that the next event is necessarily from the send completion, but it could theoretically be from the receive completion.
@firasj might remember the exact details.
The unidirectional BW test for the client (run_iter_bw) is not supposed to receive data at all, so an event should never be delivered for the recv_cq, and the server side (run_iter_bw_server) isn't supposed to send any data, so an event should never be delivered for the send_cq.
Did you encounter any bug or issues when running the tests?
It was a real issue, some tests (send_lat for example) do both send and receive, and they might fail without this fix.
As @galpress has already mentioned, this is a real issue in latency tests that end up crashing the test.
We do not check if the completion is a send or receive. This may lead to assuming that we received a Tx completion when in fact it was an Rx completion, and in turn posting a Tx WQE without polling the TX CQ. Because the queue depth is 1 in latency tests, this results in an error (at least in EFA provider).
@HassanKhadour Can you please merge this PR or suggest some extra improvements? Thanks!
@firasj could you please provide the case that will reproduce this issue with the command you used?
@firasj can you please provide the command lines we were running? @sshaulnv what is needed to get this merged? Do you want me to test this on mlx5 hardware?
Hi Gal, yes we need the commands in order to test it on mlx5 hardware
@gal-pressman Can you please close this PR for now? It's been on a low priority and we'll reopen once we have a reproduction and an answer.
Hi, running:
for i in {1..20}; do ib_send_lat -c SRD -x 0 -F --report_gbits -a -n 10000 -r 1 -e & ib_send_lat -c SRD -x 0 -F --report_gbits -a -n 10000 -r 1 -e localhost; done
is reproducing the issue on EFA HW. @sshaulnv @gal-pressman can you please test this with mlx5 HW?
I ran the same command with mlx5 (UD instead of SRD), it works both with and without this patch, should be safe to merge.
@HassanKhadour Can you please merge this?
Hi @mrgolin , conflicts should be resolved to merge.
@HassanKhadour I've rebased and pushed as new PR https://github.com/linux-rdma/perftest/pull/211.