FEAT: rpma_cq_wait() performance optimization
Rationale
`ibv_ack_cq_events()` seems to be the main bottleneck of the librpma library when the completion event channel is used. To avoid this problem, `ibv_ack_cq_events()` shall be called less frequently. It is also wise to call it before `ibv_get_cq_event()`, since it is likely that we still have some spare time before a new event becomes ready to obtain via `ibv_get_cq_event()`.
Description
The `struct rpma_cq` shall be extended with a field `unsigned int unack_cqe;`, set to `0` in `rpma_cq_new()`:
```c
	(*cq_ptr)->cq = cq;
	(*cq_ptr)->unack_cqe = 0;
```
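For reference, the extended structure could look like the following sketch. The exact layout of `struct rpma_cq` is internal to `cq.c` and is not shown in this issue, so everything here except the new `unack_cqe` field is an assumption:

```c
#include <assert.h>

/* Forward declarations stand in for <infiniband/verbs.h>. */
struct ibv_comp_channel;
struct ibv_cq;

/*
 * Hypothetical layout sketch -- the real struct rpma_cq is private to
 * cq.c and may contain other fields.
 */
struct rpma_cq {
	struct ibv_comp_channel *channel;	/* completion event channel */
	struct ibv_cq *cq;			/* the CQ itself */
	unsigned int unack_cqe;			/* events not yet ACKed */
};
```

Since `rpma_cq_new()` zero-initializes the new field, a freshly created CQ starts with no unacknowledged events.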
`unack_cqe` shall be increased every time `ibv_get_cq_event()` returns a valid event in `rpma_cq_wait()`:
```c
rpma_cq_wait(struct rpma_cq *cq)
{
	...
	if (ibv_get_cq_event(cq->channel, &ev_cq, &ev_ctx))
		return RPMA_E_NO_COMPLETION;

	++cq->unack_cqe;
```
At a minimum, `ibv_ack_cq_events()` shall be called before the `ibv_cq` is destroyed inside `rpma_cq_delete()`:
```c
	if (cq->unack_cqe)
		(void) ibv_ack_cq_events(cq->cq, cq->unack_cqe);

	errno = ibv_destroy_cq(cq->cq);
```
but it also must be called cyclically as part of `rpma_cq_wait()` (please note that the `ibv_ack_cq_events()` operation is moved before `ibv_get_cq_event()`):
```c
/*
 * cq.c -- librpma completion-queue-related implementations
 */

...

#define RPMA_MAX_UNACK_CQE UINT_MAX

...

int
rpma_cq_wait(struct rpma_cq *cq)
{
	...

	/*
	 * ACK the collected CQ events.
	 */
	if (cq->unack_cqe >= RPMA_MAX_UNACK_CQE) {
		ibv_ack_cq_events(cq->cq, cq->unack_cqe);
		cq->unack_cqe = 0;
	}

	/* wait for the completion event */
	struct ibv_cq *ev_cq;	/* unused */
	void *ev_ctx;		/* unused */
	if (ibv_get_cq_event(cq->channel, &ev_cq, &ev_ctx))
		return RPMA_E_NO_COMPLETION;

	...
}

...
```