io_uring_for_each_cqe can overflow the cqe ring with SQPOLL
A server using io_uring_for_each_cqe with SQPOLL set will overflow the CQ ring given enough connections or time: new CQEs keep being added at the tail before the loop completes, so the head is not advanced soon enough.
Since peek/wait is significantly slower than this loop, does anyone have a better method of CQ polling in this case? Queueing the events and doing a second loop is a 5% slowdown.
struct io_uring_cqe *cqe;
unsigned head, n = 0;

io_uring_for_each_cqe(&uring, head, cqe) {
	handle_read();
	write_response();
	n++;	/* count the CQEs consumed in this pass */
}
io_uring_cq_advance(&uring, n);
This is the for_each macro. Perhaps saving the tail prior to the loop would be okay?
for (head = *(ring)->cq.khead; \
(cqe = (head != io_uring_smp_load_acquire((ring)->cq.ktail) ? \
&(ring)->cq.cqes[io_uring_cqe_index(ring, head, (ring)->cq.ring_mask)] : NULL)); \
head++)
I went with breaking out of the loop after N iterations.
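For anyone wanting to see the bounded-loop workaround in isolation, here is a minimal sketch. It does not use liburing: `struct toy_ring`, `toy_post`, `toy_handle` and `toy_drain_bounded` are hypothetical names for a single-threaded model of the CQ ring, and the handler posts a new entry each time to imitate completions arriving during the loop. In real code the same cap would sit inside the io_uring_for_each_cqe body, followed by io_uring_cq_advance with the count.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TOY_ENTRIES 8u                  /* power of two, like a real CQ ring */
#define TOY_MASK    (TOY_ENTRIES - 1u)

/* Hypothetical model of a completion ring; NOT liburing's struct io_uring. */
struct toy_ring {
	uint32_t head;                  /* consumer index, owned by the app */
	_Atomic uint32_t tail;          /* producer index, owned by the "kernel" */
	int cqes[TOY_ENTRIES];          /* an int stands in for a CQE */
};

/* Producer: append one entry, or return 0 when the ring is full. */
static int toy_post(struct toy_ring *r, int res)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

	if (tail - r->head == TOY_ENTRIES)
		return 0;
	r->cqes[tail & TOY_MASK] = res;
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return 1;
}

/* Handler that triggers a new completion, imitating a busy SQPOLL ring. */
static void toy_handle(struct toy_ring *r, int res)
{
	(void)toy_post(r, res + 1);
}

/*
 * Consumer: re-read the tail on every iteration (as the macro does),
 * but stop after max_batch entries so the head is advanced regularly.
 */
static unsigned toy_drain_bounded(struct toy_ring *r, unsigned max_batch)
{
	uint32_t head = r->head;
	unsigned seen = 0;

	while (seen < max_batch &&
	       head != atomic_load_explicit(&r->tail, memory_order_acquire)) {
		toy_handle(r, r->cqes[head & TOY_MASK]);
		head++;
		seen++;
	}
	r->head = head;                 /* plays the role of io_uring_cq_advance() */
	return seen;
}
```

The cap bounds how long ring slots stay unreleased. The batch size is a tuning knob: too small and the advance cost is paid more often, too large and the original problem comes back.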
I really liked your idea of using a tail assigned at the start of the loop, though. I think that's a nice general thing to do, and it would solve your issue as well.
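As an illustration of the tail-snapshot idea, a minimal sketch follows. It is a single-threaded model, not liburing code and not necessarily how the library implements it: `struct mock_cq`, `mock_cq_post`, `mock_handle` and `mock_cq_drain_snapshot` are hypothetical names, and the handler posts a new entry each time to imitate completions arriving mid-loop.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MOCK_ENTRIES 8u                 /* power of two, like a real CQ ring */
#define MOCK_MASK    (MOCK_ENTRIES - 1u)

/* Hypothetical stand-in for the CQ ring; NOT liburing's data structure. */
struct mock_cq {
	uint32_t head;                  /* consumer index, owned by the app */
	_Atomic uint32_t tail;          /* producer index, owned by the "kernel" */
	int cqes[MOCK_ENTRIES];         /* an int stands in for a CQE */
};

/* Producer: publish one completion, or return 0 when the ring is full. */
static int mock_cq_post(struct mock_cq *cq, int res)
{
	uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_relaxed);

	if (tail - cq->head == MOCK_ENTRIES)
		return 0;
	cq->cqes[tail & MOCK_MASK] = res;
	atomic_store_explicit(&cq->tail, tail + 1, memory_order_release);
	return 1;
}

/* Handler that triggers a new completion, imitating a busy SQPOLL ring. */
static void mock_handle(struct mock_cq *cq, int res)
{
	(void)mock_cq_post(cq, res + 1);
}

/*
 * Consumer: load the tail ONCE before the loop. Entries published while
 * the loop runs are left for the next pass, so one pass always terminates.
 */
static unsigned mock_cq_drain_snapshot(struct mock_cq *cq)
{
	uint32_t head = cq->head;
	uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_acquire);
	unsigned seen = 0;

	for (; head != tail; head++) {
		mock_handle(cq, cq->cqes[head & MOCK_MASK]);
		seen++;
	}
	cq->head = head;                /* plays the role of io_uring_cq_advance() */
	return seen;
}
```

With three entries queued, one pass handles exactly those three even though the handler posts three more; the new ones are picked up by the next call.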
Fixed in 2.8 and the current -git tree.
Thanks for all the work you've put in!