liburing

Request cancellation

Open hujianzhe opened this issue 2 years ago • 4 comments

I submit a socket connect request to io_uring and, after the connect completes, submit read and write requests; sqe->user_data is set to my socket_object. When a read completion is received, the read is resubmitted. Write requests are handled the same way. After running for a while, CTRL+C triggers the SIGINT signal handler, which uses io_uring_prep_nop to make the program return from io_uring_wait_cqe. I then call io_uring_prep_cancel_fd(sqe, socket_object->fd, IORING_ASYNC_CANCEL_ALL) to cancel the I/O requests on this socket.

At this point, draining the ring with io_uring_for_each_cqe + io_uring_wait_cqe shows the following problems:

1. If I had previously submitted a write request with io_uring_prep_sendmsg_zc, the same user_data may appear multiple times, so the resource is accessed again after it has been released (IORING_CQE_F_MORE, then IORING_CQE_F_NOTIF (freed), then IORING_CQE_F_MORE (use after free)).
2. If I had previously submitted a write request with io_uring_prep_sendmsg, the CQE for the write request may be lost.
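Both symptoms involve deciding when socket_object can be released while several requests still point at it. A minimal sketch of one way to track that follows, assuming a per-object counter of requests in flight. `struct sock_obj`, `obj_on_submit`, and `obj_on_cqe` are hypothetical names for illustration, not liburing API, and the real socket_object is not shown in this issue. The sketch relies only on the documented meaning of IORING_CQE_F_MORE: when it is set, more CQEs will follow for the same request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-socket state; stands in for the issue's socket_object. */
struct sock_obj {
    int fd;
    unsigned inflight;  /* requests submitted whose last CQE has not arrived yet */
    bool closing;       /* set once cancellation of this socket was requested */
};

/* Call once for every SQE submitted with user_data pointing at obj
 * (reads, writes, and the cancel request too if it shares the user_data). */
static void obj_on_submit(struct sock_obj *obj)
{
    obj->inflight++;
}

/* Call for every CQE whose user_data points at obj.
 * has_more is (cqe->flags & IORING_CQE_F_MORE) != 0.
 * Returns true only when obj is closing and no request can still reference it. */
static bool obj_on_cqe(struct sock_obj *obj, bool has_more)
{
    if (!has_more) {
        /* This was the last CQE of one request. */
        assert(obj->inflight > 0);
        obj->inflight--;
    }
    return obj->closing && obj->inflight == 0;
}
```

With this shape, a canceled request still has to post its own CQE (typically with -ECANCELED) before the counter reaches zero, so the object is not freed just because the cancel request itself has completed.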

This problem occurs very frequently.

Environment: Linux 6.4.2, liburing 2.4

hujianzhe • Oct 07 '23 16:10

Cleanup code run at program exit:

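/* Fragment of the cleanup loop body: first drain the CQEs that are already queued. */
io_uring_for_each_cqe(&aio->__r, head, cqe) {
    advance_n++;
    // resource cleanup code elided
    if (advance_n >= peek_cnt) {
        break;
    }
}
/* Mark the CQEs visited above as consumed. */
io_uring_cq_advance(&aio->__r, advance_n);
if (advance_n > 0) {
    continue;
}
/* Nothing was queued: block until the next CQE arrives. */
ret = io_uring_wait_cqe(&aio->__r, &cqe);
if (ret != 0) {
    /* ignore EINTR */
    return;
}
// resource cleanup code elided
io_uring_cqe_seen(&aio->__r, cqe);
continue;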

hujianzhe • Oct 07 '23 17:10

Sorry for being late, apparently github hasn't been sending me notifications for a while... Would you happen to have a reproducer for this? Makes it a lot easier for me to debug / test, rather than have to spend the time to write my own.

axboe • Nov 20 '23 23:11

The code is a bit long, but when I call io_uring_queue_init_params with flags = IORING_SETUP_CLAMP | IORING_SETUP_SUBMIT_ALL | IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG, the problem seems to be solved (it runs cleanly under AddressSanitizer).

hujianzhe • Nov 22 '23 02:11

> After running for a while, CTRL+C triggers the SIGINT signal handler, which uses io_uring_prep_nop to make the program return from io_uring_wait_cqe.

Do you submit a nop from the signal handler? That doesn't sound safe if so

> 1. If I had previously submitted a write request with io_uring_prep_sendmsg_zc, the same user_data may appear multiple times, so the resource is accessed again after it has been released (IORING_CQE_F_MORE, then IORING_CQE_F_NOTIF (freed), then IORING_CQE_F_MORE (use after free)).

Can you elaborate on which CQEs you get, and how many?

It should be 1 or, more likely, 2 CQEs with the same user_data: CQE1 with flags=IORING_CQE_F_MORE, then CQE2 with flags=IORING_CQE_F_NOTIF.
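The expected sequence can be turned into a small classifier for the CQEs of one zero-copy send. This is a sketch under assumptions: `enum zc_cqe_kind`, `zc_classify`, and `zc_is_last` are hypothetical helpers, and the two flag values are copied from the kernel's <linux/io_uring.h> only so that the example compiles on systems with older headers.

```c
#include <stdbool.h>

/* Values mirror <linux/io_uring.h>; defined here only if the header is absent. */
#ifndef IORING_CQE_F_MORE
#define IORING_CQE_F_MORE  (1U << 1)
#endif
#ifndef IORING_CQE_F_NOTIF
#define IORING_CQE_F_NOTIF (1U << 3)
#endif

enum zc_cqe_kind {
    ZC_RESULT_NOTIF_PENDING, /* send result; a notification CQE will follow */
    ZC_RESULT_FINAL,         /* send result; no notification will follow */
    ZC_NOTIF                 /* notification: the kernel is done with the buffer */
};

/* Classify one CQE of a zero-copy send by its flags. */
static enum zc_cqe_kind zc_classify(unsigned flags)
{
    if (flags & IORING_CQE_F_NOTIF)
        return ZC_NOTIF;
    if (flags & IORING_CQE_F_MORE)
        return ZC_RESULT_NOTIF_PENDING;
    return ZC_RESULT_FINAL;
}

/* True when no further CQE is expected for this request, i.e. the point
 * at which the buffer and the object behind user_data may be released. */
static bool zc_is_last(unsigned flags)
{
    return zc_classify(flags) != ZC_RESULT_NOTIF_PENDING;
}
```

Under this scheme the object behind user_data is released on the notification CQE, or on a lone result CQE without IORING_CQE_F_MORE, never on the first CQE of a two-CQE pair. Logging `zc_classify(cqe->flags)` together with cqe->res for each CQE would also answer the question of which CQEs arrive and in what order.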

> The code is a bit long, but

That's fine, even though it'd be nice if you can shorten it. Without a way to reproduce a problem we can only guess at the cause. Any reproducer greatly helps narrow the problem down.

isilence • Jan 11 '24 14:01