
Potential dangling pointer if io_uring_submit_and_wait_timeout() used with IORING_SETUP_SQPOLL

Open lewissbaker opened this issue 1 year ago • 0 comments

On kernels older than 5.11, where IORING_ENTER_EXT_ARG is not available, io_uring_submit_and_wait_timeout() falls back to queuing an IORING_OP_TIMEOUT operation (OP_TIMEOUT below) to manage the timeout.

However, if SQ polling (IORING_SETUP_SQPOLL) is enabled, the function can return before the kernel thread has processed the OP_TIMEOUT SQE. This happens when a completion event is received before the timeout elapses.

The OP_TIMEOUT SQE holds a pointer to the __kernel_timespec structure that the caller passed to this function. If the function returns early and that structure was allocated on the caller's stack, the memory may contain garbage by the time the kernel thread processes the OP_TIMEOUT SQE and reads the timespec value.

I think in this case io_uring_submit_and_wait_timeout() needs to wait until the kernel thread has indicated that it has consumed the OP_TIMEOUT SQE, i.e. until it has advanced the SQ head beyond the index of that SQE. Otherwise, the caller cannot know when it is safe to free the __kernel_timespec structure that it passed to this function.
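As a rough sketch of the check I have in mind (this is not liburing code; `sqe_consumed` and its parameters are hypothetical names), the comparison between the SQ head and the index of the timeout SQE could look like this. It assumes both values are the unmasked, free-running 32-bit counters, so the comparison has to survive wraparound:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of liburing.
 *
 * head:      SQ head value most recently read from the ring; the kernel
 *            advances it as it consumes SQEs.
 * sqe_index: unmasked tail value at which the timeout SQE was queued.
 *
 * Returns true once the head has moved past sqe_index.
 */
static bool sqe_consumed(uint32_t head, uint32_t sqe_index)
{
    /* Signed difference keeps the result correct across 32-bit wraparound. */
    return (int32_t)(head - sqe_index) > 0;
}
```

A real implementation would also need to read the head with acquire semantics and decide how to wait (spin, or enter the kernel) while this returns false; the sketch only covers the index comparison.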

I think this could also be an issue for non-SQPOLL rings that do not have the IORING_SETUP_SUBMIT_ALL flag set, in cases where the submission of an SQE fails. That covers every pre-5.11 kernel, since the flag was only added in 5.18. My understanding from the docs on this flag is that, without it, an earlier SQE failing on submission causes io_uring_enter() to return without submitting the subsequent SQEs. Because such a failure posts a CQE, io_uring_submit_and_wait_timeout() could return before the enqueued OP_TIMEOUT operation is submitted, leaving it in the submission queue with a pointer to a potentially dangling __kernel_timespec object.

The __io_uring_get_cqe() function would need to continue calling io_uring_enter() to submit subsequent SQEs until the OP_TIMEOUT SQE has been processed.
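To illustrate the retry loop, here is a toy model rather than real liburing or kernel code. Every name in it (`model_sq`, `model_enter`, `submit_until_consumed`) is hypothetical, and it assumes, based on my reading of the IORING_SETUP_SUBMIT_ALL docs, that a failing SQE is itself consumed and only the SQEs after it are left in the queue:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a submission queue; indices are not masked, unlike a real ring. */
struct model_sq {
    uint32_t head;      /* next entry the "kernel" will consume */
    uint32_t tail;      /* one past the last queued entry */
    const bool *fails;  /* fails[i] is true if entry i errors on submission */
};

/*
 * Models one io_uring_enter() call on a ring without IORING_SETUP_SUBMIT_ALL:
 * entries are consumed in order and the batch stops after the first entry
 * that fails. The failed entry still counts as consumed (it posts a CQE).
 * Returns the number of entries consumed by this call.
 */
static uint32_t model_enter(struct model_sq *sq)
{
    uint32_t consumed = 0;

    while (sq->head != sq->tail) {
        bool failed = sq->fails[sq->head];

        sq->head++;
        consumed++;
        if (failed)
            break;
    }
    return consumed;
}

/*
 * Keeps entering until the entry at timeout_index has been consumed.
 * Returns the number of model_enter() calls that were needed.
 */
static unsigned submit_until_consumed(struct model_sq *sq, uint32_t timeout_index)
{
    unsigned calls = 0;

    /* The timeout entry must actually be queued, or the loop cannot finish. */
    assert((int32_t)(sq->tail - timeout_index) > 0);

    while ((int32_t)(sq->head - timeout_index) <= 0) {
        model_enter(sq);
        calls++;
    }
    return calls;
}
```

With four queued entries where the second one fails and the timeout is the last one, the first `model_enter()` call stops after two entries, so a second call is needed before the timeout entry is consumed.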

lewissbaker, Jun 04 '24 02:06