tokio-uring
potential errors from sys::io_uring_enter usually being ignored
tokio-uring passes submissions to the kernel by calling uring.submit() (which ultimately calls sys::io_uring_enter), and it does so in two places.
The most common route to submit is the on_thread_park callback, which is invoked when the current thread has no more active work to do. Under normal operation, by that point the thread will have handled some uring results and queued some new uring submissions.
https://github.com/tokio-rs/tokio-uring/blob/402653919b6c010468fdd94ce0a7ea3347de103a/src/runtime.rs#L53-L57
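A minimal, std-only model of this first route may help. MockRing, submit and the two on_park_* functions below are hypothetical stand-ins, not tokio-uring's actual code. The sketch only contrasts a park hook that discards the io::Result from submit with one that keeps it. Error codes are the Linux values (11 is EAGAIN).

```rust
use std::io;

/// Hypothetical stand-in for the ring. `submit` returns how many entries
/// were handed to the kernel, or the error io_uring_enter would report.
struct MockRing {
    pending: usize,
    // Simulated errno for the next submit() call, if any.
    next_error: Option<i32>,
}

impl MockRing {
    fn submit(&mut self) -> io::Result<usize> {
        if let Some(code) = self.next_error.take() {
            // On failure nothing is consumed; the entries stay pending.
            return Err(io::Error::from_raw_os_error(code));
        }
        let submitted = self.pending;
        self.pending = 0;
        Ok(submitted)
    }
}

/// Park hook that discards the result: an io_uring_enter error is lost.
fn on_park_ignoring(ring: &mut MockRing) {
    let _ = ring.submit();
}

/// Park hook that keeps the error so the runtime can act on it.
fn on_park_checked(ring: &mut MockRing) -> io::Result<()> {
    ring.submit().map(|_| ())
}

fn main() {
    let mut ring = MockRing { pending: 3, next_error: Some(11) }; // 11 = EAGAIN
    on_park_ignoring(&mut ring); // the error vanishes; 3 entries still pending

    ring.next_error = Some(11);
    if let Err(e) = on_park_checked(&mut ring) {
        eprintln!("submit on park failed: {e}");
    }
}
```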
The other, less common, route is for the application to create so many submissions while the current thread is continuously busy that the submission queue fills up, forcing the tokio-uring driver to submit what is queued to make room for another submission.
https://github.com/tokio-rs/tokio-uring/blob/402653919b6c010468fdd94ce0a7ea3347de103a/src/driver/op.rs#L71-L74
But errors that the call to sys::io_uring_enter may return are ignored by the common route, the first one shown. And if the submission queue fills up, the errors the caller does see via the second route are not directly related to its own submission; they indicate a problem with the uring, or its use, in general.
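The second route, and why its error lands on the wrong caller, can be sketched with a hypothetical bounded queue. MockSq, push and flush are illustrative names only, and the model assumes a failed flush leaves every queued entry in place, which simplifies real io_uring behaviour.

```rust
use std::collections::VecDeque;
use std::io;

/// Hypothetical bounded submission queue. `flush` models the
/// io_uring_enter call that moves queued entries to the kernel.
struct MockSq {
    entries: VecDeque<u64>, // user_data of queued submissions
    capacity: usize,
    flush_error: Option<i32>, // simulated io_uring_enter failure
}

impl MockSq {
    fn flush(&mut self) -> io::Result<()> {
        if let Some(code) = self.flush_error.take() {
            return Err(io::Error::from_raw_os_error(code));
        }
        self.entries.clear(); // the kernel consumed every queued entry
        Ok(())
    }

    /// Queue one submission. If the queue is full, flush first to make
    /// room; a flush error is returned to *this* caller even though it
    /// concerns entries queued earlier by other operations.
    fn push(&mut self, user_data: u64) -> io::Result<()> {
        if self.entries.len() == self.capacity {
            self.flush()?;
        }
        self.entries.push_back(user_data);
        Ok(())
    }
}

fn main() {
    let mut sq = MockSq { entries: VecDeque::new(), capacity: 2, flush_error: None };
    sq.push(1).expect("queue has room");
    sq.push(2).expect("queue has room");
    sq.flush_error = Some(16); // simulate EBUSY (16 on Linux)
    // Operation 3 is rejected because of the ring's state, not its own entry.
    if let Err(e) = sq.push(3) {
        eprintln!("op 3 failed: {e}");
    }
}
```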
Here are the currently documented io_uring_enter system call errors:
These are the errors returned by io_uring_enter() system call.
EAGAIN: The kernel was unable to allocate memory for the request, or otherwise ran out of resources to handle it. The application should wait for some completions and try again.
EBADF: fd is not a valid file descriptor.
EBADFD: fd is a valid file descriptor, but the io_uring ring is not in the right state (enabled). See io_uring_register(2) for details on how to enable the ring.
EBADR: At least one CQE was dropped even with the IORING_FEAT_NODROP feature, and there are no otherwise available CQEs. This clears the error state and so with no other changes the next call to io_uring_setup(2) will not have this error. This error should be extremely rare and indicates the machine is running critically low on memory and. It may be reasonable for the application to terminate running unless it is able to safely handle any CQE being lost.
EBUSY: If the IORING_FEAT_NODROP feature flag is set, then EBUSY will be returned if there were overflow entries, IORING_ENTER_GETEVENTS flag is set and not all of the overflow entries were able to be flushed to the CQ ring. Without IORING_FEAT_NODROP the application is attempting to overcommit the number of requests it can have pending. The application should wait for some completions and try again. May occur if the application tries to queue more requests than we have room for in the CQ ring, or if the application attempts to wait for more events without having reaped the ones already present in the CQ ring.
EINVAL: Some bits in the flags argument are invalid.
EFAULT: An invalid user space address was specified for the sig argument.
ENXIO: The io_uring instance is in the process of being torn down.
EOPNOTSUPP: fd does not refer to an io_uring instance.
EINTR: The operation was interrupted by a delivery of a signal before it could complete; see signal(7). Can happen while waiting for events with IORING_ENTER_GETEVENTS.
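One possible way to act on this list, sketched under assumptions: the errno values are hard-coded for Linux (the asm-generic layout used on x86-64 and arm64) to keep the example std-only, and the Action policy is a suggestion, not anything tokio-uring implements. It retries EAGAIN and EBUSY after reaping completions, retries EINTR immediately, and treats everything else as fatal.

```rust
use std::io;

// Linux errno values (asm-generic layout). Hard-coded to stay std-only;
// real code would more likely take these from the libc crate.
const EINTR: i32 = 4;
const ENXIO: i32 = 6;
const EBADF: i32 = 9;
const EAGAIN: i32 = 11;
const EFAULT: i32 = 14;
const EBUSY: i32 = 16;
const EINVAL: i32 = 22;
const EBADR: i32 = 53;
const EBADFD: i32 = 77;
const EOPNOTSUPP: i32 = 95;

#[derive(Debug, PartialEq)]
enum Action {
    /// Reap some completions, then call io_uring_enter again.
    RetryAfterReaping,
    /// Call io_uring_enter again (interrupted by a signal).
    RetryNow,
    /// The ring or its use is broken: shut it down or panic.
    Fatal,
}

fn classify(err: &io::Error) -> Action {
    match err.raw_os_error() {
        Some(EAGAIN | EBUSY) => Action::RetryAfterReaping,
        Some(EINTR) => Action::RetryNow,
        Some(EBADF | EBADFD | EBADR | EINVAL | EFAULT | ENXIO | EOPNOTSUPP) => Action::Fatal,
        // Undocumented or non-OS errors: be conservative.
        _ => Action::Fatal,
    }
}

fn main() {
    for code in [EAGAIN, EINTR, EBADFD] {
        let err = io::Error::from_raw_os_error(code);
        println!("errno {code}: {:?}", classify(&err));
    }
}
```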
Each of them points to a real problem with the ring or with how it is being used. Without any corrective action, at a minimum the uring should be shut down and all further uring operations invalidated. At most, the error should be reported and the current thread should panic.
In the async world of tokio-uring, what would a mechanism look like that never drops errors from io_uring_enter, and how would those errors be delivered back to the caller?
Should the default be to panic in the on_thread_park callback? Should the user be given the chance to register a callback for errors encountered by the on_thread_park callback? If so, should the same mechanism be built into the second submit route, so the two are handled in a uniform manner, freeing the caller of an individual submission from having to do any error handling?
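One shape the callback option could take, as a hypothetical sketch: a single hook that both submit routes report through, defaulting to a panic so no error is silently dropped. Driver, with_submit_error_hook and report are invented names for illustration, not existing tokio-uring API.

```rust
use std::cell::Cell;
use std::io;
use std::rc::Rc;

/// Hypothetical hook type: receives every submit error, from either route.
type SubmitErrorHook = Box<dyn FnMut(&io::Error)>;

/// Default hook: never drop the error, panic instead.
fn panic_on_submit_error(e: &io::Error) {
    panic!("io_uring_enter failed: {e}");
}

/// Hypothetical driver that funnels all submit results through one hook.
struct Driver {
    on_submit_error: SubmitErrorHook,
}

impl Driver {
    fn new() -> Self {
        Driver { on_submit_error: Box::new(panic_on_submit_error) }
    }

    fn with_submit_error_hook(hook: impl FnMut(&io::Error) + 'static) -> Self {
        Driver { on_submit_error: Box::new(hook) }
    }

    /// Called by both the park-time submit and the full-queue submit, so
    /// errors are handled uniformly and individual ops never see them.
    fn report(&mut self, result: io::Result<usize>) {
        if let Err(e) = result {
            (self.on_submit_error)(&e);
        }
    }
}

fn main() {
    let errors = Rc::new(Cell::new(0));
    let counter = Rc::clone(&errors);
    let mut driver = Driver::with_submit_error_hook(move |e| {
        eprintln!("submit error: {e}");
        counter.set(counter.get() + 1);
    });
    driver.report(Ok(4));
    driver.report(Err(io::Error::from_raw_os_error(11)));
    println!("errors seen: {}", errors.get());
}
```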
I think we can try and handle some of these. EAGAIN and EBUSY should be doable, for instance. Most of these should just be panics, though.
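A sketch of handling EAGAIN and EBUSY along those lines: retry after reaping completions, panic on anything else, and give up after a bounded number of attempts. The submit and reap closures are hypothetical hooks into the ring, and the attempt limit is an assumption.

```rust
use std::io;

const EAGAIN: i32 = 11; // Linux errno value
const EBUSY: i32 = 16; // Linux errno value

/// Retry transient submit failures (EAGAIN/EBUSY) after reaping
/// completions; panic on any other error or once attempts run out.
/// `submit` and `reap` are hypothetical hooks into the ring.
fn submit_with_retry(
    mut submit: impl FnMut() -> io::Result<usize>,
    mut reap: impl FnMut(),
    max_attempts: usize,
) -> usize {
    for _ in 0..max_attempts {
        match submit() {
            Ok(submitted) => return submitted,
            Err(e) if matches!(e.raw_os_error(), Some(EAGAIN | EBUSY)) => {
                // Free CQ space / kernel resources, then try again.
                reap();
            }
            Err(e) => panic!("io_uring_enter failed: {e}"),
        }
    }
    panic!("io_uring_enter still failing after {max_attempts} attempts");
}

fn main() {
    let mut failures_left = 2;
    let mut reaps = 0;
    let submitted = submit_with_retry(
        || {
            if failures_left > 0 {
                failures_left -= 1;
                Err(io::Error::from_raw_os_error(EAGAIN))
            } else {
                Ok(5)
            }
        },
        || reaps += 1,
        4,
    );
    println!("submitted {submitted} entries after {reaps} reaps");
}
```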
@Noah-Kennedy Sounds good. I'll work on that soon.
That raises the question of whether submit_with in https://github.com/tokio-rs/tokio-uring/blob/913aa14c567cfae026b7d4f88a5155921cfb9575/src/driver/op.rs#L61-L91 should continue to return an io::Result.
An argument could be made that keeping it future-proofs the API: even if no error can be returned right now, one might be in the future.
But I slightly favor changing the return type to simply Op<T>.
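For comparison, a hypothetical sketch of the proposed shape, where submit_with returns Op<T> directly because the forced submit on a full queue is handled inside the driver. Driver, flush and this Op are simplified stand-ins, not tokio-uring's real types or signatures.

```rust
/// Hypothetical in-flight operation handle (stand-in for tokio-uring's Op<T>).
#[derive(Debug)]
struct Op<T> {
    data: T,
    user_data: u64,
}

/// Hypothetical driver: a bounded submission queue and an id counter.
struct Driver {
    queued: Vec<u64>,
    capacity: usize,
    next_id: u64,
}

impl Driver {
    /// Forced submit when the queue is full. Here it always succeeds; per
    /// the discussion, a real driver would retry EAGAIN/EBUSY internally
    /// and panic (or call a hook) on anything else.
    fn flush(&mut self) {
        self.queued.clear();
    }

    /// Proposed shape: no io::Result, since submit errors never reach the
    /// caller of an individual operation.
    fn submit_with<T>(&mut self, data: T) -> Op<T> {
        if self.queued.len() == self.capacity {
            self.flush();
        }
        let user_data = self.next_id;
        self.next_id += 1;
        self.queued.push(user_data);
        Op { data, user_data }
    }
}

fn main() {
    let mut driver = Driver { queued: Vec::new(), capacity: 2, next_id: 0 };
    // No `?` or `expect` needed at the call site.
    let ops: Vec<Op<&str>> = ["read", "write", "fsync"]
        .into_iter()
        .map(|name| driver.submit_with(name))
        .collect();
    println!("{ops:?}");
}
```

The trade-off is the one raised above: the io::Result leaves room for future errors, while a plain Op<T> removes error handling from every call site.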
I don't have a feel for whether this would be considered a drastic change. The whole crate is still considered a POC, I believe, so breaking changes should be acceptable.
Breaking changes are definitely acceptable at this stage if they move the API in a better direction.