tokio icon indicating copy to clipboard operation
tokio copied to clipboard

io: make `EPOLLERR` awake `AsyncFd::readable`

Open BraulioVM opened this issue 3 years ago • 10 comments

Ref: https://github.com/tokio-rs/tokio/issues/4349

Motivation

Currently, if a user is awaiting on a AsyncFd becoming readable and we get a polling error (which may happen in some cases, see the referenced issue), the awaiter will never be woken. Ideally, they would be notified of this error in some way.

Solution

The solution here involves communicating to the Readiness object that there was an error, and when that happens, the Readiness future will return a synthetic error that boils down to "there was a polling error". If the user wants to find more information about the underlying error, they would have to find some other way (not clear how yet? as pointed out in the issue, for sockets one could use SO_ERROR).

Open questions

  1. How do we go about testing? My manual tests involved using the example piece of code in #4349 but that requires sudo which might not be great for automated testing. It's also kind of an involved setup. I will have to look into this.
  2. EPOLLERR also triggers is_write_closed in the mio event. I don't know if we should notify writers of an error or just of "write closed".
  3. More things that I've written down here and there in the source code

BraulioVM avatar Jan 29 '22 12:01 BraulioVM

I've implemented the functionality (epollerr awaking waiters) for the AsyncRead slot as well (also for the AsyncWrite one actually, but that one is tricky because ready.is_error() implies ready.is_write_closed(), which already waiters with a write interest). This is not part of the original issue but it makes sense to me to be consistent? We can do it in separate PRs if you prefer though.

My current plan is to find a satisfactory way to test these changes. It doesn't seem immediately obvious how to do it, because the only way I know to trigger a EPOLLERR is through a setup that generally requires root permissions (the one in the issue). Once I write some tests, I will think through the TODOs I've left in the code, and clean-up the code I've already written (which isn't great yet).

Please let me know if you have any thoughts

BraulioVM avatar Feb 01 '22 00:02 BraulioVM

I think I've found a way to test this automatically! By searching EPOLLERR on https://github.com/torvalds/linux (who would have guessed this would actually work) I've found https://github.com/torvalds/linux/blob/46f4945e2b39dda4b832909434785483e028419d/fs/eventfd.c#L175-L176 Indeed, the man page for eventfd (2) mentions:

If an overflow of the counter value was detected, then select(2) indicates the file descriptor as being both readable and writable, and poll(2) returns a POLLERR event.

So this looks very promising! I will try this over the following days

BraulioVM avatar Feb 01 '22 00:02 BraulioVM

So this looks very promising!

That was wrong. It's true that if the counter's value is 0xfffffffff then polling on the eventfd will return EPOLLERR, but it's very hard to write that value to the eventfd in the first place. According to the man page

As noted above, write(2) can never overflow the counter. However an overflow can occur if 2^64 eventfd "signal posts" were performed by the KAIO subsystem (theoretically possible, but practically unlikely)

I will search for other ways to trigger EPOLLER

BraulioVM avatar Feb 01 '22 18:02 BraulioVM

I've addressed a few of the TODOs that I had left in my changes, but the main problem with this PR remains: automated testing.

I have not found any way to reliably reproduce an EPOLLERR that doesn't require root (or CAP_SYS_ADMIN). I have checked in two example programs that can be run with sudo that can reproduce EPOLLER, just to show what kind of manual testing I'm doing. Even then, those programs do not cover all the changes that I've made.

Before investing more in these clunky manual tests, I would like to hear if you have any tips on how to test this. Some ideas that I have:

  1. Allow some tests to use sudo and have them run in a VM in CI that allows us to do this. If we had something like this, I would put more time in improving the tests I currently have (using fuse to trigger an epollerr) and make them more amenable to run in an automated way.
  2. Mocking. I think this would require a lot of very intrusive changes in the io module just to be able to simulate a mio giving us a is_error() event.
  3. Something else?

BraulioVM avatar Feb 05 '22 12:02 BraulioVM

Maybe I can add unit tests in the schedule_io module. I know unit tests are generally discouraged in this project but they could be worth it here. I'll have a look later

BraulioVM avatar Feb 05 '22 13:02 BraulioVM

If adding unit tests will allow you to test it, then that's fine.

Darksonn avatar Feb 06 '22 11:02 Darksonn

(this is still very much WIP)

I think I might be able to unit test part of the changes. I've written one such test in https://github.com/tokio-rs/tokio/pull/4444/files#diff-585bbdb06d4f023035fc2520bc464acb733f7f212921936f13c9873798cbfc92R602 , and will keep trying to add more tests over the following days

BraulioVM avatar Feb 07 '22 20:02 BraulioVM

Ok. Let me know when you need additional review.

Darksonn avatar Feb 07 '22 20:02 Darksonn

I'm running into the same issue. And I think an ErrorKind::Other is a bad way signal this, IMO a simple wake-up should be fine? (Perhaps with an is_error() check on the AsyncFd?). Alternatively, for some cases such as fuse, being able to wait explicitly for EPOLLERR would also be useful. In fact, other EPOLL* flags can also be of particular interest. Eg. to poll freezer cgroup.events you want to explicitly wait for EPOLLPRI while EPOLLIN is always true. Maybe we can expose a bit more of the io driver machinery so we can make custom registrations?

Blub avatar Jun 01 '22 08:06 Blub

I'm seeing the same issue. I've a readiness call on an network-over-USB device. If the USB device is removed, the kernel driver will wait for all handles/sockets to the device to be closed which will never happen because tokio is waiting for a future that will never be woken (SO_ERROR is set to ENETDOWN). In turn this locks the entire kernel USB subsystem, no USB devices can be add or removed until the socket is closed and the kernel resources freed to the kernerl.

Since there is no easy workaround, has there been any progress on this PR?

yorickdewid avatar Jul 14 '22 09:07 yorickdewid

doing some cross-linking here. In https://github.com/tokio-rs/tokio/pull/5566, Interest::PRIORITY and AsyncFd::ready have been added. On linux (and android, I guess) that makes it possible to wait for EPOLLPRI (cc @Blub)

I've now opened an additional PR that reports error readiness, see #5781. That means that with AsyncFd::ready you can check for read and/or error readiness, which I suspect provides a workaround for https://github.com/tokio-rs/tokio/issues/4349.

What it doesn't do yet is interact with the polling functions (e.g. poll_read_ready). My understanding is that those poll functions are useful when implementing custom Async{Read, Write} instances for system primitives?

folkertdev avatar Jun 10 '23 16:06 folkertdev

I'm going to close this as solved by #5781. But let me know on the issue if I've missed something.

Darksonn avatar Nov 25 '23 12:11 Darksonn