
mDNS poll loop

Open · rkuhn opened this issue on Mar 25 '22 · 6 comments

Since updating from 0.41.1 to 0.42.2, there is some poll amplification or loop going on in mDNS: when I include the Mdns behaviour, CPU usage stays above one full core most of the time, with sporadic brief pauses (<1 sec each). Without the Mdns behaviour, CPU usage is zero, as it should be. What is going on is the following:

2022-03-25T13:28:01.627833Z TRACE polling: Poller::wait(_, Some(143.795167566s))    
2022-03-25T13:28:01.627858Z TRACE polling::epoll: wait: epoll_fd=36, timeout=Some(143.795167566s)    
2022-03-25T13:28:01.627884Z TRACE polling::epoll: modify: epoll_fd=36, fd=38, ev=Event { key: 18446744073709551615, readable: true, writable: false }    
2022-03-25T13:28:01.627888Z TRACE polling::epoll: modify: epoll_fd=36, fd=40, ev=Event { key: 2, readable: false, writable: true }    
2022-03-25T13:28:01.627932Z TRACE polling::epoll: new events: epoll_fd=36, res=1    
2022-03-25T13:28:01.627930Z TRACE polling::epoll: modify: epoll_fd=36, fd=42, ev=Event { key: 4, readable: false, writable: true }    
2022-03-25T13:28:01.627962Z TRACE polling::epoll: modify: epoll_fd=36, fd=37, ev=Event { key: 18446744073709551615, readable: true, writable: false }    
2022-03-25T13:28:01.627998Z TRACE async_io::reactor: react: 1 ready wakers    
2022-03-25T13:28:01.628021Z TRACE ipfs_embed::net::peers: AddressBook polled
2022-03-25T13:28:01.628026Z TRACE async_io::driver: main_loop: waiting on I/O    
2022-03-25T13:28:01.628065Z TRACE async_io::reactor: process_timers: 0 ready wakers    
2022-03-25T13:28:01.628084Z TRACE netlink_sys::smol: poll_recv_from: Err(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" })    
2022-03-25T13:28:01.628091Z TRACE polling: Poller::wait(_, Some(143.794912085s))    
2022-03-25T13:28:01.628128Z TRACE polling::epoll: wait: epoll_fd=36, timeout=Some(143.794912085s)    
2022-03-25T13:28:01.628154Z TRACE polling::epoll: modify: epoll_fd=36, fd=38, ev=Event { key: 18446744073709551615, readable: true, writable: false }    
2022-03-25T13:28:01.628182Z TRACE polling::epoll: new events: epoll_fd=36, res=1    
2022-03-25T13:28:01.628184Z TRACE polling::epoll: modify: epoll_fd=36, fd=40, ev=Event { key: 2, readable: false, writable: true }    
2022-03-25T13:28:01.628206Z TRACE polling::epoll: modify: epoll_fd=36, fd=37, ev=Event { key: 18446744073709551615, readable: true, writable: false }    
2022-03-25T13:28:01.628241Z TRACE async_io::reactor: react: 1 ready wakers    
2022-03-25T13:28:01.628269Z TRACE async_io::driver: main_loop: waiting on I/O    
2022-03-25T13:28:01.628267Z TRACE ipfs_embed::net::peers: AddressBook polled
2022-03-25T13:28:01.628302Z TRACE async_io::reactor: process_timers: 0 ready wakers    
2022-03-25T13:28:01.628327Z TRACE polling: Poller::wait(_, Some(143.794674307s))    
2022-03-25T13:28:01.628334Z TRACE netlink_sys::smol: poll_recv_from: Err(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" })    
2022-03-25T13:28:01.628352Z TRACE polling::epoll: wait: epoll_fd=36, timeout=Some(143.794674307s)    
2022-03-25T13:28:01.628378Z TRACE polling::epoll: modify: epoll_fd=36, fd=38, ev=Event { key: 18446744073709551615, readable: true, writable: false }    
2022-03-25T13:28:01.628405Z TRACE polling::epoll: new events: epoll_fd=36, res=1    
2022-03-25T13:28:01.628429Z TRACE polling::epoll: modify: epoll_fd=36, fd=37, ev=Event { key: 18446744073709551615, readable: true, writable: false }    
2022-03-25T13:28:01.628463Z TRACE async_io::reactor: react: 1 ready wakers    
2022-03-25T13:28:01.628475Z TRACE ipfs_embed::net::peers: AddressBook polled
2022-03-25T13:28:01.628491Z TRACE async_io::driver: main_loop: waiting on I/O    
2022-03-25T13:28:01.628521Z TRACE async_io::reactor: process_timers: 0 ready wakers    
2022-03-25T13:28:01.628539Z TRACE netlink_sys::smol: poll_recv_from: Err(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" })    
2022-03-25T13:28:01.628546Z TRACE polling: Poller::wait(_, Some(143.794454524s))    
2022-03-25T13:28:01.628580Z TRACE polling::epoll: wait: epoll_fd=36, timeout=Some(143.794454524s)    
2022-03-25T13:28:01.628606Z TRACE polling::epoll: modify: epoll_fd=36, fd=38, ev=Event { key: 18446744073709551615, readable: true, writable: false }    
2022-03-25T13:28:01.628640Z TRACE polling::epoll: modify: epoll_fd=36, fd=42, ev=Event { key: 4, readable: false, writable: true }    
2022-03-25T13:28:01.628680Z TRACE polling::epoll: new events: epoll_fd=36, res=1    

At first I thought if-watch (via rtnetlink) was to blame, because it was emitting several DEBUG lines per loop (filtered out of the listing above), but that is just a symptom, as I confirmed by debugging the if-watch example. The issue manifests itself in the async_io reactor always finding one ready waker to wake, which in turn wakes the swarm containing the Mdns behaviour.

Any hints on what might cause this are greatly appreciated. I already added logging for the timers in Mdns, which would show up in the listing above if timers were being reset/fired/polled (there are no mDNS responses in my network, so there are no records to expire).
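For reference, trace output like the listing above can be captured with a tracing-subscriber setup along these lines. This is a sketch only, assuming the env-filter and log-bridge features of tracing-subscriber; the actual setup in ipfs-embed/Actyx may differ:

use tracing_subscriber::EnvFilter;

fn init_tracing() {
    // Respects RUST_LOG, e.g.
    // RUST_LOG="polling=trace,async_io=trace,netlink_sys=trace,libp2p_mdns=trace"
    // With the default "tracing-log" feature, `log` records from crates like
    // `polling` and `async-io` are forwarded into this subscriber as well.
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .init();
}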

rkuhn · Mar 25 '22

addendum: lsof -nP output showing the relevant FDs

actyx   29873 rkuhn   35u  netlink                           0t0 877398060 ROUTE
actyx   29873 rkuhn   36u  a_inode               0,13          0      9494 [eventpoll]
actyx   29873 rkuhn   37u  a_inode               0,13          0      9494 [eventfd]
actyx   29873 rkuhn   38u  a_inode               0,13          0      9494 [timerfd]
actyx   29873 rkuhn   39u     IPv4          877389402        0t0       UDP *:5353 
actyx   29873 rkuhn   40u     IPv4          877389403        0t0       UDP 95.217.194.97:44860 
actyx   29873 rkuhn   41u     IPv4          877389404        0t0       UDP *:5353 
actyx   29873 rkuhn   42u     IPv4          877389405        0t0       UDP 172.17.0.1:45727 
actyx   29873 rkuhn   43u     IPv6          877398061        0t0       TCP *:4001 (LISTEN)
actyx   29873 rkuhn   44u     IPv4          877398062        0t0       TCP *:4001 (LISTEN)

rkuhn · Mar 25 '22

Okay, here’s the really weird part: with this patch to async-io, the poll loop eventually gets broken out of after some time, settling down into its proper “wait for netlink messages” state:

diff --git a/src/reactor.rs b/src/reactor.rs
index b6ae153..7e79478 100644
--- a/src/reactor.rs
+++ b/src/reactor.rs
@@ -382,9 +382,11 @@ impl Direction {
     /// Moves all wakers into a `Vec`.
     fn drain_into(&mut self, dst: &mut Vec<Waker>) {
         if let Some(w) = self.waker.take() {
+            log::trace!("waking poller");
             dst.push(w);
         }
         for (_, opt) in self.wakers.iter_mut() {
+            log::trace!("waking waiter");
             if let Some(w) = opt.take() {
                 dst.push(w);
             }

I then tried generating a netlink message by adding an IP address, which at first did the right and expected thing and then spun a few dozen loops before going back to sleep again. So this looks like a race condition that, when triggered, leads to endless wakeups of the swarm.

rkuhn · Mar 25 '22

It looks like epoll is being interrupted by the registration of read interest, which immediately notifies a waker; the reactor then waits again, upon which this vicious cycle restarts.

2022-03-25T15:59:07.068412Z TRACE async_io::reactor: woke up after IO    
2022-03-25T15:59:07.068435Z TRACE async_io::reactor: waking poller    
2022-03-25T15:59:07.068460Z TRACE async_io::reactor: react: 1 ready wakers    
2022-03-25T15:59:07.068491Z TRACE async_io::driver: main_loop: waiting on I/O    
2022-03-25T15:59:07.068521Z TRACE async_io::reactor: process_timers: 0 ready wakers    
2022-03-25T15:59:07.068537Z DEBUG netlink_proto::connection: reading incoming messages    
2022-03-25T15:59:07.068545Z TRACE polling: Poller::wait(_, Some(297.113158634s))    
2022-03-25T15:59:07.068568Z TRACE polling::epoll: wait: epoll_fd=36, timeout=Some(297.113158634s)    
2022-03-25T15:59:07.068573Z DEBUG netlink_proto::codecs: NetlinkCodec: decoding next message    
2022-03-25T15:59:07.068591Z TRACE polling::epoll: modify: epoll_fd=36, fd=38, ev=Event { key: 18446744073709551615, readable: true, writable: false }    
2022-03-25T15:59:07.068611Z TRACE async_io::reactor: reusing existing waker 0    
2022-03-25T15:59:07.068639Z DEBUG netlink_proto::connection: forwarding unsolicited messages to the connection handle    
2022-03-25T15:59:07.068669Z DEBUG netlink_proto::connection: forwaring responses to previous requests to the connection handle    
2022-03-25T15:59:07.068699Z DEBUG netlink_proto::connection: handling requests    
2022-03-25T15:59:07.068726Z DEBUG netlink_proto::connection: sending messages    
2022-03-25T15:59:07.068769Z TRACE async_io::reactor: modify from poll    
2022-03-25T15:59:07.068794Z TRACE polling::epoll: modify: epoll_fd=36, fd=43, ev=Event { key: 5, readable: true, writable: false }    
2022-03-25T15:59:07.068823Z TRACE polling::epoll: new events: epoll_fd=36, res=1    
2022-03-25T15:59:07.068823Z TRACE async_io::reactor: reusing existing waker 0    
2022-03-25T15:59:07.068846Z TRACE polling::epoll: modify: epoll_fd=36, fd=37, ev=Event { key: 18446744073709551615, readable: true, writable: false }    
2022-03-25T15:59:07.068852Z TRACE async_io::reactor: reusing existing waker 0    
2022-03-25T15:59:07.068872Z TRACE async_io::reactor: woke up after IO    

rkuhn · Mar 25 '22

This rabbit hole is deeper than I can follow right now:

  • epoll immediately returns readable
  • since no EPOLLET option is set, this indicates that the socket still has unread bytes, but those bytes are not consumed between epoll_wait calls (and we note the conspicuous absence of generated events in the logs above); see the sketch after this list
  • I have absolutely no clue where to look for the offending code
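To illustrate the contract at play (this is not the async-io internals, just a minimal standalone sketch): the polling crate delivers events in oneshot mode and interest must be re-armed with modify after each event, and since the underlying epoll is level-triggered, re-arming while unread bytes remain makes the next wait return immediately. The correct pattern is to drain until WouldBlock before re-arming; socket address and key are illustrative:

use polling::{Event, Poller};
use std::io::ErrorKind;
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    // Ephemeral port for the sketch; a real mDNS socket on *:5353 would also
    // need SO_REUSEADDR and multicast membership, omitted here.
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.set_nonblocking(true)?;

    let key = 7; // arbitrary key identifying this socket
    let poller = Poller::new()?;
    poller.add(&socket, Event::readable(key))?;

    let mut events = Vec::new();
    let mut buf = [0u8; 4096];
    loop {
        events.clear();
        poller.wait(&mut events, None)?;
        for ev in &events {
            if ev.key == key {
                // Drain everything that is queued; otherwise re-arming below
                // makes the next wait() return immediately (level-triggered).
                loop {
                    match socket.recv_from(&mut buf) {
                        Ok((len, from)) => println!("{} bytes from {}", len, from),
                        Err(e) if e.kind() == ErrorKind::WouldBlock => break,
                        Err(e) => return Err(e),
                    }
                }
                // Interest is oneshot in `polling`; re-arm for the next datagram.
                poller.modify(&socket, Event::readable(key))?;
            }
        }
    }
}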

rkuhn · Mar 25 '22

Okay @mxinden I’m pretty sure I know how it happens, but I’m not sure how to fix it:

  • InterfaceState::poll registers read interest in the mDNS receiver socket (possibly after having drained received datagrams)
  • sometimes this works as intended, namely that epoll doesn’t return this socket as readable until something happens — I think this is the case when read interest is registered before the async-io run loop calls epoll_wait again
  • but most of the time, epoll_wait immediately returns, saying that the mDNS receiver socket is ready, but there are no messages there and the whole cycle begins again

If I remove self.recv_socket.poll_readable(cx) from the poll function in iface.rs, then everything works nicely, including mDNS discovery. I suppose the functionality will be rescued by one of the various timers that wake up the behaviour in any case, leading to poll calls. I’ll probably patch libp2p-mdns in this fashion so that Actyx works again, but what is the proper solution?
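For comparison, here is a generic sketch (not the libp2p-mdns code; function and callback names are made up) of a poll function that only registers read interest after the socket has reported WouldBlock, which is the pattern that avoids spurious level-triggered wakeups:

use async_io::Async;
use std::{
    io,
    net::UdpSocket,
    task::{Context, Poll},
};

fn poll_datagrams(
    socket: &Async<UdpSocket>,
    cx: &mut Context<'_>,
    mut on_datagram: impl FnMut(&[u8]),
) -> Poll<io::Result<()>> {
    let mut buf = [0u8; 4096];
    loop {
        // Non-blocking read on the inner socket.
        match socket.get_ref().recv_from(&mut buf) {
            Ok((len, _from)) => on_datagram(&buf[..len]),
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                // Only now, with the queue drained, ask to be woken when the
                // socket becomes readable again.
                match socket.poll_readable(cx) {
                    // Became readable in the meantime: keep draining.
                    Poll::Ready(Ok(())) => continue,
                    other => return other,
                }
            }
            Err(e) => return Poll::Ready(Err(e)),
        }
    }
}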

rkuhn · Mar 25 '22

Sleeping on it, I have doubts about the current design of libp2p-mdns: I don’t think the UDP ports should be polled every time something unrelated happens in the rest of the Swarm, and I also don’t think that the rest of the Swarm should be polled every time something happens on those UDP ports.
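One way to decouple the two directions would be to give the socket its own task and let the behaviour only poll a channel, so a swarm wakeup touches the channel rather than the socket, and a datagram only wakes the swarm when it actually produces something worth reporting. A rough sketch with made-up names, not a proposal for the actual libp2p-mdns structure:

use async_io::Async;
use futures::{channel::mpsc, SinkExt};
use std::net::UdpSocket;

// Hypothetical task owning the mDNS socket; the behaviour would hold the
// corresponding mpsc::Receiver and poll it (a Stream) from its own poll().
async fn mdns_socket_task(
    socket: Async<UdpSocket>,
    mut out: mpsc::Sender<Vec<u8>>,
) -> std::io::Result<()> {
    let mut buf = [0u8; 4096];
    loop {
        let (len, _from) = socket.recv_from(&mut buf).await?;
        // Parsing/filtering could happen here so that uninteresting packets
        // never wake the swarm at all.
        if out.send(buf[..len].to_vec()).await.is_err() {
            // The behaviour dropped its receiver; shut the task down.
            return Ok(());
        }
    }
}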

rkuhn · Mar 26 '22

Tagging this as important because the mDNS behaviour performs I/O on the main task that all NetworkBehaviours run on.

thomaseizinger · Sep 12 '23