netlink Fix recvfrom goroutine leak

unix.Recvfrom does not return after fd close.

Fixes #792

Jul 31 '22 11:07 nekohasekai

About why old TestIfSocketCloses passed:

SetReceiveTimeout make Receive() always return unix.EAGAIN

Jul 31 '22 12:07 nekohasekai

Looks like this one may have introduced a regression; see https://github.com/moby/moby/pull/46982#issuecomment-2315840500

EINTR on netlink sockets is a new one. I suspect it has more to do with the netlink dependency bump you pulled in when rebasing than on the Go toolchain bump. I think the bug is here: https://github.com/vishvananda/netlink/blob/92645823f36c7ed03faf4baa566078d9d5e06fda/nl/nl_linux.go#L821-L824 It retries on EWOULDBLOCK (a.k.a. EAGAIN) but neglects to retry on EINTR.

Aug 28 '24 17:08 thaJeztah

@thaJeztah IO calls on non-blocking sockets will never return -EINTR. The problem here is that Moby calls SetSocketTimeout, which sets SO_SNDTIMEO and SO_RCVTIMEO. These socket options are only useful for sockets in blocking mode. Setting these probably places the socket back into blocking mode.

https://github.com/moby/moby/blob/92a05cf4148adc26054a5b52b191e0b4d13112cd/libnetwork/ns/init_linux.go#L25-L40

https://github.com/vishvananda/netlink/blob/92645823f36c7ed03faf4baa566078d9d5e06fda/nl/nl_linux.go#L848-L860

I'm not very familiar with this project, but it seems to me that before this PR, only blocking IO is used. This PR adds calls to set the socket as non-blocking, but still allows setting the timeout socket options.

Aug 29 '24 02:08 database64128

Thanks; let me check with the networking folks about those parts.

Aug 29 '24 09:08 thaJeztah

Tracing through the kernel sources, I can confidently claim that the timeout sockopts have no effect when the O_NONBLOCK fcntl is enabled. That's a red herring.

O_NONBLOCK on the socket forces the MSG_DONTWAIT flag to be set on all recv[from] syscalls
recv on a netlink socket is implemented in terms of skb_recv_datagram
The implementation of skb_recv_datagram only blocks if the timeout is non-zero

The timeout for a non-blocking socket would be zero due to the MSG_DONTWAIT flag being forced on; therefore setting the timeout sockopts must have no effect on the behavior of the recv syscall.

Need further proof? The timeouts on new sockets are initialized to LONG_MAX. And there is no magic hiding in the setsockopt syscall.

System calls that "shouldn't" return -EINTR have been known to return -EINTR in practice when a signal is delivered to the task while in a syscall. See e.g. https://go.dev/issue/38033. Maybe it's a kernel bug which results in a recv on a non-blocking socket returning with -EINTR upon receiving a signal despite the SA_RESTART sigaction when a blocking recv would have been implicitly restarted?

Aug 29 '24 22:08 corhere

This all still feels very strange to me.

I checked tokio's implementation of TcpStream and followed the references all the way down to the libc calls. There's no handling of -EINTR (std::io::ErrorKind::Interrupted) at all. The netlink-sys crate's tokio socket implementation does not handle -EINTR either.

I actually rolled my own rtnetlink Go package a while ago, and it never had to deal with -EINTR.

Aug 30 '24 14:08 database64128

Unlike Go programs, Rust programs don't (inherently) have a runtime which sends SIGURG to the process baked in.

Aug 30 '24 15:08 corhere

Unlike Go programs, Rust programs don't (inherently) have a runtime which sends SIGURG to the process baked in.

Tokio does allow you to register for receiving signals. It's actually quite similar to os/signal in terms of usage. I imagine there are programs that use Tokio for network IO and receiving signals at the same time.

Aug 30 '24 16:08 database64128

This change breaks the timeouts, I can raise a PR to fix that.

But, the EINTR we've been seeing in moby CI is unrelated - it's due to https://github.com/vishvananda/netlink/pull/925, exposing an NLM_F_DUMP_INTR (reported in the netlink response) as an EINTR ... https://github.com/vishvananda/netlink/blob/92645823f36c7ed03faf4baa566078d9d5e06fda/nl/nl_linux.go#L582-L584

Aug 30 '24 16:08 robmry

netlink netlink copied to clipboard

Fix recvfrom goroutine leak

netlink
netlink copied to clipboard