Connection stuck when `sendmsg` call completes with `EPIPE` on `Darwin`.

Open mstyura opened this issue 1 year ago • 14 comments

Steps to reproduce

  1. Use quinn to establish a QUIC connection to a remote server from an iOS application, with code like this (only the UDP socket configuration is shown):
let bind_addr = if addr.is_ipv4() {
    SocketAddr::new(IpAddr::V4(Ipv4Addr::UNSPECIFIED), 0)
} else {
    SocketAddr::new(IpAddr::V6(Ipv6Addr::UNSPECIFIED), 0)
};
let socket = tokio::net::UdpSocket::bind(bind_addr).await?;
let local_addr = socket.local_addr()?;
let runtime = quinn::TokioRuntime;
let endpoint = quinn::Endpoint::new(
    quinn_endpoint_config,
    None,
    socket.into_std()?,
    Arc::new(runtime),
)?;
  2. Minimize the iOS application and lock the phone screen while the previously established QUIC connection is still alive;
  3. Wait several minutes in the background (not sure about the exact timing; it may only matter that the app is actually suspended by iOS);
  4. Resume the iOS application and try to use the previous QUIC connection so that quinn sends some packets.

Actual result

  1. quinn is unable to send any UDP packets over the previously constructed UDP socket. It receives a "Broken pipe" error from sendmsg; none of the IP packets leave the device;
  2. The idle timer on the local side is reset before sendmsg is called and fails, which delays automatic detection of the broken socket;
  3. The quinn connection is closed by the local side once the idle timeout is reached.
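
To illustrate point 2, here is a minimal sketch of how the reset policy affects detection time. `IdleTimer` and `ResetPolicy` are hypothetical names made up for this example; this is a toy model of the behaviour described above, not quinn's actual timer code.

```rust
/// Hypothetical reset policy for a toy idle timer (not quinn's implementation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResetPolicy {
    /// Reset whenever a send is attempted, even if the send fails.
    OnAttempt,
    /// Reset only after the send has succeeded.
    OnSuccess,
}

/// Toy idle timer that tracks a deadline in milliseconds.
struct IdleTimer {
    timeout_ms: u64,
    deadline_ms: u64,
    policy: ResetPolicy,
}

impl IdleTimer {
    fn new(now_ms: u64, timeout_ms: u64, policy: ResetPolicy) -> Self {
        Self { timeout_ms, deadline_ms: now_ms + timeout_ms, policy }
    }

    /// Record a send attempt at `now_ms` and whether it succeeded.
    fn on_send(&mut self, now_ms: u64, succeeded: bool) {
        if succeeded || self.policy == ResetPolicy::OnAttempt {
            self.deadline_ms = now_ms + self.timeout_ms;
        }
    }

    /// True once the deadline has been reached.
    fn expired(&self, now_ms: u64) -> bool {
        now_ms >= self.deadline_ms
    }
}
```

With a 30 s timeout and a failed send at 25 s, the `OnAttempt` timer only expires at 55 s, while the `OnSuccess` timer still expires at 30 s.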

Expected result

  • There is some quick way to know that the underlying socket is broken or that the whole quinn connection is no longer usable.
  • Maybe there is some kind of event signalling that UDP socket rotation is required, so user code can listen or wait for it and call Endpoint::rebind once a problem with the socket is detected.
  • Maybe an additional API is exposed on Connection, such as an explicit ping method that can return the io::Error from the underlying UDP socket, so user code can trigger a rebind or close the Endpoint/Connection.
  • The idle timer preferably should not restart when the sendmsg call failed.
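
As a rough sketch of the socket rotation idea, user code could create a replacement socket like this once it decides the old one is dead. `fresh_socket_for` is a hypothetical helper name; it uses only the standard library and mirrors the bind logic from the steps to reproduce.

```rust
use std::io;
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr, UdpSocket};

/// Hypothetical helper: bind a new UDP socket on an OS-chosen port,
/// using the same address family as the remote peer.
fn fresh_socket_for(remote: SocketAddr) -> io::Result<UdpSocket> {
    let bind_ip = if remote.is_ipv4() {
        IpAddr::V4(Ipv4Addr::UNSPECIFIED)
    } else {
        IpAddr::V6(Ipv6Addr::UNSPECIFIED)
    };
    // Port 0 lets the OS pick a free local port.
    UdpSocket::bind(SocketAddr::new(bind_ip, 0))
}
```

The resulting socket is the kind of value that would then be handed to Endpoint::rebind; when and how to trigger that is exactly the open question here.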

More details

I've tracked down the EPIPE error from sendmsg inside the XNU kernel. The backtrace to the likely origin in the XNU source code:

It looks like the socket state contains SOF_DEFUNCT or SS_CANTSENDMORE, so the socket has practically become unusable.

According to the FreeBSD documentation, EPIPE is also an indicator of "you can not send more data via provided socket".

See: https://man.freebsd.org/cgi/man.cgi?send(2)

[EPIPE] The socket is unable to send anymore data (SBS_CANTSENDMORE has been set on the socket). This typically means that the socket is not connected.

UPD: The same "broken" socket state produces an ENOTCONN error when recvmsg is called; its origin seems to be this check in the function soreceive.

UPD 2: It seems ENOTCONN should also get some special treatment on Darwin: https://github.com/libevent/libevent/pull/1031/files
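
For illustration, a check along these lines could let calling code treat both errors as "this socket is dead". `socket_looks_defunct` is a hypothetical name, and whether these two error kinds are the right (or complete) set on Darwin is an assumption based on the observations above.

```rust
use std::io;

/// Hypothetical check: returns true when the error suggests the socket
/// itself is no longer usable. On Unix targets Rust's std maps EPIPE to
/// `BrokenPipe` and ENOTCONN to `NotConnected`.
fn socket_looks_defunct(err: &io::Error) -> bool {
    matches!(
        err.kind(),
        io::ErrorKind::BrokenPipe | io::ErrorKind::NotConnected
    )
}
```

Transient errors such as `WouldBlock` are deliberately not matched, so a caller would only consider rotating the socket on the two kinds reported in this issue.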

mstyura, Oct 10 '24 14:10