dnsdist: Outgoing DoT connections using gnutls do not work on macOS and OpenBSD
On macOS and OpenBSD, when using the gnutls provider all outgoing DoT connections do not work.
The gnutls lib does not like that gnutls_handshake() is called on a non-blocking socket that is not yet in connected state. The call returns a fatal status: -53 (Error in the push function).
Corresponding trace on OpenBSD below. I suspect FreeBSD suffers in a similar way.
94717 dnsdist RET socket 85/0x55
94717 dnsdist CALL fcntl(85,F_GETFD)
94717 dnsdist RET fcntl 0
94717 dnsdist CALL fcntl(85,F_SETFD,FD_CLOEXEC)
94717 dnsdist RET fcntl 0
94717 dnsdist CALL fcntl(85,F_GETFL)
94717 dnsdist RET fcntl 2
94717 dnsdist CALL fcntl(85,F_SETFL,0x6<O_RDWR|O_NONBLOCK>)
94717 dnsdist RET fcntl 0
94717 dnsdist CALL connect(85,0xbaa5ec07160,16)
94717 dnsdist STRU struct sockaddr { AF_INET, 9.9.9.9:853 }
94717 dnsdist RET connect -1 errno 36 Operation now in progress
94717 dnsdist CALL getpid()
94717 dnsdist RET getpid 94717/0x171fd
94717 dnsdist CALL getpid()
94717 dnsdist RET getpid 94717/0x171fd
94717 dnsdist CALL getpid()
94717 dnsdist RET getpid 94717/0x171fd
94717 dnsdist CALL getpid()
94717 dnsdist RET getpid 94717/0x171fd
94717 dnsdist CALL getpid()
94717 dnsdist RET getpid 94717/0x171fd
94717 dnsdist CALL sendmsg(85,0xbaa27bd8120,0x400<MSG_NOSIGNAL>)
94717 dnsdist STRU struct msghdr { name=0x0, namelen=0, iov=0xbaa27bd8170, iovlen=1, control=0x0, controllen=0, flags=0 }
94717 dnsdist STRU struct iovec { base=0xbaa2747e05b, len=389 }
94717 dnsdist RET sendmsg -1 errno 57 Socket is not connected
94717 dnsdist CALL sendmsg(85,0xbaa27bd8250,0x400<MSG_NOSIGNAL>)
94717 dnsdist STRU struct msghdr { name=0x0, namelen=0, iov=0xbaa27bd82a0, iovlen=1, control=0x0, controllen=0, flags=0 }
94717 dnsdist STRU struct iovec { base=0xbaa2747e05b, len=389 }
94717 dnsdist RET sendmsg -1 errno 57 Socket is not connected
94717 dnsdist CALL shutdown(85,SHUT_RDWR)
94717 dnsdist RET shutdown 0
94717 dnsdist CALL close(85)
94717 dnsdist RET close 0
https://gitlab.com/gnutls/gnutls/-/issues/117
On FreeBSD test_OutgoingTLS.py passes, apparently.
It's going to require significant changes:
1/ TCPIOHandler::tryConnect needs to report that the connection is still pending, which is easy:
diff --git pdns/tcpiohandler.hh pdns/tcpiohandler.hh
index 9450b6118..6b9ce6cca 100644
--- pdns/tcpiohandler.hh
+++ pdns/tcpiohandler.hh
@@ -313,12 +313,16 @@ public:
}
else {
if (!s_disableConnectForUnitTests) {
- SConnectWithTimeout(d_socket, remote, /* no timeout, we will handle it ourselves */ timeval{0,0});
+ if (SConnectWithTimeout(d_socket, remote, /* no timeout, we will handle it ourselves */ timeval{0,0}) == EINPROGRESS) {
+ return IOState::NeedWrite;
+ }
}
}
#else
if (!s_disableConnectForUnitTests) {
- SConnectWithTimeout(d_socket, remote, /* no timeout, we will handle it ourselves */ timeval{0,0});
+ if (SConnectWithTimeout(d_socket, remote, /* no timeout, we will handle it ourselves */ timeval{0,0}) == EINPROGRESS) {
+ return IOState::NeedWrite;
+ }
}
#endif /* MSG_FASTOPEN */
2/ We need to act on that in ConnectionToBackend::reconnect, likely by setting d_lastIOBlocked = true; to prevent any attempt at reading/writing immediately if this is not an initial connect but a reconnection
3/ registering the descriptor as NeedWrite in the multiplexer
4/ later when handleIO is called from any other place than the IO callback (this means queueQuery for DoT, probably something else for DoH) we need to be careful not to try to write or change our state.
5/ And of course we need to know what to do when the socket becomes writable, which might require a new state, or perhaps the existing code already does the right thing.
That's not terrible but the existing code is already in need of a refactoring, so perhaps I'll do that first.