pdns icon indicating copy to clipboard operation
pdns copied to clipboard

dnsdist: Outgoing DoT connections using gnutls do not work on macOS and OpenBSD

Open omoerbeek opened this issue 2 months ago • 3 comments

On macOS and OpenBSD, when using the gnutls provider all outgoing DoT connections do not work.

The gnutls lib does not like that gnutls_handshake() is called on a non-blocking socket that is not yet in connected state. The call returns a fatal status: -53 (Error in the push function).

Corresponding trace on OpenBSD below. I suspect FreeBSD suffers in a similar way.

 94717 dnsdist  RET   socket 85/0x55
 94717 dnsdist  CALL  fcntl(85,F_GETFD)
 94717 dnsdist  RET   fcntl 0
 94717 dnsdist  CALL  fcntl(85,F_SETFD,FD_CLOEXEC)
 94717 dnsdist  RET   fcntl 0
 94717 dnsdist  CALL  fcntl(85,F_GETFL)
 94717 dnsdist  RET   fcntl 2
 94717 dnsdist  CALL  fcntl(85,F_SETFL,0x6<O_RDWR|O_NONBLOCK>)
 94717 dnsdist  RET   fcntl 0
 94717 dnsdist  CALL  connect(85,0xbaa5ec07160,16)
 94717 dnsdist  STRU  struct sockaddr { AF_INET, 9.9.9.9:853 }
 94717 dnsdist  RET   connect -1 errno 36 Operation now in progress
 94717 dnsdist  CALL  getpid()
 94717 dnsdist  RET   getpid 94717/0x171fd
 94717 dnsdist  CALL  getpid()
 94717 dnsdist  RET   getpid 94717/0x171fd
 94717 dnsdist  CALL  getpid()
 94717 dnsdist  RET   getpid 94717/0x171fd
 94717 dnsdist  CALL  getpid()
 94717 dnsdist  RET   getpid 94717/0x171fd
 94717 dnsdist  CALL  getpid()
 94717 dnsdist  RET   getpid 94717/0x171fd
 94717 dnsdist  CALL  sendmsg(85,0xbaa27bd8120,0x400<MSG_NOSIGNAL>)
 94717 dnsdist  STRU  struct msghdr { name=0x0, namelen=0, iov=0xbaa27bd8170, iovlen=1, control=0x0, controllen=0, flags=0 }
 94717 dnsdist  STRU  struct iovec { base=0xbaa2747e05b, len=389 }
 94717 dnsdist  RET   sendmsg -1 errno 57 Socket is not connected
 94717 dnsdist  CALL  sendmsg(85,0xbaa27bd8250,0x400<MSG_NOSIGNAL>)
 94717 dnsdist  STRU  struct msghdr { name=0x0, namelen=0, iov=0xbaa27bd82a0, iovlen=1, control=0x0, controllen=0, flags=0 }
 94717 dnsdist  STRU  struct iovec { base=0xbaa2747e05b, len=389 }
 94717 dnsdist  RET   sendmsg -1 errno 57 Socket is not connected
 94717 dnsdist  CALL  shutdown(85,SHUT_RDWR)
 94717 dnsdist  RET   shutdown 0
 94717 dnsdist  CALL  close(85)
 94717 dnsdist  RET   close 0

omoerbeek avatar Oct 02 '25 13:10 omoerbeek

https://gitlab.com/gnutls/gnutls/-/issues/117

rgacogne avatar Oct 02 '25 13:10 rgacogne

On FreeBSD test_OutgoingTLS.py passes, apparently.

rgacogne avatar Oct 10 '25 12:10 rgacogne

It's going to require significant changes:

1/ TCPIOHandler::tryConnect needs to report that the connection is still pending, which is easy:

diff --git pdns/tcpiohandler.hh pdns/tcpiohandler.hh
index 9450b6118..6b9ce6cca 100644
--- pdns/tcpiohandler.hh
+++ pdns/tcpiohandler.hh
@@ -313,12 +313,16 @@ public:
     }
     else {
       if (!s_disableConnectForUnitTests) {
-        SConnectWithTimeout(d_socket, remote, /* no timeout, we will handle it ourselves */ timeval{0,0});
+        if (SConnectWithTimeout(d_socket, remote, /* no timeout, we will handle it ourselves */ timeval{0,0}) == EINPROGRESS) {
+          return IOState::NeedWrite;
+        }
       }
     }
 #else
     if (!s_disableConnectForUnitTests) {
-      SConnectWithTimeout(d_socket, remote, /* no timeout, we will handle it ourselves */ timeval{0,0});
+      if (SConnectWithTimeout(d_socket, remote, /* no timeout, we will handle it ourselves */ timeval{0,0}) == EINPROGRESS) {
+        return IOState::NeedWrite;
+      }
     }
 #endif /* MSG_FASTOPEN */

2/ We need to act on that in ConnectionToBackend::reconnect, likely by setting d_lastIOBlocked = true; to prevent any attempt at reading/writing immediately if this is not an initial connect but a reconnection

3/ registering the descriptor as NeedWrite in the multiplexer

4/ later when handleIO is called from any other place than the IO callback (this means queueQuery for DoT, probably something else for DoH) we need to be careful not to try to write or change our state.

5/ And of course we need to know what to do when the socket becomes writable, which might require a new state, or perhaps the existing code already does the right thing.

That's not terrible but the existing code is already in need of a refactoring, so perhaps I'll do that first.

rgacogne avatar Oct 10 '25 13:10 rgacogne