AdGuardHome icon indicating copy to clipboard operation
AdGuardHome copied to clipboard

No local connection to a 2nd level DNS server on FreeBSD when binding to 0.0.0.0

Open netchild opened this issue 1 year ago • 11 comments

Prerequisites

Platform (OS and CPU architecture)

FreeBSD, AMD64 (aka x86_64)

Installation

Custom package (OpenWrt, HomeAssistant, etc; please mention in the description)

Setup

On one machine

AdGuard Home version

0.107.40

Action

Setup: 1 system with

  • adguardhome, port 53, installed as a FreeBSD package
  • unbound, port 55, for recursive lookups to DNS servers in the internet
  • ISC bind, port 54, for internal domains
  • Adguardhome is supposed to lookup to localhost port 54 for internal domains and use lookups to localhost port 55 for the other lookups (so 2 upstreams are on the same machine but different ports, one upstream is for local domains, the other for the rest)
  • resolv.conf on this system has "nameserver 127.0.0.1", I also tried with the IP of the external interface for the nameserver, same result
  • inside the system perform some nslookups

Issue: When adguardhome is bound to 0.0.0.0, it opens "tcp46" and "udp46" sockets (netstat -na output). In this setup any lookup via nslookup returns a timeout (e.g. nslookup www.google.com). When adguard is bound to 127.0.0.1, ::1, external IPv4 and external IPv6, it does not create xxx46 sockets, but tcp4 and tcp6 and udp4 and udp6 sockets (again, netstat -na output). In this setup any valid lookup via nslookup is immediately displayed.

When bound to 0.0.0.0: When I stop all 3 services, and then start all (adguardhome last), it doesn't work. The strange thing is, it just doesn't work when I do the nslookup on the box itself. If I do the same query from the outside, it works. If I then simply restart adguardhome, the nslookup starts to work.

When it fails, I get an error in the log that it can't resolve the name from the IP of the configured trusted proxy. [error] upstream 192.168.x.y:55 failed to exchange ;z.x.168.192.in-addr.arpa. IN PTR in 2.007481739s. Cause: exchanging with 192.168.x.y:55 over udp: read udp 192.168.x.y:52034->192.168.x.y:55: i/o timeout

To me this looks like it disabled the upstream (unbound on port 55) if it doesn't works on startup and doesn't try again. But it only doesn't work when queried with nslookup from localhost. If I query from outside the box, it works. I do exactly the same nslookup commands (shell history, via arrow up, no handwriting commands again) between a restart.

And if I do not bind to 0.0.0.0 but to the specific IPs, I can do whatever I want and it works.

Any ideas how to debug this further? Any specific debug options to print out some more info?

Expected result

nslookup returns a result

Actual result

nslookup www.stern.de
;; communications error to 127.0.0.1#53: timed out
;; communications error to 127.0.0.1#53: timed out
;; communications error to 127.0.0.1#53: timed out
;; no servers could be reached

Additional information and/or screenshots

ASCII art of the DNS flow: internal clients -> adguardhome port 53 (lookup www.google.com) -> unbound port 55 -> internet internal clients -> adguardhome port 53 (lookup internal domain) -> ISC bind port 54

netchild avatar Nov 07 '23 20:11 netchild

Hello and thank you for the thorough report. Does nslookup 'www.example.com' '::1' work? The main theory here is that the system doesn't process dual-stack sockets the way the Go runtime (and thus AGH) expects, and it may have something to do with the net.inet6.ip6.v6only value.

In any case, enabling verbose logging might clarify things.

ainar-g avatar Nov 08 '23 10:11 ainar-g

With config set to bind to 0.0.0.0:

Mittwoch, 08. November 2023, 21:03:32
{0} <FreeBSD 15.0-CURRENT> [dns:~]
(206) root@ttypts/5 # nslookup 'www.example.com' '::1'
Server:         ::1
Address:        ::1#53

Non-authoritative answer:
Name:   www.example.com
Address: 93.184.216.34
Name:   www.example.com
Address: 2606:2800:220:1:248:1893:25c8:1946


Mittwoch, 08. November 2023, 21:03:51
{0} <FreeBSD 15.0-CURRENT> [dns:~]
(207) root@ttypts/5 # nslookup 'www.example.com'
;; communications error to 127.0.0.1#53: timed out
;; communications error to 127.0.0.1#53: timed out
;; communications error to 127.0.0.1#53: timed out
;; no servers could be reached



Mittwoch, 08. November 2023, 21:04:16
{1} <FreeBSD 15.0-CURRENT> [dns:~]
(208) root@ttypts/5 # nslookup 'www.example.com' '127.0.0.1'
;; communications error to 127.0.0.1#53: timed out
;; communications error to 127.0.0.1#53: timed out
;; communications error to 127.0.0.1#53: timed out
;; no servers could be reached

Mittwoch, 08. November 2023, 21:05:59
{0} <FreeBSD 15.0-CURRENT> [dns:~]
(211) root@ttypts/5 # netstat -na | grep '\.53'
tcp46      0      0 *.53                   *.*                    LISTEN
udp46      0      0 *.53                   *.*

The log which corresponds to the communication errors is:

2023/11/08 21:04:00.816516 74970#133 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 59113
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.example.com.       IN       A

;; ANSWER SECTION:
www.example.com.        86391   IN      A       93.184.216.34

2023/11/08 21:04:00.816594 74970#133 [error] responding udp request: writing message: write udp [::]:53->127.0.0.1:44690: sendmsg: can't assign requested address

I'm surprised the source IP for the udp write is "[::]" and not one of the suitable IPs (127.0.0.1 in this case). The successful one had the correct IP (::1).... EDIT: ooops, forget that part, this is a new connection and not the IP for the successful send (which is not logged):

2023/11/08 21:03:51.784897 74970#130 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 21116
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.example.com.       IN       A

;; ANSWER SECTION:
www.example.com.        86400   IN      A       93.184.216.34

2023/11/08 21:03:51.786961 74970#60 [debug] dnsproxy: handling new udp packet from [::1]:42275

netchild avatar Nov 08 '23 20:11 netchild

And the sysctl is set to the default value:

# sysctl net.inet6.ip6.v6only
net.inet6.ip6.v6only: 1

netchild avatar Nov 08 '23 20:11 netchild

Yes, seems like FreeBSD doesn't send IPv4 queries to an IPv6 sockets by default, which is why your configuration with both IPv4 and IPv6 localhost addresses works. So it is likely that binding to both 0.0.0.0 and :: would work for both protocols without changing the OS settings. (Assuming that a wildcard bind is what is expected.)

ainar-g avatar Nov 09 '23 10:11 ainar-g

Adguardhome throws an error for "::" 2023/11/09 14:30:03.036211 [error] parsing configuration file: dns.bind_hosts at index 1 is not a valid ip address

netchild avatar Nov 09 '23 13:11 netchild

I don't have that error. Can you show the resulting YAML?

dns:
  bind_hosts:
    - '0.0.0.0'
    - '::'

ainar-g avatar Nov 13 '23 10:11 ainar-g

I had it initially without the quotes. This does work for 0.0.0.0, and ::1 and other specific IPv6 addresses but not for ::. Now with quotes I get:

2023/11/13 13:49:33.513786 [info] dnsproxy: cache: enabled, size 4194304 b
2023/11/13 13:49:33.513853 [info] dnsproxy: max goroutines is set to 300
2023/11/13 13:49:33.513883 [info] dnsproxy: creating udp server socket 0.0.0.0:53
2023/11/13 13:49:33.513976 [info] dnsproxy: listening to udp://[::]:53
2023/11/13 13:49:33.513983 [info] dnsproxy: creating udp server socket [::]:53
2023/11/13 13:49:33.514030 [info] dnsproxy: listening to udp://[::]:53
2023/11/13 13:49:33.514040 [info] dnsproxy: creating tcp server socket 0.0.0.0:53
2023/11/13 13:49:33.514098 [info] dnsproxy: listening to tcp://[::]:53
2023/11/13 13:49:33.514105 [info] dnsproxy: creating tcp server socket [::]:53
2023/11/13 13:49:33.514149 [info] dnsproxy: listening to tcp://[::]:53
2023/11/13 13:49:33.514156 [info] dnsproxy: creating tls server socket 0.0.0.0:853
2023/11/13 13:49:33.514193 [info] dnsproxy: listening to tls://[::]:853
2023/11/13 13:49:33.514200 [info] dnsproxy: creating tls server socket [::]:853
2023/11/13 13:49:33.518658 [error] querylog: closing: flushing log buffer: nothing to write to a file
2023/11/13 13:47:59.850979 [fatal] couldn't start forwarding DNS server: starting listeners: listening on tls addr [::]:853: listen tcp [::]:853: bind: address already in use

It is dying too fast to be able to observe if it wants to create a tcp4/tcp6 socket, or if is creating a tcp46/tcp6 socket and the tcp46 socket clashes with the tcp6 socket (just an assumption what could happen, not sure if this is really a possible failure case).

netchild avatar Nov 13 '23 12:11 netchild

hey @netchild

have you found any solution? I'm running into the same issue but on ubuntu 22.04.

vahobrsti avatar Jan 16 '24 14:01 vahobrsti

No, I'm still running with the explicit list of addresses. The behavior of FreeBSD here is a valid way of operation, and other software packages handle this, so to me it looks like adguardhome would need a fix or the go runtime would need a fix, not sure which one of them.

netchild avatar Jan 16 '24 20:01 netchild

@netchild Thanks. I sorted by adding addresses like this:

dns:
  bind_hosts:
    - '::1'
    - '127.0.0.1'
    - <ipv6>
    - <ipv4>

Is that how you have resolved it?

vahobrsti avatar Jan 16 '24 23:01 vahobrsti

Yes.

netchild avatar Jan 17 '24 07:01 netchild

is your issue solved with this solution @netchild dns: bind_hosts: - '::1' - '127.0.0.1' - -

overwatch3560 avatar Mar 25 '24 22:03 overwatch3560

is your issue solved with this solution @netchild dns: bind_hosts:

This is not a solution, it is a workaround.

netchild avatar Mar 26 '24 11:03 netchild

Have you tried again with the latest version?

overwatch3560 avatar Mar 27 '24 00:03 overwatch3560

I tried again with 0.107.45 and can't reproduce the behavior anymore.

netchild avatar Mar 28 '24 12:03 netchild