
When upstream is unavailable, blocky returns "Not Ready" responses for queries that do not rely on upstream

Open ideabucket opened this issue 1 year ago • 7 comments

I have this Blocky setup:

upstreams:
  init:
    strategy: fast
  groups:
    default:
      - 10.64.0.1
  strategy: parallel_best

conditional:
  fallbackUpstream: false
  mapping:
    local.dev: 192.168.1.1
    168.192.in-addr.arpa: 192.168.1.1

10.64.0.1 is a DNS server on the far side of a WireGuard tunnel from my router. If that tunnel is offline (and Blocky thus can't reach upstream), and I query a host in my internal domain local.dev, I get this response from Blocky:

root@GatewayMax:~# dig @ns.local.dev gw.local.dev

; <<>> DiG 9.16.50-Debian <<>> @ns.local.dev gw.local.dev
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 6281
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; EDE: 14 (Not Ready)
;; QUESTION SECTION:
;gw.local.dev.		IN	A

;; Query time: 10 msec
;; SERVER: 192.168.1.7#53(192.168.1.7)
;; WHEN: Sun Nov 10 10:59:35 AEDT 2024
;; MSG SIZE  rcvd: 54

It appears that, because the primary upstream is unavailable, Blocky is refusing to answer queries, before checking if it actually needs upstream to respond to this query (it doesn't).

Expected behaviour: Blocky recognises that this query should be forwarded to 192.168.1.1 and does so, even though upstream is down.
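
The expected order of checks could be sketched roughly as follows. This is only an illustration in Python, not Blocky's actual code (Blocky is written in Go): the names `CONDITIONAL_MAPPING`, `match_conditional`, `pick_resolver`, and the `upstream_ready` flag are all hypothetical, and the addresses are copied from the config above.

```python
from typing import Optional

# Conditional mapping copied from the config above: domain suffix -> resolver.
CONDITIONAL_MAPPING = {
    "local.dev": "192.168.1.1",
    "168.192.in-addr.arpa": "192.168.1.1",
}
DEFAULT_UPSTREAM = "10.64.0.1"


def match_conditional(qname: str) -> Optional[str]:
    """Return the mapped resolver for qname, preferring the longest matching suffix."""
    name = qname.rstrip(".").lower()
    best = None
    for domain, resolver in CONDITIONAL_MAPPING.items():
        # Match the domain itself or any name below it (label boundary required).
        if name == domain or name.endswith("." + domain):
            if best is None or len(domain) > len(best[0]):
                best = (domain, resolver)
    return best[1] if best else None


def pick_resolver(qname: str, upstream_ready: bool) -> Optional[str]:
    """Hypothetical decision order: conditional mapping first, upstream state second."""
    resolver = match_conditional(qname)
    if resolver is not None:
        return resolver  # does not depend on the default upstream at all
    if upstream_ready:
        return DEFAULT_UPSTREAM
    return None  # only here would a "Not Ready" refusal make sense
```

Under these assumptions, `pick_resolver("gw.local.dev.", upstream_ready=False)` returns `192.168.1.1`, while an unrelated name such as `example.com` gets no resolver until upstream is reachable again.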

ideabucket avatar Nov 10 '24 00:11 ideabucket

Possibly relevant: local.dev is a placeholder. I actually use a real domain that has public authoritative nameservers for my internal network, so I can complete ACME DNS challenges. But the A records for my internal hosts aren't on the authoritative nameservers, only on 192.168.1.1.

ideabucket avatar Nov 10 '24 00:11 ideabucket

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Feb 08 '25 04:02 github-actions[bot]

Still very much an issue, unfortunately.

ideabucket avatar Feb 08 '25 07:02 ideabucket

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar May 10 '25 04:05 github-actions[bot]

Keepalive. (Closing bug reports after 90 days is very annoying behaviour.)

ideabucket avatar May 11 '25 05:05 ideabucket

Can confirm this happens, and it is annoying: all my homelab services go down when the upstream becomes unreachable, even though my domains are resolved locally.

dpkg-i-foo-deb avatar Jun 03 '25 22:06 dpkg-i-foo-deb

I had an internet outage today, and because of this issue my clients were not able to resolve local DNS entries. All my local services stopped working for hours until the outage was fixed. Blocky only resolved customDNS entries while the public upstream DNS server was available.

@0xERR0R Sorry for pinging, not sure if you have seen the issue yet. From my point of view it's a critical issue, since from a client's perspective Blocky stops working without upstream access. Hope it can be fixed.

brokoler avatar Jun 07 '25 22:06 brokoler

I tried to reproduce it with the following config:

upstreams:
  init:
    strategy: fast
  groups:
    default:
      - 200.200.200.200
  strategy: parallel_best

conditional:
  fallbackUpstream: false
  mapping:
    local.dev: 192.168.178.1
    168.192.in-addr.arpa: 192.168.178.1

Upstream 200.200.200.200 is not reachable.

`dig @localhost gw.local.dev` returns NXDOMAIN as expected, and the query is sent to 192.168.178.1. `dig @localhost example.com` ends with a timeout.

Does this error still occur in the latest version of blocky?

0xERR0R avatar Jul 11 '25 20:07 0xERR0R

I just attempted to reproduce the issue against the v0.26.2 Docker image and wasn't able to do so. Given the age of the initial report, I was probably running v0.24 when I filed it, so it may well have been fixed in one of the intervening versions.

ideabucket avatar Jul 12 '25 05:07 ideabucket

@0xERR0R I just had an unexpected internet outage and took the opportunity to retest this issue. Here's the dig output:

; <<>> DiG 9.10.6 <<>> @192.168.1.7 [redacted.localdomain]
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 233
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; OPT=15: 00 0e ("..")
;; QUESTION SECTION:
;[redacted.localdomain].		IN	A

;; Query time: 7 msec
;; SERVER: 192.168.1.7#53(192.168.1.7)
;; WHEN: Thu Aug 14 18:14:33 AEST 2025
;; MSG SIZE  rcvd: 57
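
The raw `OPT=15: 00 0e` line in this output looks like the same Extended DNS Error as in the original report, just not decoded by this older dig: EDNS option code 15 is the Extended DNS Error option (RFC 8914), and its first two bytes are a big-endian info-code, so `00 0e` is 14, "Not Ready". A small sketch of that decoding, with a hypothetical helper name `decode_ede` and only a few info-codes listed:

```python
import struct

# A few of the EDE info-codes registered by RFC 8914 (not the full list).
EDE_INFO_CODES = {
    14: "Not Ready",
    15: "Blocked",
    22: "No Reachable Authority",
    23: "Network Error",
}
EDE_OPTION_CODE = 15  # EDNS option code for Extended DNS Errors


def decode_ede(option_code: int, hex_bytes: str):
    """Decode an EDE option printed by dig as raw hex, e.g. 'OPT=15: 00 0e'."""
    data = bytes.fromhex(hex_bytes.replace(" ", ""))
    if option_code != EDE_OPTION_CODE or len(data) < 2:
        raise ValueError("not an Extended DNS Error option")
    # First two bytes: info-code in network byte order; the rest is optional text.
    (info_code,) = struct.unpack("!H", data[:2])
    extra_text = data[2:].decode("utf-8", errors="replace")
    return info_code, EDE_INFO_CODES.get(info_code, "unknown"), extra_text


print(decode_ede(15, "00 0e"))  # (14, 'Not Ready', '')
```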

This is running blocky v0.26.2 in a Docker container.

My log is full of lines like this, with all the upstream queries timing out:

[2025-08-14 18:13:36] ERROR error on processing request:upstream 'tcp+udp:10.64.0.1': can't resolve request via upstream server tcp+udp:10.64.0.1 (10.64.0.1:53): read udp 172.29.8.2:44390->10.64.0.1:53: i/o timeout client_ip=192.168.1.42 question=A (api.dropboxapi.com.) req_id=6ea5198a-5de6-4ce1-9376-029faf1dedde

However, there is no log line matching the query for [redacted.localdomain]. I don't know what to make of that.
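
One way to double-check that a name never shows up in the log is to scan for the `question=` field. The sketch below is based only on the single log line quoted above, so the pattern is an assumption about the log format; `questions_in_log` and `was_logged` are hypothetical helper names.

```python
import re

# Pattern derived from the one log line quoted above, e.g.
# "question=A (api.dropboxapi.com.)"; other log formats may differ.
QUESTION_RE = re.compile(r"question=(\S+) \(([^)]+)\)")


def questions_in_log(lines):
    """Yield (record_type, name) for every log line that carries a question field."""
    for line in lines:
        match = QUESTION_RE.search(line)
        if match:
            # Normalise: drop the trailing root dot and lowercase the name.
            yield match.group(1), match.group(2).rstrip(".").lower()


def was_logged(lines, qname):
    """Return True if any logged question matches qname (case-insensitive)."""
    target = qname.rstrip(".").lower()
    return any(name == target for _, name in questions_in_log(lines))
```

For example, `was_logged(open("blocky.log"), "gw.local.dev")`, where `blocky.log` is a placeholder path for wherever the container's log output was saved.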

When I have some time, I'll make another attempt to construct a set of lab conditions to reproduce the issue.

ideabucket avatar Aug 14 '25 08:08 ideabucket