pdns
pdns copied to clipboard
auth: pipebackend lookups twice for the same qname+qtype
- Program: Authoritative
- Issue type: Bug report
Short description
Something changed with the lookup algorithm. pdns_serve with libpipebackend.so is asking twice for the same qname+qtype when not in APEX. We're unable to do WRR because of this, as pdns_serve is asks twice and respond with second answer, ignoring the first answer. Our backend also have to do two geoip-lookups for every query.
Environment
- Operating system: MacOs aarch64
- Software version: 4.6.2
Steps to reproduce
- Config:
local-port=2253
local-address=127.0.0.1 ::1
version-string=xyz
server-id=disabled
resolver=127.0.0.1:1153
expand-alias=yes
query-cache-ttl=0
cache-ttl=0
zone-cache-refresh-interval=0
zone-metadata-cache-ttl=5
consistent-backends=yes
any-to-tcp=yes
disable-axfr=yes
edns-subnet-processing=yes
primary=no
launch=pipe
pipe-abi-version=3
pipe-command=/tmp/pybackend.sock
distributor-threads=16
- Our custom backend is being asked twice for the same qname+qtype and discard first answer.
Expected behaviour
For qname+qtype outside apex, pdns_serve should not ask twice, It breaks Weighted Round Robin on the backend and it have to process two GeoIP lookups for every query.
Actual behaviour
- (correct) pdsn_serve does multiple SOA lookup to establish zone, recursivelly from left-to-right.
- (incorrect) Once SOA is received, then pdns_server lookups twice for record outside apex
test.domain.com(zone=test.domain.com, qname=random.test.domain.com). Bellow is a test run withrandom.test.domain.comgenerating random pair of A and AAAA records, one each.
dig +subnet=$DE -p 2253 @127.0.0.1 random.test.domain.com A
; <<>> DiG 9.10.6 <<>> +subnet=152.70.186.215 -p 2253 @127.0.0.1 random.test.domain.com A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7856
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; CLIENT-SUBNET: 152.70.186.215/32/0
;; QUESTION SECTION:
;random.test.domain.com. IN A
;; ANSWER SECTION:
random.test.domain.com. 45 IN A 28.183.97.53
;; Query time: 2 msec
;; SERVER: 127.0.0.1#2253(127.0.0.1)
;; WHEN: Tue May 10 02:02:56 CEST 2022
;; MSG SIZE rcvd: 79
BACKEND log:
(... 16 threads )
INFO:__main__:Socket connected
DEBUG:__main__:HELO 3
DEBUG:__main__:OK pybackend.py
INFO:__main__:Socket connected
DEBUG:__main__:HELO 3
DEBUG:__main__:OK pybackend.py
INFO:__main__:Socket connected
DEBUG:__main__:HELO 3
DEBUG:__main__:OK pybackend.py
DEBUG:__main__:Q random.test.domain.com IN SOA -1 0.0.0.0 0.0.0.0 0.0.0.0/0
DEBUG:__main__:END
DEBUG:__main__:Q test.domain.com IN SOA -1 0.0.0.0 0.0.0.0 0.0.0.0/0
DEBUG:__main__:DATA 0 1 test.domain.com IN SOA 180 -1 ns1.test.domain.com hostmaster.test.domain.com 1 900 300 172800 180
DEBUG:__main__:END
DEBUG:__main__:Q random.test.domain.com IN ANY -1 127.0.0.1 127.0.0.1 152.70.186.215/32
DEBUG:__main__:DATA 0 1 random.test.domain.com IN A 45 -1 39.8.113.118
DEBUG:__main__:DATA 0 1 random.test.domain.com IN AAAA 45 -1 2603:9999:0:0:B5E9:70CC:54B1:3FDB
DEBUG:__main__:END
DEBUG:__main__:Q random.test.domain.com IN ANY -1 127.0.0.1 127.0.0.1 152.70.186.215/32
DEBUG:__main__:DATA 0 1 random.test.domain.com IN A 45 -1 28.183.97.53
DEBUG:__main__:DATA 0 1 random.test.domain.com IN AAAA 45 -1 2603:9999:0:0:A861:7BB3:A13D:B07C
DEBUG:__main__:END
In the above, pdns_server issued two Q lines for random.test.domain.com/ANY to our backend and responded with the last answer, 39.8.113.118 and 2603:9999:...:3FDB being ignored.
Other information
In 4.3.2, lookup algorith worked as follows:
- Querying from left-to-right to establish SOA (zone)
- For qname outside apex, it asked for qname+qtype/NS to check for child delegation. In 4.6.2, this query is qname+qtype/ANY (incorrect) instead of qname+qtype/NS (correct)
- If no delegation, then issue a final qname+qtype/ANY
In 4.6.2, step #2 issue qname+qtype/ANY for child delegation checking, instead of qname+qtype/NS. This is why step #2 and step #3 are the same in 4.6.2 and out backend has to process the same query twice.
How it should process: if step #2 doesn't have any NS in the answer, then pdns_server shouldn/t lookup #3.
#!/usr/bin/env python3
import os, asyncio, random, logging, signal
socket_path = "/tmp/pybackend.sock"
logger = logging.getLogger(__name__)
def randomA():
return '.'.join(str(random.randint(1, 223)) for i in range(4))
def randomAAAA():
return '2603:9999:0:0:' + ':'.join('{:0>4X}'.format(random.randint(1024,64000)) for i in range(4))
def process_query(qname, qtype, client_ip, edns_subnet):
data = "DATA\t0\t1\t{}\tIN\t{{qtype}}\t{}\t{}\t{{rdata}}\n".format(qname, 3600, -1)
soa = data.format(qtype="SOA",rdata="ns1.test.domain.com hostmaster.test.domain.com 1 900 300 172800 180")
if qname == "test.domain.com":
if qtype in {"SOA","ANY"}:
yield soa
if qtype in {"NS","ANY"}:
yield data.format(qtype="NS",rdata="ns1.test.domain.com")
if qtype in {"A","ANY"}:
yield data.format(qtype="A",rdata="1.2.3.4")
elif qname == "ns1.test.domain.com": # glue
if qtype in {"A","ANY"}:
yield data.format(qtype="A",rdata="99.99.99.99")
elif qname == "random.test.domain.com":
if qtype in {"A","ANY"}:
yield data.format(qtype="A", rdata=randomA())
if qtype in {"AAAA","ANY"}:
yield data.format(qtype="AAAA", rdata=randomAAAA())
async def rd(reader):
data = await reader.readline()
if data:
line = data.decode()
logger.debug(line.rstrip())
return line
else:
return False
async def wr(writer, message):
logger.debug(message.rstrip())
writer.write(message.encode())
await writer.drain()
async def handler(reader, writer):
logger.info("Socket connected")
line = await rd(reader)
if line != "HELO\t3\n":
await wr(writer, "LOG\tWrong protocol\nFAIL\n")
writer.close()
return
await wr(writer,"OK\tmicrobackend.py\n")
while True:
line = await rd(reader)
if not line:
break # client closed
a = line.split("\t")
if a[0] != "Q":
logger.error("Not supported")
await wr(writer,"FAIL\n")
continue
# Process the results
for v in process_query(a[1].lower(), a[3], a[5], a[7]):
await wr(writer,v)
await wr(writer,"END\n")
writer.close()
await writer.wait_closed()
logger.info("Socket closed")
if __name__ == "__main__":
logging.basicConfig(level=logging.DEBUG)
os.umask(0)
loop = asyncio.get_event_loop()
def exit_signal():
unix_server.close()
loop.stop()
for signame in {'SIGINT', 'SIGTERM'}:
loop.add_signal_handler(getattr(signal, signame), exit_signal)
unix_server = asyncio.start_unix_server(handler, path=socket_path, loop=loop)
loop.create_task(unix_server)
loop.run_forever()
This is the result of enabled consistent-backends with a disabled query cache. consistent-backends=no will restore the "old" behavior.
https://doc.powerdns.com/authoritative/settings.html#setting-consistent-backends
Thank you, indeed setting consistent-backends=no restores the previous lookup behaviour.
However, I feel that the new lookup algorithm is buggy when cache is either disabled or empty