pdns icon indicating copy to clipboard operation
pdns copied to clipboard

auth: pipebackend lookups twice for the same qname+qtype

Open pawlakus opened this issue 3 years ago • 3 comments
trafficstars

  • Program: Authoritative
  • Issue type: Bug report

Short description

Something changed with the lookup algorithm. pdns_serve with libpipebackend.so is asking twice for the same qname+qtype when not in APEX. We're unable to do WRR because of this, as pdns_serve is asks twice and respond with second answer, ignoring the first answer. Our backend also have to do two geoip-lookups for every query.

Environment

  • Operating system: MacOs aarch64
  • Software version: 4.6.2

Steps to reproduce

  1. Config:
local-port=2253
local-address=127.0.0.1 ::1
version-string=xyz
server-id=disabled

resolver=127.0.0.1:1153
expand-alias=yes
query-cache-ttl=0
cache-ttl=0
zone-cache-refresh-interval=0
zone-metadata-cache-ttl=5
consistent-backends=yes

any-to-tcp=yes
disable-axfr=yes
edns-subnet-processing=yes

primary=no
launch=pipe
pipe-abi-version=3
pipe-command=/tmp/pybackend.sock
distributor-threads=16
  1. Our custom backend is being asked twice for the same qname+qtype and discard first answer.

Expected behaviour

For qname+qtype outside apex, pdns_serve should not ask twice, It breaks Weighted Round Robin on the backend and it have to process two GeoIP lookups for every query.

Actual behaviour

  1. (correct) pdsn_serve does multiple SOA lookup to establish zone, recursivelly from left-to-right.
  2. (incorrect) Once SOA is received, then pdns_server lookups twice for record outside apex test.domain.com (zone=test.domain.com, qname=random.test.domain.com). Bellow is a test run with random.test.domain.com generating random pair of A and AAAA records, one each.
dig +subnet=$DE -p 2253 @127.0.0.1 random.test.domain.com A          

; <<>> DiG 9.10.6 <<>> +subnet=152.70.186.215 -p 2253 @127.0.0.1 random.test.domain.com A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7856
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; CLIENT-SUBNET: 152.70.186.215/32/0
;; QUESTION SECTION:
;random.test.domain.com.		IN	A

;; ANSWER SECTION:
random.test.domain.com.	45	IN	A	28.183.97.53

;; Query time: 2 msec
;; SERVER: 127.0.0.1#2253(127.0.0.1)
;; WHEN: Tue May 10 02:02:56 CEST 2022
;; MSG SIZE  rcvd: 79

BACKEND log:

(... 16 threads )
INFO:__main__:Socket connected
DEBUG:__main__:HELO     3
DEBUG:__main__:OK       pybackend.py
INFO:__main__:Socket connected
DEBUG:__main__:HELO     3
DEBUG:__main__:OK       pybackend.py
INFO:__main__:Socket connected
DEBUG:__main__:HELO     3
DEBUG:__main__:OK       pybackend.py
DEBUG:__main__:Q        random.test.domain.com  IN      SOA     -1      0.0.0.0 0.0.0.0 0.0.0.0/0
DEBUG:__main__:END
DEBUG:__main__:Q        test.domain.com IN      SOA     -1      0.0.0.0 0.0.0.0 0.0.0.0/0
DEBUG:__main__:DATA     0       1       test.domain.com IN      SOA     180     -1      ns1.test.domain.com hostmaster.test.domain.com 1 900 300 172800 180
DEBUG:__main__:END
DEBUG:__main__:Q        random.test.domain.com  IN      ANY     -1      127.0.0.1       127.0.0.1       152.70.186.215/32
DEBUG:__main__:DATA     0       1       random.test.domain.com  IN      A       45      -1      39.8.113.118
DEBUG:__main__:DATA     0       1       random.test.domain.com  IN      AAAA    45      -1      2603:9999:0:0:B5E9:70CC:54B1:3FDB
DEBUG:__main__:END
DEBUG:__main__:Q        random.test.domain.com  IN      ANY     -1      127.0.0.1       127.0.0.1       152.70.186.215/32
DEBUG:__main__:DATA     0       1       random.test.domain.com  IN      A       45      -1      28.183.97.53
DEBUG:__main__:DATA     0       1       random.test.domain.com  IN      AAAA    45      -1      2603:9999:0:0:A861:7BB3:A13D:B07C
DEBUG:__main__:END

In the above, pdns_server issued two Q lines for random.test.domain.com/ANY to our backend and responded with the last answer, 39.8.113.118 and 2603:9999:...:3FDB being ignored.

Other information

In 4.3.2, lookup algorith worked as follows:

  1. Querying from left-to-right to establish SOA (zone)
  2. For qname outside apex, it asked for qname+qtype/NS to check for child delegation. In 4.6.2, this query is qname+qtype/ANY (incorrect) instead of qname+qtype/NS (correct)
  3. If no delegation, then issue a final qname+qtype/ANY

In 4.6.2, step #2 issue qname+qtype/ANY for child delegation checking, instead of qname+qtype/NS. This is why step #2 and step #3 are the same in 4.6.2 and out backend has to process the same query twice.

How it should process: if step #2 doesn't have any NS in the answer, then pdns_server shouldn/t lookup #3.

pawlakus avatar May 10 '22 00:05 pawlakus

#!/usr/bin/env python3
import os, asyncio, random, logging, signal
socket_path = "/tmp/pybackend.sock"
logger = logging.getLogger(__name__)

def randomA():
    return '.'.join(str(random.randint(1, 223)) for i in range(4))

def randomAAAA():
    return '2603:9999:0:0:' + ':'.join('{:0>4X}'.format(random.randint(1024,64000)) for i in range(4))

def process_query(qname, qtype, client_ip, edns_subnet):
    data = "DATA\t0\t1\t{}\tIN\t{{qtype}}\t{}\t{}\t{{rdata}}\n".format(qname, 3600, -1)
    soa = data.format(qtype="SOA",rdata="ns1.test.domain.com hostmaster.test.domain.com 1 900 300 172800 180")
    if qname == "test.domain.com":
        if qtype in {"SOA","ANY"}:
            yield soa
        if qtype in {"NS","ANY"}:
            yield data.format(qtype="NS",rdata="ns1.test.domain.com")
        if qtype in {"A","ANY"}:
            yield data.format(qtype="A",rdata="1.2.3.4")
    elif qname == "ns1.test.domain.com": # glue
        if qtype in {"A","ANY"}:
            yield data.format(qtype="A",rdata="99.99.99.99")
    elif qname == "random.test.domain.com":
        if qtype in {"A","ANY"}:
            yield data.format(qtype="A", rdata=randomA())
        if qtype in {"AAAA","ANY"}:
            yield data.format(qtype="AAAA", rdata=randomAAAA())

async def rd(reader):
    data = await reader.readline()
    if data:
        line = data.decode()
        logger.debug(line.rstrip())
        return line
    else:
        return False

async def wr(writer, message):
    logger.debug(message.rstrip())
    writer.write(message.encode())
    await writer.drain()

async def handler(reader, writer):
    logger.info("Socket connected")
    line = await rd(reader)
    if line != "HELO\t3\n":
        await wr(writer, "LOG\tWrong protocol\nFAIL\n")
        writer.close()
        return
    await wr(writer,"OK\tmicrobackend.py\n")
    while True:
        line = await rd(reader)
        if not line:
            break # client closed
        a = line.split("\t")
        if a[0] != "Q":
            logger.error("Not supported")
            await wr(writer,"FAIL\n")
            continue
        # Process the results
        for v in process_query(a[1].lower(), a[3], a[5], a[7]):
            await wr(writer,v)
        await wr(writer,"END\n")
    writer.close()
    await writer.wait_closed()
    logger.info("Socket closed")

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    os.umask(0)
    loop = asyncio.get_event_loop()
    def exit_signal():
        unix_server.close()
        loop.stop()
    for signame in {'SIGINT', 'SIGTERM'}:
        loop.add_signal_handler(getattr(signal, signame), exit_signal)    
    unix_server = asyncio.start_unix_server(handler, path=socket_path, loop=loop)
    loop.create_task(unix_server)
    loop.run_forever()

pawlakus avatar May 10 '22 01:05 pawlakus

This is the result of enabled consistent-backends with a disabled query cache. consistent-backends=no will restore the "old" behavior.

https://doc.powerdns.com/authoritative/settings.html#setting-consistent-backends

mind04 avatar May 10 '22 08:05 mind04

Thank you, indeed setting consistent-backends=no restores the previous lookup behaviour.

However, I feel that the new lookup algorithm is buggy when cache is either disabled or empty

pawlakus avatar May 10 '22 12:05 pawlakus