pdns icon indicating copy to clipboard operation
pdns copied to clipboard

Recursor stops responding after `rec_control reload-lua-config`

Open paddg opened this issue 2 years ago • 15 comments

  • Program: Recursor
  • Issue type: Bug report

Short description

Recursor stops responding after rec_control reload-lua-config

Environment

  • Operating system: SLES15-SP4
  • Software version: PowerDNS Recursor 4.7.1
  • Software source: PowerDNS repository, self compiled (build-log)

Steps to reproduce

mkdir -p tmp; cd tmp
> recursor.conf

echo 'dnstapFrameStreamServer( "10.20.30.40:1234" )' > lua.conf

pdns_recursor --daemon=no --config-dir=. --socket-dir=. --local-address=127.0.0.53 --lua-config-file=./lua.conf

dig +tries=1 +time=1 @127.0.0.53 localhost  ## WORKS

rec_control --socket-dir=. reload-lua-config

dig +tries=1 +time=1 @127.0.0.53 localhost  ## FAILS (timeout)


Expected behaviour

Recursor should never stop responding after reload-lua-config

Actual behaviour

Recursor does not work anymore after reload-lua-config on circumstances.

Other information

To trigger the error, the FrameStreamServer IP address must not be a local address of the server itself, but a external one. It doesn't matter if the address answers or not.

There is nothing in the journal what indicates an error.

paddg avatar Jul 18 '22 11:07 paddg

Hello @paddg, thank you for the report. I am unable to reproduce the issue. You mention that you built from source from the pdns repository, specifically recursor version 4.7.0 (I assume that means the rec-4.7.0 branch?). Could you provide the following information?

  1. The configure flags used.
  2. The version of Lua being used.
  3. If possible, whether the problem also exists on master.
  4. If possible, whether the problem also exists on another distribution.

fredmorcos avatar Jul 19 '22 10:07 fredmorcos

Thanks for investigation, @fredmorcos! I'll try to further isolate the issue so it's hopefully easier to reproduce. Unfortunately I won't be able to do that for a few days. But then I will also answer your questions.

paddg avatar Jul 21 '22 19:07 paddg

Hello @fredmorcos, I've updated the issue description and the steps to reproduce it. Also I've attached the build-log. I hope this helps you. I've no answer for questions 3 and 4.

paddg avatar Jul 25 '22 12:07 paddg

If possible, whether the problem also exists on another distribution.

I tried it now on a Ubuntu 20.04 "Focal Fossa" with the 4.7.1 package from https://repo.powerdns.com/. And I was able to reproduce the issue with the described procedure from above.

Were you able to reproduce the issue as well?

paddg avatar Jul 26 '22 08:07 paddg

Hi @paddg, could you provide the contents of recursor.conf please? Thank you! I'm working on reproducing this on Ubuntu 20.04.

fredmorcos avatar Jul 26 '22 12:07 fredmorcos

could you provide the contents of recursor.conf please?

It is empty. See "Steps to reproduce": > recursor.conf

paddg avatar Jul 26 '22 12:07 paddg

I've been trying to reproduce this on Ubuntu 20.04 and on Archlinux to no avail. I tested with the master branch and the rec-4.7.1 tag. Here are the steps I follow:

$ git clone https://github.com/PowerDNS/pdns.git
$ cd pdns/pdns/recursordist/
$ autoreconf -vi

# Set CFLAGS and CXXFLAGS here to replicate your SLES build as much as possible. And use --with-lua=lua5.3 because it otherwise prefers luajit.
$ CFLAGS='-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g' CXXFLAGS='-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g' ./configure --enable-nod --enable-dns-over-tls --with-lua=lua5.3

$ make -j4
$ mkdir tmp
$ cd tmp/
$ touch recursor.conf
$ echo 'dnstapFrameStreamServer( "10.20.30.40:1234" )' > lua.conf
$ sudo ../pdns_recursor --daemon=no --config-dir=. --socket-dir=. --local-address=127.0.0.53 --local-port=54 --lua-config-file=./lua.conf

Jul 26 21:03:01 PowerDNS Recursor 0.0.24187.0.master.g9c9e6a1dc3 (C) 2001-2022 PowerDNS.COM BV
Jul 26 21:03:01 Using 64-bits mode. Built using gcc 12.1.0 on Jul 26 2022 20:54:03 by fred@axon.
Jul 26 21:03:01 PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2.
Jul 26 21:03:01 msg="Enabling IPv4 transport for outgoing queries" subsystem="config" level="0" prio="Notice" tid="0" ts="1658862181.668"
Jul 26 21:03:01 msg="NOT using IPv6 for outgoing queries - add an IPv6 address (like '::') to query-local-address to enable" subsystem="config" level="0" prio="Warning" tid="0" ts="1658862181.668"
Jul 26 21:03:01 msg="Setting access control" subsystem="config" level="0" prio="Info" tid="0" ts="1658862181.669" acl="allow-from" addresses="127.0.0.0/8 10.0.0.0/8 100.64.0.0/10 169.254.0.0/16 192.168.0.0/16 172.16.0.0/12 ::1/128 fc00::/7 fe80::/10"
Jul 26 21:03:01 msg="Will not send queries to" subsystem="config" level="0" prio="Notice" tid="0" ts="1658862181.672" addresses="127.0.0.0/8 10.0.0.0/8 100.64.0.0/10 169.254.0.0/16 192.168.0.0/16 172.16.0.0/12 ::1/128 fc00::/7 fe80::/10 0.0.0.0/8 192.0.0.0/24 192.0.2.0/24 198.51.100.0/24 203.0.113.0/24 240.0.0.0/4 ::/96 ::ffff:0:0/96 100::/64 2001:db8::/32 0.0.0.0 ::"
Jul 26 21:03:01 msg="PowerDNS Recursor itself will distribute queries over threads" subsystem="config" level="0" prio="Notice" tid="0" ts="1658862181.672"
Jul 26 21:03:01 msg="Inserting rfc 1918 private space zones" subsystem="config" level="0" prio="Notice" tid="0" ts="1658862181.672"
Jul 26 21:03:01 msg="Listening for queries" subsystem="config" level="0" prio="Info" tid="0" ts="1658862181.672" address="127.0.0.53:54" protocol="UDP"
Jul 26 21:03:01 msg="Enabled TCP data-ready filter for (slight) DoS protection" subsystem="config" level="0" prio="Info" tid="0" ts="1658862181.672"
Jul 26 21:03:01 msg="Listening for queries" subsystem="config" level="0" prio="Info" tid="0" ts="1658862181.672" address="127.0.0.53:54" protocol="TCP"
Jul 26 21:03:01 msg="Raised soft limit on number of filedescriptors to match max-mthreads and threads settings" subsystem="config" level="0" prio="Warning" tid="0" ts="1658862181.673" limit="4321"
Jul 26 21:03:01 msg="Launching distributor threads" subsystem="config" level="0" prio="Notice" tid="0" ts="1658862181.673" count="1"
Jul 26 21:03:01 msg="Launching worker threads" subsystem="config" level="0" prio="Notice" tid="0" ts="1658862181.673" count="2"
Jul 26 21:03:01 msg="Enabled multiplexer" subsystem="runtime" level="0" prio="Info" tid="0" ts="1658862181.674" name="epoll"
Jul 26 21:03:02 msg="Not validating response for security status update, this is a non-release version" subsystem="housekeeping" level="0" prio="Warning" tid="0" ts="1658862182.866" query="recursor-0.0.24187.0.master.g9c9e6a1dc3.security-status.secpoll.powerdns.com" version="0.0.24187.0.master.g9c9e6a1dc3"

To test:

# In a separate terminal.
$ nmap -A 127.0.0.53 -p 54

Starting Nmap 7.92 ( https://nmap.org ) at 2022-07-26 21:03 CEST
Nmap scan report for 127.0.0.53
Host is up (0.000044s latency).

PORT   STATE SERVICE VERSION
54/tcp open  domain  PowerDNS Recursor 0.0.24187.0.master.g9c9e6a1dc3
| dns-nsid:
|   NSID: axon (61786f6e)
|   id.server: axon
|_  bind.version: PowerDNS Recursor 0.0.24187.0.master.g9c9e6a1dc3 (built Jul 26 2022 20:54:03 by fred@axon)

Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 39.34 seconds

$ dig +tries=1 +time=1 @127.0.0.53 -p 54 localhost

; <<>> DiG 9.18.5 <<>> +tries +time @127.0.0.53 -p 54 localhost
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30071
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;localhost.			IN	A

;; ANSWER SECTION:
localhost.		86400	IN	A	127.0.0.1

;; Query time: 0 msec
;; SERVER: 127.0.0.53#54(127.0.0.53) (UDP)
;; WHEN: Tue Jul 26 21:04:39 CEST 2022
;; MSG SIZE  rcvd: 54

Reload the configuration:

$ sudo ../rec_control --socket-dir=. reload-lua-config

Reloaded Lua configuration file './lua.conf'

and I get the following in the recursor output:

Jul 26 21:04:59 msg="Received rec_control command via control socket" subsystem="control" level="0" prio="Info" tid="0" ts="1658862299.501" command="reload-lua-config"
Jul 26 21:04:59 Reloaded Lua configuration file './lua.conf', requested via control channel

Trying again:

$ dig +tries=1 +time=1 @127.0.0.53 -p 54 localhost

; <<>> DiG 9.18.5 <<>> +tries +time @127.0.0.53 -p 54 localhost
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34987
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;localhost.			IN	A

;; ANSWER SECTION:
localhost.		86328	IN	A	127.0.0.1

;; Query time: 0 msec
;; SERVER: 127.0.0.53#54(127.0.0.53) (UDP)
;; WHEN: Tue Jul 26 21:05:51 CEST 2022
;; MSG SIZE  rcvd: 54

Build/configure summary on Ubuntu:

configure: Configuration summary
configure: =====================
configure:
configure: PowerDNS Recursor 0.0.24187.0.master.g9c9e6a1dc3 configured with:  '--enable-nod' '--enable-dns-over-tls' '--with-lua=lua5.3' 'CFLAGS=-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g' 'CXXFLAGS=-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g'
configure:
configure: CC: gcc
configure: CXX: g++ -std=c++17
configure: LD: /usr/bin/ld -m elf_x86_64
configure: CFLAGS:  -fPIE -DPIE -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 --param ssp-buffer-size=4 -fstack-protector -Wall -Wextra -Wshadow -Wno-unused-parameter -Wmissing-declarations -Wredundant-decls -fvisibility=hidden -g -O2 -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g
configure: CPPFLAGS:
configure: CXXFLAGS:  -fPIE -DPIE -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 --param ssp-buffer-size=4 -fstack-protector -Wall -Wextra -Wshadow -Wno-unused-parameter -Wmissing-declarations -Wredundant-decls -fvisibility=hidden -g -O2 -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g
configure: LDFLAGS: -Wl,-z -Wl,relro -Wl,-z -Wl,now
configure: LIBS: -lpthread
configure: BOOST_CPPFLAGS:  -pthread
configure:
configure: Features enabled
configure: ----------------
configure: Lua: lua5.3
configure: OpenSSL ECDSA: yes
configure: ed25519: yes
configure: ed448: yes
configure: Protobuf: yes
configure: SNMP: yes
configure: systemd: yes
configure: nod: yes
configure: dnstap: yes
configure: DNS over TLS: yes
configure: OpenSSL: yes
configure: libcurl: yes
configure: Context library: Boost Context

and build/configure summary on Archlinux:

configure: Configuration summary
configure: =====================
configure:
configure: PowerDNS Recursor 0.0.24187.0.master.g9c9e6a1dc3 configured with:  '--enable-nod' '--enable-dns-over-tls' '--with-lua=lua5.3' 'CFLAGS=-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g' 'CXXFLAGS=-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g'
configure:
configure: CC: gcc
configure: CXX: g++ -std=c++17
configure: LD: /usr/bin/ld -m elf_x86_64
configure: CFLAGS:  -fPIE -DPIE -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 --param ssp-buffer-size=4 -fstack-protector -Wall -Wextra -Wshadow -Wno-unused-parameter -Wmissing-declarations -Wredundant-decls -fvisibility=hidden -g -O2 -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g
configure: CPPFLAGS:
configure: CXXFLAGS:  -fPIE -DPIE -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 --param ssp-buffer-size=4 -fstack-protector -Wall -Wextra -Wshadow -Wno-unused-parameter -Wmissing-declarations -Wredundant-decls -fvisibility=hidden -g -O2 -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g
configure: LDFLAGS: -Wl,-z -Wl,relro -Wl,-z -Wl,now
configure: LIBS:
configure: BOOST_CPPFLAGS:  -pthread
configure:
configure: Features enabled
configure: ----------------
configure: Lua: lua5.3
configure: OpenSSL ECDSA: yes
configure: ed25519: yes
configure: ed448: yes
configure: Protobuf: yes
configure: SNMP: yes
configure: systemd: yes
configure: nod: yes
configure: dnstap: yes
configure: DNS over TLS: yes
configure: OpenSSL: yes
configure: libcurl: yes
configure: Context library: Boost Context

I have a few more questions:

  1. I don't see a git clone or git checkout command in your build log, are you cloning from Github?
  2. Are you checking out the rec-4.7.1 tag or building from master?
  3. What does the configure summary look like for you on Ubuntu 20.04?
  4. Is the Ubuntu installation up to date?
  5. Which version of Lua is it picking up?
  6. Could you test with luajit on SLES and Ubuntu, please?
  7. Could you follow the steps above and let me know whether the issue still persists?

fredmorcos avatar Jul 26 '22 19:07 fredmorcos

I don't see a git clone or git checkout command in your build log, are you cloning from Github?

We built with this source code: https://downloads.powerdns.com/releases/pdns-recursor-4.7.1.tar.bz2

Are you checking out the rec-4.7.1 tag or building from master?

No, we built with this source code: https://downloads.powerdns.com/releases/pdns-recursor-4.7.1.tar.bz2

What does the configure summary look like for you on Ubuntu 20.04?

I installed from your Ubuntu repository: deb [arch=amd64] http://repo.powerdns.com/ubuntu focal-rec-47 main this binary package: https://repo.powerdns.com/ubuntu/pool/main/p/pdns-recursor/pdns-recursor_4.7.1-1pdns.focal_amd64.deb

pdns_recursor --version
Jul 27 06:54:34 PowerDNS Recursor 4.7.1 (C) 2001-2022 PowerDNS.COM BV
Jul 27 06:54:34 Using 64-bits mode. Built using gcc 9.4.0 on Jul  7 2022 09:34:01 by root@c79fa81baeba.
Jul 27 06:54:34 PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2.
Jul 27 06:54:34 Features: fcontext libcrypto-ecdsa libcrypto-ed25519 libcrypto-ed448 libcrypto-eddsa lua nod protobuf dnstap-framestream sodium curl DoT scrypt 
Jul 27 06:54:34 Configured with: " '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--sysconfdir=/etc/powerdns' '--enable-unit-tests' '--with-libcap' '--with-libsodium' '--enable-dns-over-tls' '--enable-dnstap' '--without-net-snmp' '--disable-silent-rules' '--with-service-user=pdns' '--with-service-group=pdns' '--enable-systemd' '--with-systemd=/lib/systemd/system' '--with-lua=luajit' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/pdns/pdns-recursor-4.7.1=. -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fdebug-prefix-map=/pdns/pdns-recursor-4.7.1=. -fstack-protector-strong -Wformat -Werror=format-security'"

Is the Ubuntu installation up to date?

Yes.

Which version of Lua is it picking up?

dpkg -l | grep -i lua
ii  liblua5.1-0:amd64                       5.1.5-8.1build4                     amd64        Shared library for the Lua interpreter version 5.1
ii  liblua5.1-0-dev:amd64                   5.1.5-8.1build4                     amd64        Development files for the Lua language version 5.1
ii  liblua5.2-0:amd64                       5.2.4-1.1build3                     amd64        Shared library for the Lua interpreter version 5.2
ii  liblua5.2-dev:amd64                     5.2.4-1.1build3                     amd64        Development files for the Lua language version 5.2
ii  liblua5.3-0:amd64                       5.3.3-1.1ubuntu2                    amd64        Shared library for the Lua interpreter version 5.3
ii  libluajit-5.1-2:amd64                   2.1.0~beta3+dfsg-5.1build1          amd64        Just in time compiler for Lua - library version
ii  libluajit-5.1-common                    2.1.0~beta3+dfsg-5.1build1          all          Just in time compiler for Lua - common files
ii  libluajit-5.1-dev:amd64                 2.1.0~beta3+dfsg-5.1build1          amd64        Just in time compiler for Lua - development files
ii  libtolua-dev                            5.2.0-1build1                       amd64        Tool to integrate C/C++ code with Lua - development files
ii  lua-lpeg:amd64                          1.0.2-1                             amd64        LPeg library for the Lua language
ii  lua-yaml:amd64                          6.1-2                               amd64        LibYAML binding for Lua
ii  lua-yaml-dev:amd64                      6.1-2                               amd64        docs and test files for lua-yaml
ii  lua5.1                                  5.1.5-8.1build4                     amd64        Simple, extensible, embeddable programming language
ii  lua5.3                                  5.3.3-1.1ubuntu2                    amd64        Simple, extensible, embeddable programming language
ii  luajit                                  2.1.0~beta3+dfsg-5.1build1          amd64        Just in time compiler for Lua programming language version 5.1

Could you test with luajit on SLES and Ubuntu, please?

The Ubuntu package is already with luajit. I did it not yet try on SLES.

Could you follow the steps above and let me know whether the issue still persists?

Yes. I did build the Recursor on Ubuntu exactly following your steps. The issue still persists.

I mentioned earlier that the issue depends on the IP address used in the dnstapFrameStreamServer function. Some examples:

"10.20.30.40"   => failed
"127.0.0.1"     => OK
"1.2.3.4"       => failed
"192.168.178.1" => OK
"194.25.2.129"  => failed

I have no idea what the trigger is here. Would you mind to try also with some different IP addresses?

paddg avatar Jul 27 '22 06:07 paddg

And by the way, not only the Recursor doesn't reply to DNS queries in the error case, it does also not respond to socket connections any more:

# ../rec_control --socket-dir=. reload-lua-config
Fatal: Timeout waiting for control channel data

From the Recursor Log:

Jul 27 08:56:56 Reloaded Lua configuration file './lua.conf', requested via control channel
Jul 27 08:56:56 msg="Exception while dealing with control socket request" error="Unable to send return code over control channel: Broken pipe" subsystem="control" level="0" prio="Error" tid="0" ts="1658905016.435" exception="PDNSException"

paddg avatar Jul 27 '22 06:07 paddg

The problem is not "persistent". It does automatically recover after some time.

paddg avatar Jul 27 '22 07:07 paddg

I did an experiment on the IP address "194.25.2.129". If this server drops the SYN packets, the Recursor stops working after reload-lua-config. But if this sever responds with reset packets ("Flags [R.]"), then the Recursor does survive.

paddg avatar Jul 27 '22 07:07 paddg

I can reproduce it. Thank you!

fredmorcos avatar Jul 27 '22 13:07 fredmorcos

Got the feeling this is pretty much the limit of what libfstrm can get you in combination with the recursors design :-(

zeha avatar Jul 27 '22 13:07 zeha

(Back from vacation, I did not look at the code yet...) I'm assuming the blocking calls in the cleanup of the old config are the underlaying problem. Possible solutions that come to mind: an asynchronous cleanup or not creating a new setup (and not destroying the old) if the old and new config are the same. I suppose the latter is more simple to implement and would fix the most common cases.

omoerbeek avatar Aug 01 '22 06:08 omoerbeek

After a discussion with @omoerbeek, the fix for this is two-fold:

  1. Avoid reloading dnstap-related configuration if it hasn't changed
  2. If the configuration did indeed change, destroy the old configuration in a way that does not block the recursor:
  • Either through asynchronous destruction of the old configuration object (and related connections)
  • Or through forcefully closing said connections (e.g. without waiting for timeouts). Note that this might not be possible with libfstrm

fredmorcos avatar Aug 02 '22 11:08 fredmorcos

@paddg Could you test this branch, please? Thank you!

fredmorcos avatar Aug 11 '22 13:08 fredmorcos

Hi @fredmorcos,

@paddg Could you test this branch, please? Thank you!

Done. I did

git clone https://github.com/fredmorcos/pdns.git pdns-git-repo
cd pdns-git-repo
./build-scripts/dist-recursor

Then built a RPM package on OBS with pdns-recursor-0.0.24187.0.master.g9c9e6a1dc3.tar.bz2

I installed the package on a lab node.

# pdns_recursor --version
Aug 12 09:05:03 PowerDNS Recursor 0.0.24187.0.master.g9c9e6a1dc3 (C) 2001-2022 PowerDNS.COM BV
Aug 12 09:05:03 Using 64-bits mode. Built using gcc 7.5.0 on Aug 12 2022 06:13:04 by abuild@lamb11.
Aug 12 09:05:03 PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2.
Aug 12 09:05:03 Features: fcontext libcrypto-ecdsa libcrypto-ed25519 libcrypto-ed448 libcrypto-eddsa lua nod protobuf dnstap-framestream snmp sodium curl 
Aug 12 09:05:03 Configured with: " '--host=x86_64-suse-linux-gnu' '--build=x86_64-suse-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/lib' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--disable-dependency-tracking' '--enable-nod' 'build_alias=x86_64-suse-linux-gnu' 'host_alias=x86_64-suse-linux-gnu' 'CFLAGS=-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g' 'CXXFLAGS=-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'"

An re-did the tests. Unfortunately the behavior did not change. The problem is still present.

paddg avatar Aug 12 '22 07:08 paddg

Hi @paddg,

if I'm not mistaken you tested the master branch of Fred's repo (9c9e6a1dc3), not the branch containing the fix.

omoerbeek avatar Aug 12 '22 07:08 omoerbeek

if I'm not mistaken you tested the master branch of Fred's repo (9c9e6a1), not the branch containing the fix.

Oh yes! Sorry about that. I'll test again ...

paddg avatar Aug 12 '22 07:08 paddg

I tested again and it looks good so far. The Recursor does not stop responding after rec_control reload-lua-config anymore.

But now the journal keeps claiming things which I haven't seen before:

Aug 12 10:57:33 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294653.147" zone="A.rpz"
Aug 12 10:57:33 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294653.147" zone="B.rpz"
Aug 12 10:57:33 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294653.147" zone="C.rpz"
Aug 12 10:57:33 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294653.148" zone="D.rpz"
Aug 12 10:57:59 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294679.699" zone="B.rpz"
Aug 12 10:57:59 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294679.699" zone="D.rpz"
Aug 12 10:57:59 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294679.699" zone="C.rpz"
Aug 12 10:57:59 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294679.699" zone="A.rpz"
Aug 12 10:58:16 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294696.311" zone="B.rpz"
Aug 12 10:58:16 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294696.312" zone="D.rpz"
Aug 12 10:58:16 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294696.312" zone="C.rpz"
Aug 12 10:58:16 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294696.312" zone="A.rpz"
Aug 12 10:58:32 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294712.524" zone="E.rpz"
Aug 12 10:59:41 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294781.139" zone="B.rpz"
Aug 12 10:59:41 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294781.139" zone="C.rpz"
Aug 12 10:59:41 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294781.139" zone="A.rpz"
Aug 12 10:59:41 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294781.139" zone="D.rpz"
Aug 12 10:59:46 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294786.211" zone="D.rpz"
Aug 12 10:59:46 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294786.211" zone="B.rpz"
Aug 12 10:59:46 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294786.211" zone="C.rpz"
Aug 12 10:59:46 node22 pdns-recursor-2[15528]: msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294786.211" zone="A.rpz"

paddg avatar Aug 12 '22 09:08 paddg

A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1660294653.147" zone="A.rpz"

Any idea what's causing this messages?

paddg avatar Aug 16 '22 10:08 paddg

Does it keep logging these messages or does it stop at one point?

A few of these messages or normal and expected: an RPZ update might be in progress while the config is being updated. The reload will create a new RPZ loader thread, and when the old loader thread finishes its download, it sees it is no longer relevant, logs and stops.

omoerbeek avatar Aug 16 '22 10:08 omoerbeek

I tested this locally and my previous message was not accurate: you get the messages even if no RPZ download was happening. You wil see one for each reload and for each RPZ zone. The message will appear after the refresh period has elapsed, so not immediately after the reload command.

So while it is a bit spammy, the log entries are harmless and an indication things are working as intended. I might change the log level from Info to Debug though, as an admin can easily be confused by these messages.

omoerbeek avatar Aug 16 '22 11:08 omoerbeek

You are right, it's only the result of the reload-lua-config.

paddg avatar Aug 16 '22 11:08 paddg