Adding/deleting multiple Custom DNS Records in a short amount of time triggers systemd's StartLimitBurst, causing the pihole-FTL service to fail
Versions
- Pi-hole: v5.17.1
- AdminLTE: v5.20.1
- FTL: v5.23
Platform
- OS and version: Ubuntu 22.04.2 LTS (5.15.0-73-generic)
- Platform: Proxmox-VM
Expected behavior
Adding or deleting custom DNS records through the webadmin GUI, or directly through POST requests to the API endpoint at /admin/scripts/pi-hole/php/customdns.php, should cause pihole-FTL to just be reloaded.
Actual behavior / bug
Repeatedly adding or deleting DNS records through the web interface (/admin/scripts/pi-hole/php/customdns.php) causes pihole-FTL to be restarted (not reloaded) for every single DNS record. systemd then hits the StartLimitBurst=5 configured in /etc/systemd/system/pihole-FTL.service, and subsequent restarts of the service fail.
systemctl status pihole-FTL:
× pihole-FTL.service - Pi-hole FTL
Loaded: loaded (/etc/systemd/system/pihole-FTL.service; enabled; vendor preset: enabled)
Active: failed (Result: start-limit-hit) since Fri 2023-06-09 19:26:53 UTC; 17s ago
Process: 3209 ExecStartPre=/opt/pihole/pihole-FTL-prestart.sh (code=exited, status=0/SUCCESS)
Process: 3222 ExecStart=/usr/bin/pihole-FTL -f (code=exited, status=0/SUCCESS)
Process: 3245 ExecStopPost=/opt/pihole/pihole-FTL-poststop.sh (code=exited, status=0/SUCCESS)
Main PID: 3222 (code=exited, status=0/SUCCESS)
CPU: 165ms
Jun 09 19:26:53 pihole01 pihole-FTL[3222]: [2023-06-09 19:26:52.992 3222M] Resizing "FTL-queries" from 4587520 to (86016 * 56) == 4816896 (/dev/shm: 5.1MB used, 4.2GB total, FTL uses 5.1MB)
Jun 09 19:26:53 pihole01 pihole-FTL[3222]: [2023-06-09 19:26:52.997 3222M] Resizing "FTL-queries" from 4816896 to (90112 * 56) == 5046272 (/dev/shm: 5.4MB used, 4.2GB total, FTL uses 5.3MB)
Jun 09 19:26:53 pihole01 pihole-FTL[3222]: [2023-06-09 19:26:52.001 3222M] Resizing "FTL-queries" from 5046272 to (94208 * 56) == 5275648 (/dev/shm: 5.6MB used, 4.2GB total, FTL uses 5.6MB)
Jun 09 19:26:53 pihole01 pihole-FTL[3222]: [2023-06-09 19:26:53.007 3222M] Resizing "FTL-queries" from 5275648 to (98304 * 56) == 5505024 (/dev/shm: 5.8MB used, 4.2GB total, FTL u
Jun 09 19:26:53 pihole01 systemd[1]: Stopping Pi-hole FTL...
Jun 09 19:26:53 pihole01 systemd[1]: pihole-FTL.service: Deactivated successfully.
Jun 09 19:26:53 pihole01 systemd[1]: Stopped Pi-hole FTL.
Jun 09 19:26:53 pihole01 systemd[1]: pihole-FTL.service: Start request repeated too quickly.
Jun 09 19:26:53 pihole01 systemd[1]: pihole-FTL.service: Failed with result 'start-limit-hit'.
Jun 09 19:26:53 pihole01 systemd[1]: Failed to start Pi-hole FTL.
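A unit stuck in this state can be detected and recovered roughly as follows. This is only a minimal sketch and not part of Pi-hole: the function name recover_if_start_limit_hit and the SYSTEMCTL override variable are made up for illustration, while systemctl show -p Result --value and systemctl reset-failed are standard systemd commands. It needs to run as root.

```shell
#!/bin/sh
# Hypothetical helper: clear systemd's start rate limit for a unit that failed
# with 'start-limit-hit', then start it again.
# SYSTEMCTL can be overridden (e.g. for a dry run); it defaults to systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

recover_if_start_limit_hit() {
    unit="$1"
    # 'Result' holds the outcome of the last run, e.g. 'success' or 'start-limit-hit'.
    result="$($SYSTEMCTL show -p Result --value "$unit")"
    if [ "$result" = "start-limit-hit" ]; then
        # reset-failed also resets the unit's start rate limit counter.
        $SYSTEMCTL reset-failed "$unit" && $SYSTEMCTL start "$unit"
    else
        echo "$unit: result is '$result', nothing to do"
    fi
}
```

Usage would be recover_if_start_limit_hit pihole-FTL.service. This only recovers the service; it does not stop the limit from being hit again.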
Steps to reproduce
Steps to reproduce the behavior:
Shell
- run this shell script
/usr/local/bin/pihole -a addcustomdns 127.0.0.1 test1.local
/usr/local/bin/pihole -a addcustomdns 127.0.0.1 test2.local
/usr/local/bin/pihole -a addcustomdns 127.0.0.1 test3.local
/usr/local/bin/pihole -a addcustomdns 127.0.0.1 test4.local
/usr/local/bin/pihole -a addcustomdns 127.0.0.1 test5.local
/usr/local/bin/pihole -a addcustomdns 127.0.0.1 test6.local
- pihole-FTL will now be killed and fail to start with Failed with result 'start-limit-hit' (check with systemctl status pihole-FTL)
Webadmin
- Add or delete 6 DNS records in under 60 seconds.
- pihole-FTL will now be killed and fail to start with Failed with result 'start-limit-hit' (check with systemctl status pihole-FTL)
Debug Token
I don't think this should be necessary here.
Additional context
I am using the pihole provider from kubernetes-sigs/external-dns in combination with borchero/switchboard to automatically add DNS records for services in my cluster when they get exposed. This worked fine while setting everything up.
However, after exposing more services for external-dns to manage DNS for, pihole-FTL started to behave oddly/refused to start.
kubernetes-sigs/external-dns adds the DNS records by simply sending POST requests to the same endpoint the webadmin GUI uses (/admin/scripts/pi-hole/php/customdns.php). Because this API endpoint offers no update functionality and TXT records are not possible, external-dns behaves in a somewhat "spammy" way. As far as I understand the implementation of the pihole provider in external-dns, there are a couple of workarounds in place due to the API limitations of pihole-FTL. There might also be another bug in the pihole provider for external-dns, but I have not yet spent the time to dig into that side any deeper, and I believe that even if this were the case, it would not change the validity of this bug report.
For a quick and dirty workaround I adjusted StartLimitBurst in /etc/systemd/system/pihole-FTL.service.
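For reference, the same workaround could also be applied as a systemd drop-in instead of editing the unit file itself. The following is just a sketch under assumptions: the helper name write_start_limit_dropin, the file name override.conf and the value 20 are arbitrary choices for illustration, and it assumes the start limit settings live in the [Unit] section, which is where current systemd versions expect them.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that raises StartLimitBurst.
# $1 = drop-in directory, $2 = new burst value.
write_start_limit_dropin() {
    dropin_dir="$1"
    burst="$2"
    mkdir -p "$dropin_dir" || return 1
    # Start rate limiting settings belong in the [Unit] section.
    printf '[Unit]\nStartLimitBurst=%s\n' "$burst" > "$dropin_dir/override.conf"
}
```

As root, this would be called as write_start_limit_dropin /etc/systemd/system/pihole-FTL.service.d 20, followed by systemctl daemon-reload so systemd picks up the change.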
When digging through the source code of pihole I noticed a couple things and I am not exactly sure where this should be fixed.
- https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L223
  empty string as default value for the reload argument.
- https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L280
  effectively results in pihole -a addcustomdns 127.0.0.1 test1.local (notice that neither true nor false is part of the argument for restart)
- https://github.com/pi-hole/pi-hole/blob/6a45c6a8e027e1ac30d4556a88f31684bc80ccf1/pihole#L579
  the pihole shell script defers to AddCustomDNSAddress in webpage.sh
- https://github.com/pi-hole/pi-hole/blob/6a45c6a8e027e1ac30d4556a88f31684bc80ccf1/advanced/Scripts/webpage.sh#L719-L743
  a RestartDNS command will be issued due to the missing (empty) reload argument
- https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L284
  RestartDNS is triggered again. Should this be restartdns reload-lists instead of just restartdns here?
Conclusion
The issue here seems to affect both the AdminLTE and the pihole repository, so I am not entirely sure what the best or preferred way of fixing this would look like.
Thanks for your detailed analysis, you included all relevant aspects. However, I'm not sure if this is really a bug or not. The pivotal point is the StartLimitBurst in /etc/systemd/system/pihole-FTL.service.
This was added to prevent endless start-stop cycles in case of an error. Adding a custom DNS record needs a full FTL restart to take effect. However, if you know you're going to add a lot in a row, $reload can help to postpone the restart until everything is added. This is what we do when we import a (teleporter) backup (see https://github.com/pi-hole/AdminLTE/pull/2519). I'm not familiar with the tools you use to add the records, but as they access the API endpoints directly, they always trigger the restart immediately.
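On the command line, the idea of postponing the restart could look roughly like this. It is only a sketch, not how the teleporter import is implemented: the function name add_custom_dns_batch and the PIHOLE override variable are made up for illustration, and it relies on the trailing false argument of pihole -a addcustomdns to skip the per-record restart.

```shell
#!/bin/sh
# Sketch: add several custom DNS records with the restart deferred, then
# restart FTL once. PIHOLE can be overridden; it defaults to the pihole CLI.
PIHOLE="${PIHOLE:-pihole}"

# Reads "IP HOSTNAME" pairs from stdin, one per line.
add_custom_dns_batch() {
    while read -r ip host; do
        # Skip blank or incomplete lines.
        [ -n "$ip" ] && [ -n "$host" ] || continue
        # Trailing 'false' defers the restart; </dev/null keeps the command
        # from consuming the remaining input lines.
        $PIHOLE -a addcustomdns "$ip" "$host" false </dev/null || return 1
    done
    # One restart for the whole batch instead of one per record.
    $PIHOLE restartdns
}
```

Feeding it six lines such as 127.0.0.1 test1.local through a pipe or here-document should then result in a single restart rather than six.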
I have no good idea how to strike a balance between preventing too many restarts and being able to (manually) add a lot of custom DNS records (other than increasing the limit). I think none of us thought about adding >5 records manually within 60 seconds.
@DL6ER
I guess the same issue will happen in v6?
thanks for your quick reply!
I initially thought that a pihole restartdns reload-lists would be enough, but I seem to have missed that during my testing earlier.
However, as far as I can tell it is enough to issue a systemctl reload, which does not cause any issues with the service, even if triggered many times in quick succession.
- try to resolve a custom DNS record that does not exist
# nslookup testing-01.local localhost
Server: localhost
Address: 127.0.0.1#53
** server can't find testing-01.local: NXDOMAIN
- add the record using cmd utility with reload=false
# pihole -a addcustomdns 127.0.0.1 testing-01.local false
[✓] Adding custom DNS entry...
- record still not in effect
# nslookup testing-01.local localhost
Server: localhost
Address: 127.0.0.1#53
** server can't find testing-01.local: NXDOMAIN
- I initially thought that restartdns reload-lists would do the job, but:
# pihole restartdns reload-lists
[✓] Reloading DNS lists
# nslookup testing-01.local localhost
Server: localhost
Address: 127.0.0.1#53
** server can't find testing-01.local: NXDOMAIN
- However, if we reload the service using systemctl:
# systemctl reload pihole-FTL
- the record is active now
# nslookup testing-01.local localhost
Server: localhost
Address: 127.0.0.1#53
Name: testing-01.local
Address: 127.0.0.1
So, just reloading the service with systemctl seems to be enough and doesn't seem to trigger any issues with the service:
- StartLimitBurst is set to the default value of 5
# systemctl cat pihole-FTL | grep StartLimitBurst
StartLimitBurst=5
- reload the service 100 times
# for i in $(seq 1 100); do systemctl reload pihole-FTL; done
- service still in active state
# systemctl status pihole-FTL
● pihole-FTL.service - Pi-hole FTL
Loaded: loaded (/etc/systemd/system/pihole-FTL.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2023-06-09 21:19:01 UTC; 1min 43s ago
Process: 7317 ExecStartPre=/opt/pihole/pihole-FTL-prestart.sh (code=exited, status=0/SUCCESS)
Process: 7450 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
Main PID: 7330 (pihole-FTL)
Tasks: 19 (limit: 9401)
Memory: 10.4M
CPU: 279ms
CGroup: /system.slice/pihole-FTL.service
└─7330 /usr/bin/pihole-FTL -f
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Your analysis is not correct here:
https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L280 effectively results in pihole -a addcustomdns 127.0.0.1 test1.local (notice neither true nor false are part of the argument for restart)
When needed, $reload will be set to false on the lines just above the code you posted:
https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L274-L281
Resulting in: pihole -a addcustomdns 127.0.0.1 test1.local false
You are correct, thanks for pointing that out, and sorry for the confusion!
However, as far as I can tell it is enough to issue a systemctl reload, which does not cause any issues with the service, even if triggered many times in quick succession.
I think you are correct. The reload triggers a re-read of custom.list by dnsmasq:
Jun 9 23:35:39 dnsmasq[3578109]: read /etc/hosts - 7 names
Jun 9 23:35:39 dnsmasq[3578109]: read /etc/pihole/custom.list - 22 names
Jun 9 23:35:39 dnsmasq[3578109]: read /etc/pihole/local.list - 0 names
Note: this will only work for the custom DNS records, not custom CNAME records (as files in /etc/dnsmasq.d/ are not read by reload)
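Put as a small decision helper, the rule from the note above might look like this. This is purely illustrative: ftl_action_for is a made-up name, and treating unknown files as needing a restart is an assumption on the safe side, not something stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper: pick the lightest systemctl action that makes a changed
# file take effect, following the rule described in this thread.
ftl_action_for() {
    case "$1" in
        # Re-read by dnsmasq on reload (SIGHUP), per the log lines above.
        /etc/pihole/custom.list) echo reload ;;
        # Files in /etc/dnsmasq.d/ (e.g. custom CNAME records) are not re-read on reload.
        /etc/dnsmasq.d/*) echo restart ;;
        # Unknown file: a restart is the conservative choice (assumption).
        *) echo restart ;;
    esac
}
```

It could be used as systemctl "$(ftl_action_for /etc/pihole/custom.list)" pihole-FTL.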
Please try if pihole checkout core no_reload fixes the issue for you.
not quite, but when I additionally add the reload to the func.php as well, it seems to be working:
https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L284
thanks!
Good catch.
We did not plan to release any new v5 version and focus on v6 - however the changes necessary here are trivial after the bug was dissected. We'll discuss internally how to proceed.
I guess the same issue will happen in v6?
No. v6 tries to be "more clever" than many of the elements we have in v5. This involves restarting FTL as seldom as possible. Custom DNS records are a prime example. When I coded this part of the v6 interface, I very much disliked that the DNS cache is completely flushed by a restart/reload, so I changed two things: Firstly, this file is now in a watched directory (hostsdir), so FTL finds out by itself when the file is changed (no need to send a signal at all). Secondly, I submitted a patch upstream into dnsmasq (it is already included in the current release of dnsmasq) that ensures the DNS cache is not completely flushed; only the elements from the updated list are removed (and then repopulated).
After internal discussion we decided not to release a new v5 version. The reasons are:
- the issue has existed for a really long time, and this is the first time someone noticed it
- it's fixed for the affected user
- it's fixed in v6 already.
I have posted a workaround for v5 in this thread https://discourse.pi-hole.net/t/ftl-crashes-repeatedly-when-updating-dns-records-through-external-dns/66867