AdGuardHome icon indicating copy to clipboard operation
AdGuardHome copied to clipboard

AGH stops processing uncached DNS lookups

Open thedannywahl opened this issue 2 years ago • 29 comments

Issue Details

  • Version of AdGuard Home server:
    • AdGuard Home v0.107.2
  • How did you install AdGuard Home:
    • Bash install script
  • How did you setup DNS configuration:
    • System
  • If it's a router or IoT, please write device model:
    • N/A
  • CPU architecture:
    • AMD64
  • Operating system and version:
    • Ubuntu 20.04.3 LTS

Expected Behavior

AGH should process all valid DNS requests

Actual Behavior

After the AGH service is started manually (sudo ./AdGuardHome -s start) or at system start it will eventually stop processing DNS requests for uncached entries.

This started occurring after I upgraded from 0.107.0 to 0.107.2 (skipping 0.107.1). Around the same time I started noticing that this would occur I also installed a number of system upgrades including systemd and the kernel (see apt log below).

When it stops processing there is no spike in system load, CPU utilization, or memory. Other services running on this server continue to function normally.

Manually stopping and starting the service or restarting the system resolve the error for a period until it starts happening again.

The error does not seem to be time-based but rather utilization based (n lookups), though I only have anecdotal evidence of this.

Additionally I only have minimal proof that this only affects uncached entries but these are the steps I've done to test it:

  • Start AGH service
  • visit a site that I know isn't in the AGH cache (I used chevrolet.com) in a normal browser window
  • wait for intermittent "server not found" errors in browsers indicating AGH has stopped processing requests
  • open a private browser window and attempt to navigate to chevrolet.com and it will load.
  • In the same private browser window attempt to view another uncached site (e.g. ford.com)
  • This will throw a "server not found" error.

I have attached the verbose log from the server here from boot to service restart, however it doesn't appear to capture anything interesting.

Screenshots

Screenshot:

Additional Information

AGH verbose version

Channel: release Go version: go1.16.12 Build time: 2021-12-29T19:55:46Z+0000 GOOS: linux GOARCH: amd64 Race: false Dependencies: github.com/AdguardTeam/[email protected] (sum: h1:nPMyvZUn1uSa5xVPSjJpZCDkn3+kLnNR7B3XTKk9tKs=) github.com/AdguardTeam/[email protected] (sum: h1:FBgk17zf35ESVWQKIqEUiqqB2bDaCBC8X5vMU760yB4=) github.com/AdguardTeam/[email protected] (sum: h1:dP6S7J6eFAk8MN4IDpUq2fZoBo8K8fmc6pXpxNIv84M=) github.com/NYTimes/[email protected] (sum: h1:ZUDjpQae29j0ryrS0u/B8HZfJBtBQHjqw2rQ2cqUQ3I=) github.com/aead/[email protected] (sum: h1:KjTM2ks9d14ZYCvmHS9iAKVt9AyzRSqNU1qabPih5BY=) github.com/aead/[email protected] (sum: h1:52m0LGchQBBVqJRyYYufQuIbVqRawmubW3OFGqK1ekw=) github.com/ameshkov/dnscrypt/[email protected] (sum: h1:X9UP5AHtwp46Ji+sGFfF/1Is6OPI/SjxLqhKpx0P5UI=) github.com/ameshkov/[email protected] (sum: h1:Srzik+J9mivH1alRACTbys2xOxs0lRH9qnTA7Y1OYVo=) github.com/beefsack/[email protected] (sum: h1:M57m0xQqZIhx7CEJgeLSvRFKEK1RjzRuIXiA3HfYU7g=) github.com/cheekybits/[email protected] (sum: h1:uGGa4nei+j20rOSeDeP5Of12XVm7TGUd4dJA9RDitfE=) github.com/digineo/go-ipset/[email protected] (sum: h1:k6skY+0fMqeUjjeWO/m5OuWPSZUAn7AucHMnQ1MX77g=) github.com/fsnotify/[email protected] (sum: h1:mZcQUHVQUQWoPXXtuf9yuEXKudkV2sx1E06UadKWpgI=) github.com/go-ping/[email protected] (sum: h1:mdi6AbCEoKCA1xKCmp7UtRB5fvGFlP92PvlhxgdvXEw=) github.com/google/[email protected] (sum: h1:Khx7svrCpmxxtHBq5j2mp/xVjsi8hQMfNLvJFAlrGgU=) github.com/google/[email protected] (sum: h1:ves8RnFZPGiFnTS0uPQStjwru6uO6h+nlr9j6fL7kF8=) github.com/google/[email protected] (sum: h1:Lh/jXZmvZxb0BBeSY5VKEfidcbcbenKjZFzM/q0fSeU=) github.com/insomniacslk/[email protected] (sum: h1:NpTIlXznCStsY88jU+Gh1Dy5dt/jYV4z4uU8h2TUOt4=) github.com/josharian/[email protected] (sum: h1:uhL5Gw7BINiiPAo24A2sxkcDI0Jt/sqp1v5xQCniEFA=) github.com/kardianos/[email protected] (sum: h1:bGuZ/epo3vrt8IPC7mnKQolqFeYJb7Cs8Rk4PSOBB/g=) github.com/lucas-clemente/[email protected] (sum: h1:ToR7SIIEdrgOhgVTHvPgdVRJfgVy+N0wQAagH7L4d5g=) github.com/marten-seemann/[email protected] (sum: h1:xbHbOGGhrenVtII6Co8akhLEdrawwB2iHl5yhJRpnco=) github.com/mdlayher/[email protected] (sum: h1:lez6TS6aAau+8wXUP3G9I3TGlmPFEq2CTxBaRqY6AGE=) github.com/mdlayher/[email protected] (sum: h1:n3ARR+Fm0dDv37dj5wSWZXDKcy+U0zwcXS3zKMnSiT0=) github.com/mdlayher/[email protected] (sum: h1:InctQoB89TIkmgIFQeIL4KXNvWc1iebQXdZggqPSwL8=) github.com/miekg/[email protected] (sum: h1:g5fRIhm9nx7g8osrAvgb16QJfmyMsyOCb+J7LSv+Qzk=) github.com/patrickmn/[email protected]+incompatible (sum: h1:HRMgzkcYKYpi3C8ajMPV8OFXaaRUnok+kx1WdO15EQc=) github.com/pkg/[email protected] (sum: h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=) github.com/satori/[email protected] (sum: h1:0uYX9dsZ2yD7q2RtLRtPSdGDWzjeM3TbMJP9utgA0ww=) github.com/ti-mo/[email protected] (sum: h1:rTN1nBYULDmMfDeBHZpKuNKX/bWEXQUhe02a/10orzg=) github.com/u-root/[email protected]+incompatible (sum: h1:u+KSS04pSxJGI5E7WE4Bs9+Zd75QjFv+REkjy/aoAc8=) go.etcd.io/[email protected] (sum: h1:/ecaJf0sk1l4l6V4awd65v2C3ILy7MSj+s/x1ADCIMU=) golang.org/x/[email protected] (sum: h1:0es+/5331RGQPcXlMfP+WrnIIS6dNnNRe0WB02W0F4M=) golang.org/x/[email protected] (sum: h1:hEYJvxw1lSnWIl8X9ofsYMklzaDs90JI2az5YMd4fPM=) golang.org/x/[email protected] (sum: h1:5KslGYwFpkhGh+Q16bwMP3cOontH8FOep7tGV86Y7SQ=) golang.org/x/[email protected] (sum: h1:fLOSk5Q00efkSvAm+4xcoXD+RRmLmmulPn5I3Y9F2EM=) golang.org/x/[email protected] (sum: h1:olpwvP2KacW1ZWvsR7uQhoyTYvKAupfQrRGBFM352Gk=) gopkg.in/natefinch/[email protected] (sum: h1:1Lc07Kr7qY4U2YPouBjpCLxpiyxIVoxqXgkXLknAOE8=) gopkg.in/[email protected] (sum: h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=) howett.net/[email protected] (sum: h1:QDxUo/w2COstK1wIBYpzQlHX/NqaQTcf9jyz347nI58=)

Verbose CPU version

processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Genuine Intel(R) CPU U2300 @ 1.20GHz

verbose OS info

NAME="Ubuntu" VERSION="20.04.3 LTS (Focal Fossa)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 20.04.3 LTS" VERSION_ID="20.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=focal UBUNTU_CODENAME=focal

verbose system info

Static hostname: plex Icon name: computer-desktop Chassis: desktop Machine ID: 97ff6784cf834e2093ec0329f069d558 Boot ID: 93ef0583297049bcb538bcf0cbbfa719 Operating System: Ubuntu 20.04.3 LTS Kernel: Linux 5.4.0-92-generic Architecture: x86-64

OS Update (apt) log

Start-Date: 2021-12-18 06:50:10 Commandline: /usr/bin/unattended-upgrade Upgrade: linux-firmware:amd64 (1.187.20, 1.187.23) End-Date: 2021-12-18 06:52:45

Start-Date: 2022-01-05 03:46:44 Commandline: apt-get upgrade Requested-By: plex (1000) Upgrade: libsystemd0:amd64 (245.4-4ubuntu3.13, 245.4-4ubuntu3.14), udev:amd64 (245.4-4ubuntu3.13, 245.4-4ubuntu3.14), libudev1:amd64 (245.4-4ubuntu3.13, 245.4-4ubuntu3.14), systemd-timesyncd:amd64 (245.4-4ubuntu3.13, 245.4-4ubuntu3.14), python3-commandnotfound:amd64 (20.04.4, 20.04.5), systemd-sysv:amd64 (245.4-4ubuntu3.13, 245.4-4ubuntu3.14), libpam-systemd:amd64 (245.4-4ubuntu3.13, 245.4-4ubuntu3.14), systemd:amd64 (245.4-4ubuntu3.13, 245.4-4ubuntu3.14), libnss-systemd:amd64 (245.4-4ubuntu3.13, 245.4-4ubuntu3.14), command-not-found:amd64 (20.04.4, 20.04.5), linux-firmware:amd64 (1.187.23, 1.187.24) End-Date: 2022-01-05 03:50:47

Start-Date: 2022-01-05 04:00:35 Commandline: apt-get --with-new-pkgs upgrade Requested-By: plex (1000) Install: linux-image-5.4.0-92-generic:amd64 (5.4.0-92.103, automatic), linux-modules-extra-5.4.0-92-generic:amd64 (5.4.0-92.103, automatic), linux-headers-5.4.0-92:amd64 (5.4.0-92.103, automatic), linux-headers-5.4.0-92-generic:amd64 (5.4.0-92.103, automatic), linux-modules-5.4.0-92-generic:amd64 (5.4.0-92.103, automatic) Upgrade: linux-headers-generic:amd64 (5.4.0.91.95, 5.4.0.92.96), linux-image-generic:amd64 (5.4.0.91.95, 5.4.0.92.96), linux-generic:amd64 (5.4.0.91.95, 5.4.0.92.96) End-Date: 2022-01-05 04:02:27

Start-Date: 2022-01-05 06:09:57 Commandline: /usr/bin/unattended-upgrade Remove: linux-image-5.4.0-90-generic:amd64 (5.4.0-90.101), linux-modules-extra-5.4.0-90-generic:amd64 (5.4.0-90.101), linux-modules-5.4.0-90-generic:amd64 (5.4.0-90.101) End-Date: 2022-01-05 06:10:05

Start-Date: 2022-01-05 06:10:12 Commandline: /usr/bin/unattended-upgrade Remove: linux-headers-5.4.0-90-generic:amd64 (5.4.0-90.101), linux-headers-5.4.0-90:amd64 (5.4.0-90.101) End-Date: 2022-01-05 06:10:17

DNS settings tls://1.1.1.1 https://dns.cloudflare.com/dns-query mode: load balanced bootstrap DNS: 1.1.1.1 EDNS: Enabled DNSSEC: Enabled Blocking mode: Default Cache size: 4194304 Optimistic caching: Enabled

aghlog.txt

thedannywahl avatar Jan 06 '22 13:01 thedannywahl

INITIAL POST Bumping this up, AGH would randomly "hang" after serving DNS requests for a short time. Restarting the service fixes it, until it happens again. Using v0.107.2.

UPDATE 2022-01-10 When the following lines shows up, DNS resolution stops and the service needs to be restarted.

Mon Jan 10 21:41:29 2022 daemon.err AdGuardHome[965]: 2022/01/10 13:41:29.311061 [info] error handling UDP packet: dns: buffer size too small
Mon Jan 10 21:43:54 2022 daemon.err AdGuardHome[965]: 2022/01/10 13:43:54.186445 [info] error handling UDP packet: dns: buffer size too small
Mon Jan 10 21:43:57 2022 daemon.err AdGuardHome[965]: 2022/01/10 13:43:57.208571 [info] error handling UDP packet: dns: buffer size too small

UPDATE 2022-01-11 I reverted to v0.107.0, this issue is not happening.

SiNONiMiTY avatar Jan 07 '22 17:01 SiNONiMiTY

Hello, and apologies for the late response! We cannot reproduce the issue, unfortunately. Can you enable verbose logging and look, what kind of logs are printed when this happens? Or send the logs to [email protected]? Thanks!

ainar-g avatar Jan 20 '22 15:01 ainar-g

@thedannywahl Hi! Is this issue still relevant?

Birbber avatar Sep 01 '22 13:09 Birbber

Yes, I have the same issue (see #1777 )

chaelli avatar Oct 24 '22 15:10 chaelli

I also have this issue, as well as not being able to reproduce. I have no issues regarding freezes or stops of request processing though.

Excerpt of log is attached. agh_git.log

0fbcb238c0 avatar Nov 10 '22 01:11 0fbcb238c0

I have same thing, I guess. packing udp packet: dns: buffer size too small

0.107.19

Churator avatar Nov 29 '22 20:11 Churator

Looks like this isn't fixed yet. I am running adgaurd current version: 4.7.6 on

Home Assistant 2023.1.5 Supervisor 2022.12.1 Operating System 9.4

Checked and net.core.rmem_max is set to 4194304

I have't noticed any freezing etc but below in the logs.

2023/01/19 07:14:09.827890 [error] unpacking udp packet: dns: buffer size too small 2023/01/19 07:14:12.762781 [error] unpacking udp packet: dns: buffer size too small 2023/01/19 08:06:10.624769 [error] unpacking udp packet: dns: buffer size too small 2023/01/19 08:06:13.639789 [error] unpacking udp packet: dns: buffer size too small

nigam-sa avatar Jan 19 '23 09:01 nigam-sa

Well, no new release, so it makes sense :)

Churator avatar Jan 19 '23 09:01 Churator

Same issue here. Have increased the UDP buffer with no effect:

cat /proc/sys/net/core/rmem_max 2500000

JulAlx avatar Jan 23 '23 07:01 JulAlx

i have upgraded to recently released version 4.8.0 and my net.core.rmem_max is set to 4194304. Still I am having the same errors in logs everyday. @frenck - Is it possible for you to look at this please?

nigam-sa avatar Jan 23 '23 12:01 nigam-sa

I've got the same errors in log. Is there any way to correct it ? Here the command's result some launched :

/opt/adguardhome/work # cat /proc/sys/net/core/rmem_max
212992

MilesTEG1 avatar Feb 07 '23 22:02 MilesTEG1

Got the same errors in log

v0.107.22 AdGuard Home addon for Home Assistant v4.8.0

[error] unpacking udp packet: dns: buffer size too small

imisic avatar Feb 18 '23 20:02 imisic

What's going on, why no one check/fix this ? Still stuck on .18

Churator avatar Feb 21 '23 12:02 Churator

Same error with v0.107.34

[error] dnsproxy: unpacking udp packet: dns: buffer size too small

TheWiresharkGuy avatar Jul 24 '23 15:07 TheWiresharkGuy

Same error with v0.107.36 running as Docker installation on a Synology NAS

[error] dnsproxy: unpacking udp packet: dns: buffer size too small

OlliZabal avatar Aug 16 '23 15:08 OlliZabal

We still cannot reproduce the dns: buffer size too small error. It comes from the github.com/miekg/dns library, so it is either a malformed packet or perhaps there is a bug in the library. Either way, it shouldn't interfere with the normal functioning of AGH. If you are sure that it does, please send the verbose logs of the AGH run that stops processing after these errors.

ainar-g avatar Aug 16 '23 16:08 ainar-g

We still cannot reproduce the dns: buffer size too small error. It comes from the github.com/miekg/dns library, so it is either a malformed packet or perhaps there is a bug in the library. Either way, it shouldn't interfere with the normal functioning of AGH. If you are sure that it does, please send the verbose logs of the AGH run that stops processing after these errors.

I can reproduce it easily but indeed did not notice functional impact, however since AGH is running on my router (OpenWRT), these logs are spamming the syslog and make it unusable.

I am wondering if I can provide additional data to help pinpoint the problem, I can run tcpdump directly on the router and log all packets if it helps, another alternative is to disable this log message.

Is there any issue for this upstream (github.com/miekg/dns)?

Thanks

thecode avatar Aug 16 '23 16:08 thecode

I can reproduce it easily but indeed did not notice functional impact, however since AGH is running on my router (OpenWRT), these logs are spamming the syslog and make it unusable.

We can log this exact error with [debug] level, I think. I feel, though, that if it really is a bad query, users should probably know.

I am wondering if I can provide additional data to help pinpoint the problem, I can run tcpdump directly on the router and log all packets if it helps, another alternative is to disable this log message.

If you could get the dump of the offending request, that'd be great.

Also, are these coming from the same client? Perhaps it's a known issue with that device?

Is there any issue for this upstream (github.com/miekg/dns)?

There are a few roughly matching, but without the exact data it's hard to tell.

ainar-g avatar Aug 16 '23 16:08 ainar-g

@ainar-g Is there any log I can provide to help investigate this issue? I'm currently on v0.107.38 and the issue is severe that I got SERVFAIL (on my network clients) every 20 - 30 minutes... My AdGuardHome.log are filled with

2023/09/12 01:14:43.541289 [error] dnsproxy: unpacking udp packet: dns: buffer size too small
2023/09/12 01:14:46.440923 [error] dnsproxy: unpacking udp packet: dns: buffer size too small
2023/09/12 01:15:07.505015 [error] dnsproxy: unpacking udp packet: dns: buffer size too small
2023/09/12 01:15:10.504589 [error] dnsproxy: unpacking udp packet: dns: buffer size too small
2023/09/12 01:20:05.670015 [error] dnsproxy: unpacking udp packet: dns: buffer size too small
2023/09/12 01:20:08.681314 [error] dnsproxy: unpacking udp packet: dns: buffer size too small
2023/09/12 01:20:25.123012 [error] dnsproxy: unpacking udp packet: dns: buffer size too small
2023/09/12 01:20:28.140820 [error] dnsproxy: unpacking udp packet: dns: buffer size too small
2023/09/12 01:23:28.397321 [error] dnsproxy: unpacking udp packet: dns: buffer size too small
2023/09/12 01:23:31.500445 [error] dnsproxy: unpacking udp packet: dns: buffer size too small

My usage is typically 280000+ queries in 24 hours, on Raspberry Pi 400. Query method is a mix of plain DNS & DoH (through Nginx reverse proxy with HTTP/3). Pure home use.

cat /proc/sys/net/core/rmem_max
100000000

szhu25 avatar Sep 12 '23 05:09 szhu25

@szhu25, all of the steps I've outlined in the previous comment are still valid, in addition to collecting verbose logs, especially of the exact moment the issue occurs.

ainar-g avatar Sep 12 '23 13:09 ainar-g

@szhu25, all of the steps I've outlined in the previous comment are still valid, in addition to collecting verbose logs, especially of the exact moment the issue occurs.

Sounds good! I'll go home,collect logging for 48 hours and see where that goes! Thanks

szhu25 avatar Sep 12 '23 13:09 szhu25

I get this on asuswrt-merlin, interesting that another user with the same problems is on a openwrt router and the people who can't produce it aren't on routers perhaps? Something in the router firmware?

cat /proc/sys/net/core/rmem_max
4194304
cat /proc/sys/net/core/wmem_max
1048576

I don't use a QUIC DNS server in my entries.

privacyguy123 avatar Oct 17 '23 16:10 privacyguy123

I get this on asuswrt-merlin, interesting that another user with the same problems is on a openwrt router and the people who can't produce it aren't on routers perhaps? Something in the router firmware?

cat /proc/sys/net/core/rmem_max
4194304
cat /proc/sys/net/core/wmem_max
1048576

I don't use a QUIC DNS server in my entries.

I think it might be due to some weird limitations on ARM-based CPUs. I also have external AdGuard Home resolvers on Intel & AMD CPUs and haven't experienced this issue on those servers. This is a speculation and I have no way to prove it.

szhu25 avatar Oct 17 '23 17:10 szhu25

I get this on asuswrt-merlin, interesting that another user with the same problems is on a openwrt router and the people who can't produce it aren't on routers perhaps? Something in the router firmware?

cat /proc/sys/net/core/rmem_max
4194304
cat /proc/sys/net/core/wmem_max
1048576

I don't use a QUIC DNS server in my entries.

I think it might be due to some weird limitations on ARM-based CPUs. I also have external AdGuard Home resolvers on Intel & AMD CPUs and haven't experienced this issue on those servers. This is a speculation and I have no way to prove it.

Has to be something along those lines ...

privacyguy123 avatar Oct 17 '23 17:10 privacyguy123

I dont't think that this issue is limited to ARM-based CPUs as in my case I'm running AGH as Docker installation on a Synology DS720plus. This unit has an Intel Celeron J4125 CPU.

FYI: I do use TLS and HTTPS DNS server in my entries.

OlliZabal avatar Oct 20 '23 12:10 OlliZabal

I have an Intel Xeon E5620 as well and this issue happens consistently when the number of requests increases. Been rocking these insane numbers and still happens (when the number is higher, the time it takes to appear is longer though).

cat /proc/sys/net/core/rmem_max

200000000

cat /proc/sys/net/core/wmem_max

200000000

As @OlliZabal, I could reproduce this using https and tls upstreams, but also QUIC and h3. I set use_http3_upstreams: false to avoid HTTPS being upgraded to HTTPS/3, but I still get this.

However, in a Raspberry Pi 4, I never saw that line logged 😉, so definitely not processor-related.

I'm still trying to collect useful logs for this.

ferferga avatar Nov 21 '23 09:11 ferferga

I discovered an interesting thing: so far, it looks like the issue is caused by 2 devices, one is S21 FE and the other is S20 FE. Consistently...

ferferga avatar Nov 21 '23 11:11 ferferga

I discovered an interesting thing: so far, it looks like the issue is caused by 2 devices, one is S21 FE and the other is S20 FE. Consistently...

This is interesting since I also suspected this is caused buy S20, I notice the errors mostly when S20 device is connected to the network, however I am not sure this is the only problem as I did see errors also when the S20 is not connected

thecode avatar Nov 21 '23 12:11 thecode

I discovered an interesting thing: so far, it looks like the issue is caused by 2 devices, one is S21 FE and the other is S20 FE. Consistently...

Same here, noticed this error right after an S10 joining the network:

Feb 10 20:30:01 router dnsmasq-dhcp[381354]: not giving name sgs10.mydomain to the DHCP lease of 192.168.2.26 because the name exists in /etc/hosts with address 192.168.2.6
Feb 10 20:30:02 router dnsmasq-dhcp[381354]: not giving name sgs10 to the DHCP lease of 192.168.2.26 because the name exists in /etc/hosts with address 192.168.2.6
Feb 10 20:30:05 router AdGuardHome[4540]: 2024/02/10 20:30:05.448277 [error] dnsproxy: unpacking udp packet: dns: buffer size too small
Feb 10 20:30:08 router AdGuardHome[4540]: 2024/02/10 20:30:08.372161 [error] dnsproxy: unpacking udp packet: dns: buffer size too small

AdGuardHome 1.7.4 release Asuswrt-Merlin firmware

crazy-matt avatar Feb 10 '24 20:02 crazy-matt

@crazy-matt is this still an issue with the most recent version?

overwatch3560 avatar Apr 02 '24 00:04 overwatch3560