
Upstream latency increase since v0.107.55

Open RainmakerRaw opened this issue 1 year ago • 8 comments

Prerequisites

Platform (OS and CPU architecture)

Linux, ARM64

Installation

GitHub releases or script from README

Setup

On one machine

AdGuard Home version

0.107.55

Action

Replace the following command with the one you're calling or a description of the failing action:

N/A

Expected result

Upstream latency should be in line with my ISP WAN connection (FTTP/fibre), approx 12ms.

Actual result

Since upgrading to 0.107.55, the upstream latency has drifted upward by a fair amount. Ironically, I'm aware that the release is intended to address latency issues (e.g. #6818).

On previous versions (e.g. 0.107.54) my upstream latency across 24 hours is approximately 12ms-14ms for most upstreams. After upgrading to 0.107.55 the upstreams are between 17ms and 32ms. I have verified this multiple times, on the same home LAN with the same clients and (roughly) the same type of traffic. I have tested on my primary LAN DNS server (bare metal, Radxa Rock5B, Debian Bookworm aarch64), and also confirmed the same issue on my backup instance (Alpine Linux LXC on Proxmox, AMD Ryzen x86_64 host).

Downgrading to 0.107.54 immediately fixes the issue, and upstream latency returns to around 12ms. I have pprof running, and can switch versions and/or collect stats and logs as required. Many thanks in advance.
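Since pprof is already running, profiles collected while the regression is reproducing would be the most useful artifact for the devs. As a sanity check that the profiling endpoints are actually being served, here is a minimal, self-contained Go sketch (not AGH's code) showing how a blank import of `net/http/pprof` registers the `/debug/pprof/` handlers on the default mux; against a real instance you would point `go tool pprof` at the host and port configured in AdGuardHome.yaml instead:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // side effect: registers /debug/pprof/ handlers on the default mux
)

// pprofIndexStatus starts a throwaway server on the default mux (where the
// blank import above registered its handlers) and returns the HTTP status
// of the profiling index page. 200 means the endpoints are live.
func pprofIndexStatus() int {
	srv := httptest.NewServer(http.DefaultServeMux)
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/debug/pprof/")
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return resp.StatusCode
}

func main() {
	fmt.Println("pprof index status:", pprofIndexStatus())
}
```

Against a live instance, a 30-second CPU profile captured on 0.107.54 and again on 0.107.55 under the same load would let the devs diff where the extra time is going.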

Additional information and/or screenshots

Here is a redacted copy of the AdGuardHome.yaml (extension changed to .txt for GitHub). AdGuardHome.txt

RainmakerRaw avatar Dec 19 '24 21:12 RainmakerRaw

Thank you for the report - I will pass it on to the devs. Can you confirm if you're using caching or not please? I've tried parsing your uploaded config (thank you) but can't spot it myself.

tjharman avatar Jan 12 '25 00:01 tjharman

The DNS cache, you mean?

  cache_size: 10000000
  cache_ttl_min: 0
  cache_ttl_max: 0
  cache_optimistic: true

RainmakerRaw avatar Jan 12 '25 11:01 RainmakerRaw

I have a similar issue, but it isn't just related to the latest version; I believe it occurred a couple of versions before. I have three addresses to which parallel requests are made:

tls://dns.google tls://1dot1dot1dot1.cloudflare-dns.com:853 tls://one.one.one.one:853

Bootstrap DNS addresses are:

1.1.1.1 1.0.0.1 2606:4700:4700::1111 2606:4700:4700::1001

With the above, the average upstream response time is 22ms for Google and 23-24ms for Cloudflare. Removing the Google address immediately increases the response time of the remaining upstreams to 32ms.

Also, the average upstream response times should be displayed from the quickest at the top to the slowest at the bottom. Currently it's the other way round, which suggests the slowest server is the quickest.
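The sorting fix suggested above is a one-liner on whatever slice the dashboard aggregates. A minimal sketch (the struct and field names are hypothetical, not AGH's actual types):

```go
package main

import (
	"fmt"
	"sort"
)

// upstreamStat is a hypothetical view of what the dashboard shows:
// an upstream address and its average response time in milliseconds.
type upstreamStat struct {
	Addr  string
	AvgMs float64
}

// sortFastestFirst orders upstreams ascending by average response time,
// so the quickest server appears at the top of the list.
func sortFastestFirst(stats []upstreamStat) {
	sort.Slice(stats, func(i, j int) bool { return stats[i].AvgMs < stats[j].AvgMs })
}

func main() {
	stats := []upstreamStat{
		{"tls://one.one.one.one:853", 24},
		{"tls://dns.google", 22},
		{"tls://1dot1dot1dot1.cloudflare-dns.com:853", 23},
	}
	sortFastestFirst(stats)
	for _, s := range stats {
		fmt.Printf("%-45s %.0f ms\n", s.Addr, s.AvgMs)
	}
}
```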

EdinburghWG avatar Jan 13 '25 07:01 EdinburghWG

I have the same issue here. After going back to v0.108.0-b.59 everything runs normally again. It seems every version since .55 causes the latency increase; .49 is very good.

pendie avatar Jan 14 '25 02:01 pendie

So, here's a screenshot of the response times after the Google DNS server was removed:

Image

I'll upload another screenshot once I run the Google DNS with the others for some time.

What I don't understand is how adding or removing the Google DNS address increases the response time of the other two. I'd expect them to stay the same, unless the response time is not calculated the way I think it is.

EdinburghWG avatar Feb 11 '25 12:02 EdinburghWG

I am using version 0.107.57, and I commented here that it seems to happen easily when using h3:// but rarely when using https:// (HTTP/2).

Right now I'm using Google DoH with h3://; it's been 3+ hours and everything seems to work just fine. But I highly suspect it's something to do with how AGH handles query logs (General settings > Logs configuration). I disabled the query log, leaving Statistics configuration enabled, and disabled everything there except the options I really needed.

Image

It seems that after disabling query logging there are no ping spikes. When query logging is enabled, it's easy to get 3000+ ms ping spikes.

In my case, AGH is running on a mini PC with an Intel N100 (so it's not a weak ARM CPU like most routers), and I edited the YAML to keep the logs in memory instead of on disk. I also pointed it at /dev/shm just to make sure the logs weren't touching disk at all. But no luck, at least until I disabled the Logs configuration feature entirely.

It's probably something to do with how AGH rotates the logs?

I'm hoping anyone who encounters the same issue can investigate together so the devs can get more data. Maybe it wasn't a network issue or a problem with the upstream servers, but how AGH handles logs? (Handling logs does require some compute, especially processing logs as rapidly as DNS queries arrive, so it's not impossible.)

galpt avatar Oct 20 '25 11:10 galpt

I'm testing this, will report how it's gone.

MikeVil avatar Oct 25 '25 19:10 MikeVil

I want to follow up on my previous comment https://github.com/AdguardTeam/AdGuardHome/issues/7515#issuecomment-3421669150: disabling the query log didn't actually solve anything.

I highly doubt it's a server-side issue, since with h3:// it happens with mainstream providers (Google, Cloudflare, etc.), whose DNS implementations have probably been stable for years.

Another thing I've suspected for months is how Go handles QUIC and HTTP/3, but other projects use h3 (e.g. dnscrypt, routedns), and the last time I used them h3 worked fine for days. The ping spikes happened only when I used AGH.

Both routedns and dnscrypt are written in Go, so again I doubt my own suspicion about how Go handles QUIC and h3. It's probably something else degrading AGH's performance internally.

galpt avatar Oct 27 '25 10:10 galpt