cache hit ratio and upstream query latency
I have been running routedns for over a week now and, coming from blocky, the average response time is higher. I am not sure if that is because my configuration is suboptimal.
At the moment I have both blocky and routedns running, with AdGuard Home using them as upstreams. Looking at the average upstream stats for the last 24 hours, routedns is at 63ms and blocky at 27ms. Blocky handled 68.9% of the queries and routedns 31.1%. Due to the lower average response times, AGH prefers blocky over routedns. When I was previously running 2 blocky instances, the split was roughly even.
Both of them have the exact same DoH upstreams. RouteDNS has a bootstrap address configured whereas blocky doesn't.
Caching & prefetch
One of the things I have been noticing is that there are a lot more cache misses with routedns compared to blocky, and it is causing the averages to shoot up.
Blocky is configured to prefetch any query that gets more than 5 hits in a 3 hour window, with a maximum of 250 total items it can keep for prefetching before dropping the oldest one. It also sets the minimum TTL and maximum TTL.
caching:
  minTime: 2m
  maxTime: 24h
  maxItemsCount: 1000
  prefetching: true
  prefetchExpires: 3h
  prefetchThreshold: 5
  prefetchMaxItemsCount: 250
  cacheTimeNegative: 1m
I have tried to match it in routedns to some extent. I am aware routedns will prefetch everything, unlike blocky, but it doesn't appear to be working that well from what I can see based on response times, especially on repeated queries where the gap between each is longer than 5 minutes.
[groups.stub-resolver-cache]
type = "cache"
resolvers = ["stub-updated-ttl"]
cache-answer-shuffle = "random"
cache-flush-query = "flush.cache."
cache-prefetch-trigger = 10
cache-prefetch-eligible = 20
cache-rcode-max-ttl = { 3 = 60, 5 = 60 }
backend = { type = "memory", size = 1000, filename = "/tmp/stub-resolver-cache.json", save-interval = 300 }
[groups.stub-updated-ttl]
type = "ttl-modifier"
resolvers = ["doh"]
ttl-min = 120
ttl-max = 86400
Upstream query latency
So for cache misses, blocky seems to perform a bit better than routedns. For example, blocky mostly takes about 20ms for upstream queries compared to about 30ms on average for routedns.
Blocky doesn't support QUIC, so it is standard DoH with TLS 1.2 as the minimum and the upstream timeout set to 2s. Everything else is left at defaults.
minTlsServeVersion: 1.2
upstreams:
  timeout: 2s
In RouteDNS I have configured the following
- Timeout set to 2 seconds
- QUIC & 0-RTT enabled for the Cloudflare & Google upstreams.
RouteDNS has the advantage of not having to resolve the upstream hostnames since a bootstrap address is configured. The average response time is higher if you don't specify a bootstrap address, so it did help bring the average down a bit.
I also bumped the hard-coded timeout from 30 to 90 seconds and recompiled. It seems to have made a slight difference but not much.
As I am not familiar with Go, I am not sure what blocky does differently such that its upstream query latency is a bit lower.
RouteDNS is a great tool and thank you for making it. I love the flexibility it gives on what you can do with processing pipelines. If caching can be improved and query latency reduced, it would be even better. I would love to drop blocky and have RouteDNS do everything.
Did you try playing with the shuffle option? I never define it, so all answers stay in the order they were resolved in. This might influence caching at your front end?
Regarding the upstream query latency, make sure you use the GET method in DoH-over-QUIC like so:
[resolvers.cloudflare-doh-quic]
address = "https://cloudflare-dns.com/dns-query{?dns}"
doh = { method = "GET" }
protocol = "doh"
transport = "quic"
enable-0rtt = true
This should give you 0-RTT (the request needs to be a GET so it can safely be sent as early data).
As for prefetching, I suspect that Blocky uses a different method to keep items "hot" in the cache. RouteDNS uses a very basic algorithm, similar to what BIND does. Blocky keeps track of how many times a query was sent and then actively fetches the ones above a threshold, while routedns only refreshes items that are actively being queried once their remaining TTL has fallen below 10s (in your example). This likely explains the difference in averages.
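To make the contrast concrete, here's a minimal Go sketch of the demand-driven approach. Everything here (package, names, fields, defaults) is made up for illustration and is not Blocky's or RouteDNS's actual code: every query bumps a per-name counter inside a sliding window, and once the counter crosses a threshold the name becomes eligible for proactive refreshing.

package prefetch

import (
	"sync"
	"time"
)

// tracker counts queries per name within a sliding window. Once a name
// crosses the threshold, its cached answer would be refreshed proactively
// instead of only when a client happens to query it near expiry (the
// BIND-like model routedns currently uses).
type tracker struct {
	mu        sync.Mutex
	started   map[string]time.Time // when the current window began for a name
	hits      map[string]int       // hits within the current window
	window    time.Duration        // e.g. 2h
	threshold int                  // e.g. 5 hits
}

func newTracker(window time.Duration, threshold int) *tracker {
	return &tracker{
		started:   make(map[string]time.Time),
		hits:      make(map[string]int),
		window:    window,
		threshold: threshold,
	}
}

// record is called for every incoming query and reports whether the name
// is now eligible for prefetching.
func (t *tracker) record(qname string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if start, ok := t.started[qname]; !ok || time.Since(start) > t.window {
		// start a fresh counting window for this name
		t.started[qname] = time.Now()
		t.hits[qname] = 0
	}
	t.hits[qname]++
	return t.hits[qname] >= t.threshold
}

A real implementation would also cap how many names are tracked (evicting the oldest) and hook the refresh into the cache's expiry handling.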
@cbuijs I did try that and didn't see any noticeable difference
@folbricht That explains it. I have noticed it mostly with IoT devices and Apple devices, where they query a record that usually has a TTL of 60 seconds and then don't query again for 5 to 15 minutes. With RouteDNS this would always result in a cache miss, but blocky keeps track, so after the minimum threshold is reached it will start prefetching.
The prefetching logic seems to be in this file https://github.com/0xERR0R/blocky/blob/main/cache/prefetching/prefetching_cache.go
The Blocky developer also recently moved it to a separate library. It is licensed under MPL 2.0 so I am not sure if routedns could leverage it: https://github.com/0xERR0R/expiration-cache
Implementing that kind of prefetch algorithm isn't actually too difficult. My main concern is how to integrate it without significantly impacting existing users of the current configuration options. I could try to re-use the current options and give them slightly different meanings, but want to limit the impact. Any thoughts on what the new config could look like, using the existing options, perhaps with some new ones and reasonable defaults?
I was thinking of two ways to implement this. One is to keep the pre-fetching within the cache block as it is currently, but introduce a new parameter to define the algorithm. It would default to BIND-like behavior if not specified.
# Current default (BIND-like, static thresholds)
cache-prefetch-mode = "time" # optional, will default to this if not specified
cache-prefetch-trigger = 30
cache-prefetch-eligible = 10
# Alternative (Blocky-like, demand-driven)
cache-prefetch-mode = "demand"
# below parameters can be optional with sensible defaults
cache-prefetch-hits-threshold = 5
cache-prefetch-window = "2h"
cache-prefetch-maxitems = 100
Blocky defaults to 5 hits within a 2 hour window. The maximum number of items defaults to no limit, but I don't think we should go for no limit as a default. Something like 50 or 100 should be a good start when running on low-powered devices.
The other way I was thinking of is to move pre-fetch to its own block. This allows adding different pre-fetching algorithms in the future should we need them, just like the random, round-robin, and fail-over groups.
One complication I can think of with this approach is that the resource would have no idea which cache to update, so it would need to be passed as a parameter:
[groups.prefetch]
type = "prefetch"
mode = "demand"
prefetch-hits-threshold = 5
prefetch-window = "2h"
prefetch-maxitems = 100
resolvers = ["cloudflare-doh"]
It is just an example, so feel free to pick whatever parameter names you see fit.
Took a closer look at the options and agree that a standalone "prefetch" module is likely the best way to implement this behavior. The cache supports using an external Redis backend, and adapting that would be hard.
@emlimap I implemented a new prefetch group in a somewhat similar way to Blocky. Would you be able to try it out? It's currently on the issue-463 branch. Docs are still missing. It's configured like this:
[resolvers.cloudflare-dot]
address = "1.1.1.1:853"
protocol = "dot"
[groups.cloudflare-cached]
type = "cache"
resolvers = ["cloudflare-dot"]
[groups.cloudflare-prefetch]
type = "prefetch"
resolvers = ["cloudflare-cached"]
prefetch-window = "15m" # Minimum time between queries to remain eligible for prefetch
prefetch-threshold = 3 # Min number of queries to enable prefetch
prefetch-max-items = 100 # Max number of items to track for prefetch
[listeners.local-udp]
address = "127.0.0.1:53"
protocol = "udp"
resolver = "cloudflare-prefetch"
First of all, thank you for taking the time to implement this feature
I have deployed the change and have reset the statistics in Adguard Home. Will let you know what it looks like after 24 hours of usage. At the moment, the average response time is 77ms.
It has made quite a difference. The average has dropped to under 40ms. Previously the average was 77ms for the first instance running on an Intel NUC and 104ms for the second instance running on an underpowered NanoPi R2S running OpenWrt.
It is still skewed by some less popular domains that aren't actively cached by public DNS servers. I also spread queries over 5 public DNS providers for privacy, so that doesn't help in this scenario either.
For domains that reach the prefetch threshold, response times are under 5ms, which is just massive as they can otherwise be anywhere between 30 and 70ms.
Thank you again for implementing this feature
It works quite well. I've been running it for two days with no issues. I see a slight increase in the speed of (cached) responses getting returned, roughly in the 4 to 6 percent range.
While your change made a massive difference, I've been trying to track down additional latency sources. I discovered that some latency was due to using Redis for caching, which allows two instances of RouteDNS to share the same cache.
Since I'm not familiar with Go, I've been using AI to identify which components contribute to the overall latency of DNS queries. The AI has implemented additional optimizations that have reduced latency by 2-3 ms on average for cached queries. As a result, cached/pre-fetched queries now respond in as little as 2 ms.
I'm not sure if you accept AI-generated code, but I can submit it if you're interested in reviewing it.
Below are the changes it made:
- Redis flush/size correctness
- Flush: Replaces naive delete with prefix-scoped SCAN + batched UNLINK, 2 s timeout; skips flush when redis-key-prefix is empty; falls back to DEL if UNLINK unsupported.
- Size: Counts keys via SCAN MATCH on the configured prefix; falls back to DB-wide DBSIZE only if no prefix.
- Leaner value storage and IO
- Stores cache values in compact binary wire format (dns.Msg.Pack) with automatic JSON fallback for backward compatibility.
- Uses GET.Bytes() and performs serialization before starting the I/O deadline to reduce CPU and timeouts.
- Async miss-store (respond-first; see the sketch after this list)
- Default ON for Redis: redis-async-set-on-miss (set to false to disable). Runs Redis SET in background so misses return faster.
- Bounded concurrency: small semaphore (256) caps concurrent async writes; if saturated, store is skipped (best-effort).
- TTL guard avoids writing already-expired items in async path.
- Minor optimizations and logging
- Key builder uses strings.Builder and strconv.FormatBool for the DO flag.
- Improved error logs when both binary decode and JSON fallback fail.
- Docs, config, examples
- Adds redis-async-set-on-miss to configuration docs (default true).
- Example Redis config omits the flag (default applies).
- PR notes updated with behavior, defaults, and operational caveats.
- Benefits
- Correct flush/size for shared Redis; safer multi-tenant behavior.
- Smaller payloads and lower CPU on hits; fewer “deadline exceeded” on store.
- Lower miss-path latency with respond-first writes; flush avoids blocking Redis.
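To illustrate the respond-first idea from the async miss-store item above, here is a rough Go sketch of what it could look like, assuming a go-redis v9 client. The function and variable names, the 2s timeout, and the semaphore size mirror the description but are purely illustrative and not the exact code in the PR.

package rediscache

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// storeSem bounds the number of concurrent background writes so a slow
// Redis instance can't pile up goroutines; if it's full, the store is
// skipped (best-effort caching).
var storeSem = make(chan struct{}, 256)

// asyncStore returns immediately and writes the cache entry to Redis in
// the background, so a cache miss doesn't wait on the Redis round trip.
func asyncStore(rdb *redis.Client, key string, value []byte, ttl time.Duration) {
	if ttl <= 0 {
		return // don't bother storing items that are already expired
	}
	select {
	case storeSem <- struct{}{}:
		go func() {
			defer func() { <-storeSem }()
			ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
			defer cancel()
			// best-effort: a real implementation would log the error
			_ = rdb.Set(ctx, key, value, ttl).Err()
		}()
	default:
		// semaphore saturated: skip the store rather than block the response path
	}
}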
Merged https://github.com/folbricht/routedns/pull/468
@emlimap Of course you're welcome to submit PRs. But if possible, try to keep them scoped to one or two things at a time. As for the list of proposed changes, I suspect redis-async-set-on-miss would be the most impactful, perhaps followed by using concurrency in Store() and using a binary format (ideally using PackBuffer while keeping a buffer in a sync.Pool). The changes to flush may not be worth it as that's rarely used so you shouldn't see any improvement from it.
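For the binary format, a minimal sketch of the PackBuffer plus sync.Pool idea could look something like this (illustrative only, not code from the repo or the PRs):

package cache

import (
	"sync"

	"github.com/miekg/dns"
)

// bufPool holds reusable packing buffers so each store doesn't allocate a
// fresh slice; most DNS messages fit well under 4KB.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 4096)
		return &b
	},
}

// packMsg serializes a dns.Msg to wire format using a pooled buffer and
// returns a copy, so the buffer can safely go back into the pool.
func packMsg(m *dns.Msg) ([]byte, error) {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)

	wire, err := m.PackBuffer(*bp)
	if err != nil {
		return nil, err
	}
	out := make([]byte, len(wire))
	copy(out, wire)
	return out, nil
}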
@folbricht I have gone ahead and created those PRs. Please review them
- Redis async set on cache miss: https://github.com/folbricht/routedns/pull/472
- Use binary wire format: https://github.com/folbricht/routedns/pull/473
There may be a merge conflict after the first PR is merged, which I can address when that happens.