smartdns seems NOT to cache queries
Problem Description
SmartDNS does not appear to cache responses effectively. When serving as the primary DNS resolver for a large organization, SmartDNS sends nearly 1k QPS upstream, roughly matching its total incoming QPS, even though caching is enabled.
Environment
- Firmware / Package: Debian package smartdns 40+dfsg-1
- Carrier / Network: Internal enterprise network
- OS: Debian GNU/Linux 12 (bookworm) x86_64
- SmartDNS Source & Version: Installed from Debian official repositories, version 40+dfsg-1
- Relevant Configuration (sanitized):
rr-ttl-min 30
rr-ttl-max 360
rr-ttl-reply-max 30
cache-size 524288
speed-check-mode none
max-query-limit 65535
prefetch-domain no
serve-expired yes
serve-expired-ttl 3600
serve-expired-reply-ttl 10
serve-expired-prefetch-time 30
dualstack-ip-selection no
cache-persist no
audit-enable yes
audit-size 8G
audit-num 8
log-level error
log-size 8G
log-num 8
domain-set -name blocked -file /etc/smartdns/domainlist.d/blacklist.conf
group-match domain-set:blocked
address /domain-set:blocked/#
Reproduction Steps
- Deploy SmartDNS with the above configuration as the organization’s primary DNS resolver.
- Generate client load of approximately 1 k QPS.
- Monitor upstream DNS traffic; for example:
tail -f /var/log/named/queries-unicom.log | pv -l -i 1 > /dev/null
You will see a sustained, very high request rate upstream despite the cache settings.
Audit Log Sample
[2025-04-17 16:37:23,993] 172.22.124.15 query as.xiaohongshu.com, type 1, time 0ms, speed: -0.1ms, result 101.34.191.138, 101.34.194.192
[2025-04-17 16:37:24,020] 10.0.142.2 query pull-flv-l11-cny.douyincdn.com, type 1, time 0ms, speed: -0.1ms, result 43.137.231.211, 43.137.231.90, 43.137.231.60, 43.137.231.93, 43.137.231.217
Expected Behavior:
SmartDNS should serve cached responses for repeated queries and minimize upstream QPS.
Actual Behavior:
SmartDNS continues to forward nearly all queries upstream, as if caching is ineffective.
Thank you for your assistance.
Best regards,
Yuan
Additional Information
We have also tested SmartDNS 1.2025.03.02-1533 (Release46-23-gb525170) and can reproduce the same behavior.
query pull-flv-l11-cny.douyincdn.com, type 1, time 0ms
Here the time is 0ms, and it is very likely that the result is returned from cache.
It would be best to turn on debug logging and repeatedly query the same domain name to check whether the cache is effective.
In addition, make sure the cache size is large enough.
You can also increase the minimum TTL (rr-ttl-min).
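The "repeatedly query the same domain name" check suggested above can be sketched with a small probe. This is only an illustration using the Python standard library, not part of SmartDNS: the function names `build_query` and `probe` are made up for this example, and the server address passed in is whatever address your SmartDNS instance listens on. If the cache is working, the second and later round-trip times should be consistently low.

```python
import socket
import struct
import time


def build_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet (RFC 1035 wire format) for `name`."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # Question: QNAME, terminating zero byte, QTYPE, QCLASS=IN
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def probe(server: str, name: str, count: int = 5,
          port: int = 53, timeout: float = 2.0) -> list:
    """Send the same A query `count` times; return round-trip times in ms."""
    timings = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for i in range(count):
            packet = build_query(name, txid=i + 1)
            start = time.perf_counter()
            sock.sendto(packet, (server, port))
            sock.recvfrom(4096)  # raises socket.timeout if no reply arrives
            timings.append((time.perf_counter() - start) * 1000.0)
    return timings
```

For example, `probe("127.0.0.1", "as.xiaohongshu.com")` (with 127.0.0.1 standing in for the SmartDNS address) returns one timing per query. Watching the upstream query log at the same time shows whether the repeated queries are still being forwarded.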
Thank you so much for your help! After turning on debug logging, I randomly sampled 100 entries and saw that about 90% of queries are indeed being served from cache. However, my local bind9 still receives a significant number of upstream requests—about 20% of the total. I believe this is due to a misconfiguration on my part, so I’ll adjust my settings further and continue to observe. Thanks again for your helpful advice!
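The sampling described above can be automated roughly as follows. This is a sketch that assumes audit lines look like the two samples in this report, and it treats "time 0ms" as a likely cache hit, which is the heuristic suggested in this thread rather than a documented guarantee. The names `AUDIT_RE` and `cache_hit_ratio` are hypothetical.

```python
import re

# Matches audit lines shaped like the samples in this report, e.g.
# [2025-04-17 16:37:23,993] 172.22.124.15 query as.xiaohongshu.com, type 1, time 0ms, ...
AUDIT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<client>\S+)\s+query\s+(?P<name>[^,\s]+),"
    r"\s+type\s+(?P<qtype>\d+),\s+time\s+(?P<ms>\d+)ms"
)


def cache_hit_ratio(lines, threshold_ms=0):
    """Return (likely_hits, total) over the audit lines that can be parsed.

    A query counts as a likely cache hit when its reported time is at most
    `threshold_ms`. This is a heuristic, not an exact cache indicator.
    """
    hits = total = 0
    for line in lines:
        match = AUDIT_RE.match(line)
        if not match:
            continue  # skip lines in any other format
        total += 1
        if int(match.group("ms")) <= threshold_ms:
            hits += 1
    return hits, total
```

Passing an open audit log file to `cache_hit_ratio` yields the two counts, and `hits / total` gives an estimated hit rate that can be compared with the share of queries seen on the upstream bind9 server.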
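Before the refactoring, domestic and overseas DNS servers were queried separately. In the refactored code they are mixed: visiting a domestic website also sends queries to the overseas group. The smartdns UI shows the overseas group as normal. Previously its status was normal only when the proxy was turned on; now it shows as normal, with a count of successful queries, even without the proxy. Before the refactoring the status was always "unknown" and there was no count of successful queries.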
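The code is covered by basic automated tests, so basic functionality like this should not be broken. Focus on checking whether your own configuration is wrong.
If you have confirmed that the configuration is correct and that this is a software problem, please provide the configuration that reproduces it together with the debug log, and submit an issue.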
This is what the normal case should look like:
This is a screenshot of the configuration:
The one below is the abnormal case: