smartdns does NOT seem to cache queries

Open ZhiShengYuan opened this issue 7 months ago • 6 comments

Problem Description
SmartDNS does not appear to fully cache responses. When serving as the primary DNS for a large organization, we observe SmartDNS issuing nearly 1 k QPS upstream—roughly matching its total incoming QPS—despite caching being enabled.

Environment

  1. Firmware / Package: Debian package smartdns 40+dfsg-1
  2. Carrier / Network: Internal enterprise network
  3. OS: Debian GNU/Linux 12 (bookworm) x86_64
  4. SmartDNS Source & Version: Installed from Debian official repositories, version 40+dfsg-1
  5. Relevant Configuration (sanitized):
    rr-ttl-min 30
    rr-ttl-max 360
    rr-ttl-reply-max 30
    cache-size 524288
    speed-check-mode none
    max-query-limit 65535
    prefetch-domain no
    serve-expired yes
    serve-expired-ttl 3600
    serve-expired-reply-ttl 10
    serve-expired-prefetch-time 30
    dualstack-ip-selection no
    cache-persist no
    audit-enable yes
    audit-size 8G
    audit-num 8
    log-level error
    log-size 8G
    log-num 8
    domain-set -name blocked -file /etc/smartdns/domainlist.d/blacklist.conf
    group-match domain-set:blocked
    address /domain-set:blocked/#
    

Reproduction Steps

  1. Deploy SmartDNS with the above configuration as the organization’s primary DNS resolver.
  2. Generate client load of approximately 1 k QPS.
  3. Monitor upstream DNS traffic; for example:
    tail -f /var/log/named/queries-unicom.log | pv -l -i 1 > /dev/null
    
    You will see a sustained, very high upstream query rate despite the cache settings.

Audit Log Sample

[2025-04-17 16:37:23,993] 172.22.124.15 query as.xiaohongshu.com, type 1, time 0ms, speed: -0.1ms, result 101.34.191.138, 101.34.194.192
[2025-04-17 16:37:24,020] 10.0.142.2   query pull-flv-l11-cny.douyincdn.com, type 1, time 0ms, speed: -0.1ms, result 43.137.231.211, 43.137.231.90, 43.137.231.60, 43.137.231.93, 43.137.231.217
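
As a rough way to quantify this from the audit log, the sketch below counts how many entries report `time 0ms`. It assumes the audit log lines follow exactly the format of the two samples above, and it treats 0 ms only as a heuristic for a cache hit, not as proof. The names `AUDIT_RE`, `SAMPLE`, and `count_zero_ms` are made up for illustration and are not part of SmartDNS.

```python
import re
from typing import Iterable, Tuple

# Assumption: every audit line looks like the samples in this issue:
# [timestamp] client query name, type N, time Nms, speed: ..., result ...
AUDIT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<client>\S+)\s+query\s+(?P<name>[^,\s]+),"
    r"\s+type\s+(?P<qtype>\d+),\s+time\s+(?P<time_ms>\d+)ms"
)

# The two audit lines quoted in this issue.
SAMPLE = [
    "[2025-04-17 16:37:23,993] 172.22.124.15 query as.xiaohongshu.com, "
    "type 1, time 0ms, speed: -0.1ms, result 101.34.191.138, 101.34.194.192",
    "[2025-04-17 16:37:24,020] 10.0.142.2   query pull-flv-l11-cny.douyincdn.com, "
    "type 1, time 0ms, speed: -0.1ms, result 43.137.231.211, 43.137.231.90",
]


def count_zero_ms(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (parsed_entries, entries_with_time_0ms).

    Lines that do not match the expected format are skipped.
    """
    total = 0
    zero = 0
    for line in lines:
        match = AUDIT_RE.match(line.strip())
        if match is None:
            continue
        total += 1
        if int(match.group("time_ms")) == 0:
            zero += 1
    return total, zero
```

To run it against a real log, open the audit log file (its location depends on your setup) and pass the file object to `count_zero_ms`; dividing the second value by the first gives an approximate share of queries answered in 0 ms.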

Expected Behavior:
SmartDNS should serve cached responses for repeated queries and minimize upstream QPS.

Actual Behavior:
SmartDNS continues to forward nearly all queries upstream, as if caching is ineffective.

Thank you for your assistance.
Best regards,
Yuan

ZhiShengYuan avatar Apr 17 '25 08:04 ZhiShengYuan

Additional Information: We have also tested SmartDNS 1.2025.03.02-1533 (Release46-23-gb525170) and can reproduce the same behavior.

ZhiShengYuan avatar Apr 17 '25 09:04 ZhiShengYuan

query pull-flv-l11-cny.douyincdn.com, type 1, time 0ms

Here the time is 0ms, so the result was very likely returned from the cache. I suggest turning on debug logging and querying the same domain name repeatedly to check whether the cache is working.

Also, make sure the cache size is large enough.

You can also increase the minimum TTL (rr-ttl-min).
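
One possible way to run the repeated-query check is sketched below, using only the Python standard library. It builds a plain DNS query for an A record (type 1, as in the audit log) and times a few round trips to the resolver. The function names, the default port 53, and the server address you pass in are assumptions for illustration. A fast answer on the second and later queries suggests, but does not prove, that the reply came from the cache.

```python
import socket
import struct
import time
from typing import List


def build_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS=IN (1).
    return header + qname + struct.pack("!HH", qtype, 1)


def time_queries(server: str, name: str, rounds: int = 5,
                 port: int = 53, timeout: float = 2.0) -> List[float]:
    """Send `rounds` UDP queries for `name`; return round-trip times in ms."""
    packet = build_query(name)
    timings = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(rounds):
            start = time.perf_counter()
            sock.sendto(packet, (server, port))
            sock.recvfrom(4096)  # reply content is ignored; only timing matters
            timings.append((time.perf_counter() - start) * 1000.0)
    return timings
```

For example, `time_queries("127.0.0.1", "pull-flv-l11-cny.douyincdn.com")` would time five queries against a resolver listening on localhost; replace the address with the one your SmartDNS instance actually listens on, and compare the timings with the debug log for the same period.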

pymumu avatar Apr 17 '25 10:04 pymumu

Thank you so much for your help! After turning on debug logging, I randomly sampled 100 entries and saw that about 90% of queries are indeed being served from cache. However, my local bind9 still receives a significant number of upstream requests—about 20% of the total. I believe this is due to a misconfiguration on my part, so I’ll adjust my settings further and continue to observe. Thanks again for your helpful advice!

ZhiShengYuan avatar Apr 17 '25 13:04 ZhiShengYuan

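Before the refactor, the code queried the domestic and overseas DNS servers separately. After the refactor, the two are mixed: visiting a domestic website also triggers queries to the overseas group. In the smartdns UI, the overseas group shows as normal. Previously its status was normal only when the proxy was turned on; now it shows as normal even with the proxy off, and it shows a count of successful queries. Before the refactor, the status was always unknown and there was no successful query count.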

xiaobaishu1 avatar Apr 18 '25 02:04 xiaobaishu1

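The code is covered by basic automated tests, so this kind of basic functionality should not be broken. The main thing is to check whether your own configuration is wrong.

If you have confirmed that the configuration is correct and that this is a software problem, please provide a configuration that reproduces it together with the debug log, and open an issue.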

pymumu avatar Apr 18 '25 02:04 pymumu

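This is what it should look like when it works normally: Image

This is a screenshot of the configuration: Image

The one below is the abnormal case: Image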

xiaobaishu1 avatar Apr 18 '25 05:04 xiaobaishu1