Consider disabling CNAME scrubbing for forwarded queries
Unbound's protection against the Kaminsky cache poisoning attack creates some unexpected issues when used in the context of forwarding, especially with a filtering upstream. Some users of our service (NextDNS) discovered this issue after edgekey.net was added to some anti-tracker blocklists, resulting in the blocking of large sites like apple.com, airbnb.com, and ebay.com when used with Unbound.
The issue is that most anti-tracking blocklists and RBLs are meant to be applied to the QNAME only. But in a response for www.apple.com, we get CNAMEs to different authorities:
www.apple.com. 531 IN CNAME www.apple.com.edgekey.net.
www.apple.com.edgekey.net. 10340 IN CNAME www.apple.com.edgekey.net.globalredir.akadns.net.
www.apple.com.edgekey.net.globalredir.akadns.net. 1857 IN CNAME e6858.dsce9.akamaiedge.net.
e6858.dsce9.akamaiedge.net. 0 IN A 184.27.213.112
The scrub_sanitize() function removes from this response, before caching, any CNAME that is not under the same authority, leaving only the first CNAME. The iterator then completes the query by asking the forwarder for www.apple.com.edgekey.net., and so on. At this point, the forwarder sees www.apple.com.edgekey.net. as a QNAME and not a CNAME anymore, and thus applies blocklists that block anything under edgekey.net.
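To illustrate the effect (a simplified Python sketch of the idea, not Unbound's actual C code), the sanitizer keeps only the records owned by names within the queried zone's authority, which cuts the chain at the first out-of-zone owner:

```python
def scrub_chain(records, zone):
    """Simplified model of CNAME scrubbing.

    records: list of (owner, rrtype, rdata) tuples in chain order.
    Records whose owner name is not at or below `zone` are dropped,
    so the chain is truncated at the first out-of-zone owner.
    """
    def in_zone(name, zone):
        return name == zone or name.endswith("." + zone)

    kept = []
    for owner, rrtype, rdata in records:
        if not in_zone(owner, zone):
            break  # everything from here on is under a different authority
        kept.append((owner, rrtype, rdata))
    return kept

chain = [
    ("www.apple.com.", "CNAME", "www.apple.com.edgekey.net."),
    ("www.apple.com.edgekey.net.", "CNAME", "www.apple.com.edgekey.net.globalredir.akadns.net."),
    ("www.apple.com.edgekey.net.globalredir.akadns.net.", "CNAME", "e6858.dsce9.akamaiedge.net."),
    ("e6858.dsce9.akamaiedge.net.", "A", "184.27.213.112"),
]
print(scrub_chain(chain, "apple.com."))  # only the first CNAME survives
```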
The other issue with this approach is performance. When a forwarder is set up, the user expects Unbound to act as a stub resolver, delegating all the recursive work to the forwarded resolver. Unbound is not expected to generate multiple queries for a single one, which can add very substantial latency. In the Apple example, no less than 4 sequential queries are required to fulfill the request!
I guess this is done because each intermediate CNAME is stored in the cache individually, as a good recursive resolver should do. In the case of forwarding, I believe individual CNAME TTLs should not be honored, and the response should be cached as a whole with a TTL equal to the minimum of all the TTLs.
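A toy sketch of that idea (hypothetical Python, not Unbound code): the whole answer, CNAME chain included, is stored under the original question with a single expiry equal to the minimum TTL of its records.

```python
import time

class WholeAnswerCache:
    """Toy forwarder cache: the entire answer (CNAME chain included) is
    stored under the original (qname, qtype) with one expiry for the
    whole message, instead of one cache entry per record."""

    def __init__(self):
        self._store = {}

    def put(self, qname, qtype, records):
        # records: list of (owner, ttl, rrtype, rdata)
        ttl = min(ttl for _, ttl, _, _ in records)  # MIN of all the TTLs
        self._store[(qname, qtype)] = (time.monotonic() + ttl, records)

    def get(self, qname, qtype):
        entry = self._store.get((qname, qtype))
        if entry is None:
            return None
        expires, records = entry
        if time.monotonic() > expires:
            del self._store[(qname, qtype)]
            return None
        return records

cache = WholeAnswerCache()
cache.put("www.apple.com.", "A", [
    ("www.apple.com.", 531, "CNAME", "www.apple.com.edgekey.net."),
    ("e6858.dsce9.akamaiedge.net.", 60, "A", "184.27.213.112"),
])
print(cache.get("www.apple.com.", "A"))  # whole chain comes back in one lookup
```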
Hi @rs Olivier Poitrey, I can see we are doing the same thing here (www.mypdns.org).
What I stumbled on here is: how do your users block edgekey.net. and still get the content they are looking for? As in your example, you could in principle skip edgekey.net. and serve the request like:
www.apple.com. 531 IN A 184.27.213.112
And then have the front-end be the IP that the cyber criminals at edgekey.net. see?
I obfuscate the requestor IP in the front-end with dnsdist by PowerDNS; this way the bad trackers only see my recursor's IP, never the end users'.
I know this is a bit OT, but you got me curious :smiley:
Hi Olivier,
Having Unbound chase the CNAMEs is actually intended behaviour. As you already mentioned this is one of the many mitigations against cache poisoning. We do not differentiate between authoritative servers discovered by following delegations and configured forwarders. We want to harden against potentially poisoned caches we are forwarding to. Although Unbound can forward queries to other resolvers, it still keeps its role as a recursive resolver.
Maybe we can think about adding a config option to Unbound one day to specify that you are running in real stub mode and sending queries to a resolver you trust. Alternatively, you could have a look at getdns/stubby, which is software specifically created to run as a stub.
I understand the intent, I'm just saying it creates performance and behavior issues. You have to consider a popular deployment of Unbound with a forwarder set on the root (.). In such a setup, you are scrubbing CNAMEs coming from the forwarded resolver, only to ask the very same resolver again for the intermediate CNAMEs. This is a serious performance degradation for absolutely no benefit. Not to mention the issue with filtering resolvers like ours, which treat QNAMEs and CNAMEs differently.
In the example of www.apple.com above, it takes 4 RTTs instead of 1 to resolve this hostname. And in the case of a filtering resolver, you end up with www.apple.com being wrongly blocked because the resolver has edgekey.net. in its blocklist, intended to be blocked only on QNAMEs.
I understand that not scrubbing CNAMEs might cause issues in split-horizon configurations. If the forwarder is only set on a zone, you might not want the resolver to be able to poison the cache for zones outside of its "realm". The same is true for a forwarder on the root (.) with a local zone: in theory, the forwarder should not be able to poison the cache for CNAMEs handled by local zones.
If you stopped scrubbing CNAMEs and cached forwarder responses as a whole instead of decomposing them, it would solve both issues. You would lose the granular TTL handling, but I don't think it is the role of a stub to maintain a cache for intermediate CNAMEs, and the current behavior negates any potential benefit of maintaining an individual CNAME cache.
@ralphdolmans would you be open to adding an option to disable this behavior? Sadly, we currently have to discourage our users from using unbound as a forwarder even though we are using it as a recursive ourselves…
Sorry for missing your earlier reply.
Stripping the extra records of the CNAME chain does add extra security, even if the second query is forwarded to the same resolver. It hardens against the Kaminsky attack. If the Unbound that is forwarding queries can be triggered to send a lot of queries with random subdomains to bypass the cache, there is a high chance for an off-path attacker to have an answer accepted. If these queries are for a name close to the name under attack, and the answer contains a CNAME whose target is the name under attack, with a record for that name in the answer section, the cache will become poisoned with that record.
Even if Unbound stores the complete answer and only uses the poisoned CNAME target record as part of that chain (so with the random domain as the start of the chain), it can still end up being harmful to clients querying the Unbound instance, which might cache the harmful record. We try to prevent having poisoned records in the cache at all.
Sending the query over a stateful transport mitigates such off-path attacks. Maybe it is an idea to add an option that treats the CNAME chain differently when the forwarded query is sent over TLS?
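(For what it's worth, sending the forwarded queries over TLS is already configurable today; the proposed option itself does not exist. A minimal unbound.conf fragment, with an illustrative upstream address and CA bundle path:)

```
server:
    tls-cert-bundle: "/etc/ssl/certs/ca-certificates.crt"

forward-zone:
    name: "."
    forward-tls-upstream: yes
    forward-addr: 9.9.9.9@853#dns.quad9.net
```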
The split-horizon issue could be solved by checking that the owner of the records in the chain is not under the responsibility of another forwarder. On the other hand, if this configuration option is disabled by default, it is maybe fine to change this behavior and just allow the forwarder to return a record in a CNAME chain that could have been answered by a more specific forward-zone, as long as the owner name of all records in the CNAME chain is at or below the name of the forward. That resolver already has the power to give an answer itself anyway, as it could just return an answer without a CNAME.
This will help to limit the number of queries. I am not completely sure it always helps with your resolver giving different answers for the same record based on how it is queried. It might be good to state that Unbound stores all answers in the same cache (ignoring ECS and shared caches), thereby assuming that this is the version of the record that can always be used. My worry is that already cached records can be used in the generation of the answer, resulting in unexpected behavior.
+1 for the option to disable CNAME scrubbing on TCP/TLS.
Regarding your last point, you're right: querying a blocked intermediary CNAME (as QNAME) before a domain pointing to it could pollute the cache with the blocked response. That's why I suggested that the forwarder code should store responses as-is in the cache, and not explode them into individual records. I see no real benefit in exploding those records for a non-recursive resolver.
Another +1 for having this option.
+1
+1
Also was anyone able to get the NextDNS configuration to be recognized by using the correct endpoint IP for IPv6 DoT? I tried using the special configuration IPv6 endpoint for DNS-over-TLS and it doesn't recognize it when using unbound.
+1
+1
+1
Also was anyone able to get the NextDNS configuration to be recognized by using the correct endpoint IP for IPv6 DoT? I tried using the special configuration IPv6 endpoint for DNS-over-TLS and it doesn't recognize it when using unbound.
try:
forward-addr: 45.90.28.0:853#e4d75b.dns1.nextdns.io
forward-addr: 45.90.30.0:853#e4d75b.dns2.nextdns.io
I have the same issue; I cannot get NextDNS to recognize the endpoint even when using the configuration name in the IPv6 address.
How do we open a case with NextDNS? I don’t see anywhere on the site to do so.
Thanks, Christian
+1
+1
+1
I believe this is the same as the issue I have using Unbound with Cloudflare's 1.1.1.2 and 1.1.1.3 resolvers for content blocking.
As an example, https://docs.sentry.io runs on Vercel, so it is a CNAME that points to cname.vercel-dns.com.
Direct queries to Cloudflare for docs.sentry.io return a DNS response with the CNAME for docs.sentry.io and the A record for cname.vercel-dns.com all together, but a direct query to Cloudflare for cname.vercel-dns.com returns 0.0.0.0 (which is how Cloudflare blocks a domain).
Because Unbound ignores the second part of the DNS result (with the A record), it has to do a second query to Cloudflare for cname.vercel-dns.com, which is blocked.
I would really like to see the suggestion in https://github.com/NLnetLabs/unbound/issues/132#issuecomment-616870825 that resolvers queried over TLS can bypass this check, since that would help in my situation too.
+1
Adding my two cents to this (seemingly dead) issue: with the increasing prevalence of work-from-home (and in more traditional corporate environments), having the ability to alias across domains on a private network is increasingly important. The way Unbound currently handles CNAMEs that reference other (internal) domains makes it entirely unsuitable for mixed environments with overlapping or shared resources, and there really ought to be a way to disable the built-in scrubbing and assume the accompanying risk. Ideally this would be configurable just for local zones or even specific domains, but at this point I think we'd be happy to turn it off entirely.
+1 the same question
I have to deal with a bogus ISP which intercepts all DNS queries. Its DNS server may answer a normal query correctly but injects an invalid answer for the CNAME target; see the queries below for an example. As a result, Unbound tries to follow the CNAME and gets the invalid (injected) result as well.
❯ drill dl-cdn.alpinelinux.org
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 51455
;; flags: qr rd ra ; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; dl-cdn.alpinelinux.org. IN A
;; ANSWER SECTION:
DL-Cdn.aLpINElinuX.org. 2493 IN CNAME dualstack.j.sni.global.fastly.net.
dualstack.j.sni.global.fastly.net. 15 IN A 151.101.242.132
;; AUTHORITY SECTION:
;; ADDITIONAL SECTION:
;; Query time: 201 msec
;; SERVER: 192.168.0.1
;; WHEN: Fri Aug 11 02:18:17 2023
;; MSG SIZE rcvd: 125
❯ drill dualstack.j.sni.global.fastly.net
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 63542
;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; dualstack.j.sni.global.fastly.net. IN A
;; ANSWER SECTION:
dualstack.j.sni.global.fastly.net. 300 IN A 127.0.0.1
;; AUTHORITY SECTION:
;; ADDITIONAL SECTION:
;; Query time: 36 msec
;; SERVER: 192.168.0.1
;; WHEN: Fri Aug 11 02:18:32 2023
;; MSG SIZE rcvd: 67
Not sure if it needs to be said but this is still an issue in latest. Hopefully it is fixed soon. Thanks!
Forgive my ignorance but does this affect regular home opnsense users?
This should not be a problem for home users. The issue deals with performance and security. Performance of DNS lookups is usually not an issue for home users, because there are only a couple of queries. The security aspects are far more important, and thus this should not be changed in a way that weakens security. The security aspects are problems for home users too; there was a list of vulnerabilities for popular home modems a while ago. The performance part is that a query takes several packets to resolve, which is not really a problem: recursors spend lots of time sending packets to various places to figure things out, also for things like delegations and DNSSEC keys. So as a trade-off, security is much more important here. If it made things safer, it would even be a good idea to send more queries and spend more effort, and this is possible with options to harden lookups (use-caps-for-id, harden-referral-path) and various crypto options for spending more work, on encrypted transports and DNSSEC. DNSSEC is supposed to protect against security issues here, and enabling it should protect the server.
This should not be a problem for home users.
It is absolutely a problem for home users.
My comment about Cloudflare "Families" DNS is one example of home users being affected. (Although this specific host is no longer an issue, since Cloudflare now resolves cname.vercel-dns.com again, the root issue can still cause trouble with other sites.)
@redge's comment about work-from-home split-DNS is another.
For that matter, the very first comment describes a behavior issue that results in sites being blocked due to resolvers that perform filtering.
The issue deals with performance, and security.
This issue also deals with behavior and usability. Users who want their upstream DNS provider to block malware, advertising, security threats, and/or certain types of content are fighting against this behavior of unbound.
This suggestion by @ralphdolmans seems to be a good compromise, with support from a number of this issue's participants:
Maybe it is an idea to add an option that treats the CNAME chain differently when the forwarded query is sent over TLS?
I wish my C skills were good enough to provide a PR for this, but they aren't, and it doesn't seem like there is anyone else interested in working on this. (Unless it has been implemented already and I've missed seeing that.)
A quick answer is that all the issues cited are 'weird developments' by the upstream; security is more important.
The issue is the security documented here, but also elsewhere: https://datatracker.ietf.org/doc/rfc5452/
What I am trying to do is keep new entries from happening to the list of security advisories, https://nlnetlabs.nl/projects/unbound/security-advisories/
I do not like your argument that there is no valid reason. So I'll close this with 'NextDNS is broken, do not use it' as the summary. (NextDNS should respond correctly to the CNAME target queries that result from CNAME scrubbing, so that resolution works. It does not seem to be a performance issue; getting the correct answer seems to be the issue, otherwise there would be no visible problem.)