resolving problems with systemd-resolved and dnssec enabled

Open wzrdtales opened this issue 3 years ago • 9 comments

I am not sure whether Cloudflare is aware of this yet, but systemd-resolved has major issues using 1.1.1.1 to resolve DNS queries when DNSSEC is enabled. This seems to be specific to 1.1.1.1, since systemd-resolved then falls back to 8.8.8.8, which answers immediately.

Since there seems to be no other bug tracker for 1.1.1.1, I am just dropping this here in the hope that it gets fixed. Until then I will have to switch all customers and all our devices back to 8.8.8.8, which I really tried to avoid :/

wzrdtales avatar Dec 20 '21 14:12 wzrdtales

Thanks for the report - let me check internally. We have a lot of Linux users on systemd-based systems, so I suspect this is more specific to a version or config than to systemd-resolved as a whole.

elithrar avatar Dec 20 '21 14:12 elithrar

I also assume you are using the WARP desktop client. Or do you mean "1.1.1.1 over plaintext DNS"? The latter would be more surprising.

8.8.8.8 is also a DNSSEC validating resolver.

@wzrdtales - can you provide sample output that describes the "problems" you are encountering? A dig with 8.8.8.8 and 1.1.1.1 for affected domains will help here.

elithrar avatar Dec 20 '21 14:12 elithrar
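
The side-by-side comparison requested above can also be scripted without dig. What follows is only a rough sketch, not anything posted in the thread: it builds a plain UDP query with the EDNS0 DO bit set (roughly what `dig +dnssec` sends) and times the answer from each resolver. The function names `build_query`, `parse_header`, `timed_query` and `compare`, the query ID, the 1232-byte payload size and the 5-second timeout are all assumptions chosen for illustration. It does not reproduce the validation logic inside systemd-resolved.

```python
import socket
import struct
import time


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a DNS query (default type A) with an EDNS0 OPT record, DO bit set."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT record)
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS = IN
    # OPT pseudo-record: root name, TYPE=41, CLASS=UDP payload size,
    # TTL field carries ext-rcode/version/flags (DO = 0x8000), RDLENGTH=0
    opt = b"\x00" + struct.pack("!HHIH", 41, 1232, 0x00008000, 0)
    return header + question + opt


def parse_header(response: bytes) -> dict:
    """Extract the fields of interest from the 12-byte DNS header."""
    ident, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return {
        "id": ident,
        "rcode": flags & 0x000F,     # 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN
        "ad": bool(flags & 0x0020),  # resolver claims it validated the answer
        "tc": bool(flags & 0x0200),  # truncated, a TCP retry would be needed
        "answers": answers,
    }


def timed_query(server: str, name: str, timeout: float = 5.0):
    """Send one query over UDP; return (parsed header or None on timeout, seconds)."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(packet, (server, 53))
        try:
            data, _addr = sock.recvfrom(4096)
        except socket.timeout:
            return None, time.monotonic() - start
    return parse_header(data), time.monotonic() - start


def compare(name: str, servers=("1.1.1.1", "8.8.8.8")) -> None:
    """Print the header summary and latency for each resolver."""
    for server in servers:
        header, elapsed = timed_query(server, name)
        print(f"{server}: {header} in {elapsed:.3f}s")
```

Calling `compare("wizardtales.com")` prints one line per resolver. If both answer quickly here, that would point at how systemd-resolved talks to 1.1.1.1 rather than at the resolver being unreachable, but that conclusion is an inference and not something the sketch can prove.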

Also, what Linux distro are you using?

Noah-Kennedy avatar Dec 20 '21 14:12 Noah-Kennedy

Ahh, well, no: the desktop client has nothing to do with this, and that assumption is wrong. Again, systemd-resolved is the component we are talking about, and this applies to servers without any desktop as well. It is not about 8.8.8.8 being a DNSSEC-validating resolver; it is about the end-user component. The only thing that matters is that the end-user client validates, not a DNS server whose answers can be faked (see Russian DNS redirection). And here, for some reason, there are well-known problems with 1.1.1.1.

Details about my setup, though:

Where doesn't it work?

  • Everywhere systemd-resolved is used with DNSSEC validation enabled (Arch Linux, Ubuntu 20.04 and 21.04, SLES, just to name a few of the distros in use)
  • specific to DE? no

wzrdtales avatar Dec 20 '21 14:12 wzrdtales

See the commentary in older threads (https://bbs.archlinux.org/viewtopic.php?id=240427). As I said, the problem with Cloudflare DNS has been known for quite some time, but it seems no one ever reported it to Cloudflare itself.

Here is another thread, which tried to go further into debugging it:

https://bbs.archlinux.org/viewtopic.php?id=262926

I can also confirm: systemd-resolved keeps trying Cloudflare until the timeout and then switches over to Google, which finally resolves the name. What exactly the issue is, I have no idea yet. There were some assumptions in the last-mentioned thread, but I doubt they were validated.

wzrdtales avatar Dec 20 '21 15:12 wzrdtales

@wzrdtales - can you please provide logs of failing queries via systemd, so we can better understand the failure mode here and reproduce it 1:1?

It is likely that other resolvers are falling back to insecure mode as a "workaround" (favoring availability over security).

Specifically:

  1. Does this impact every DNSSEC-enabled domain, or just some?
  2. If it is just some, can you provide logs/failing query output - via dig - to show the failure mode here?
  3. We suspect it may be similar to https://community.cloudflare.com/t/sites-that-are-aliased-to-ec-azureedge-net-cannot-be-resolved-if-dnssec-is-on/94282 - which is related to a bug in how systemd-resolved handles DNSSEC validation.

elithrar avatar Dec 20 '21 15:12 elithrar

Does this impact every DNSSEC-enabled domain, or just some?

As far as I could test, every one, including e.g. google.com, but also domains that use Cloudflare DNS; e.g. our own homepage wizardtales.com is affected as well. So pretty much everything (my guess).

If it is just some, can you provide logs/failing query output - via dig - to show the failure mode here?

There is no failure mode shown there, b/c systemd-resolved falls back to Google's DNS servers after failing on Cloudflare. So it will resolve, but it takes about 10 seconds until it does (b/c of the timeout).

Here is a log (it will be deleted within 24 hours, so please download it :) ) of resolving a domain that is itself on Cloudflare DNS (so no third-party errors are to be expected). It keeps trying until it finally fails over to Google, which does the job immediately.

https://0bin.net/paste/8S0F5r9T#o9qZNC-2MCjqAehhdrjYL8B58Bfx9moFWfEwWsUxPFQ

wzrdtales avatar Dec 20 '21 19:12 wzrdtales
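
Because the lookup eventually succeeds via the fallback, the symptom described above is latency rather than an error. One way to make that visible is to time a lookup through the system resolver. This is a minimal sketch under stated assumptions: the helper names `time_lookup` and `looks_like_timeout` and the 5-second threshold are hypothetical, and it assumes the host's name resolution is actually routed through systemd-resolved. Since systemd-resolved caches answers, only the first lookup of a name is likely to show the delay.

```python
import socket
import time
from typing import Callable, Tuple


def time_lookup(host: str, resolve: Callable = socket.getaddrinfo) -> Tuple[float, bool]:
    """Time one lookup; return (elapsed seconds, whether it succeeded)."""
    start = time.monotonic()
    try:
        resolve(host, None)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    return time.monotonic() - start, ok


def looks_like_timeout(elapsed: float, threshold: float = 5.0) -> bool:
    """Heuristic only: a multi-second lookup suggests a timeout before fallback."""
    return elapsed >= threshold
```

For example, `elapsed, ok = time_lookup("wizardtales.com")` followed by `looks_like_timeout(elapsed)` would flag a lookup that succeeded only after a long wait. The `resolve` parameter exists so the timing logic can be exercised with a stand-in function instead of a real network lookup.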

Btw, don't get confused: it says switching to 8.8.8.8 at the very beginning. This is b/c of the global override; however, the interface setting, which is 1.1.1.1, has priority, so that is what it starts the resolve attempt with. If you want, I can send you a log of an 8.8.8.8-only resolve that just works.

wzrdtales avatar Dec 20 '21 19:12 wzrdtales

Following up again: anything I can help with? Nothing has changed so far.

wzrdtales avatar Apr 03 '22 16:04 wzrdtales