
Special config needed for wildcard certs, when using clientid over DoT or DoQ

Open RainmakerRaw opened this issue 2 years ago • 6 comments

Issue Details

  • Version of AdGuard Home server: v0.107.7
  • How did you install AdGuard Home: GitHub release on Oracle Cloud VPS
  • How did you setup DNS configuration: On the AGH instance or clients? AGH is forwarding to Quad9 DoT.
  • If it's a router or IoT, please write device model: N/A
  • CPU architecture: ARM64
  • Operating system and version: Ubuntu 20.04.4 LTS aarch64

Expected Behavior

The AGH server runs on an Oracle Cloud VPS. Its domain name is dns.domain.xyz and it is (was) using a wildcard cert from LetsEncrypt, and later ZeroSSL (via acme.sh). Therefore, encrypted client URLs using a clientid are:

https://dns.domain.xyz/dns-query/my-mbp
tls://my-mbp.dns.domain.xyz
quic://my-mbp.dns.domain.xyz

AGH has clientid set up appropriately, under 'Client settings'. For example 'Client name' is 'my-mbp' and 'Identifier' is also 'my-mbp'.
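
To make the URL scheme above concrete, here is a minimal sketch that prints the three client URLs for a given server name and client ID. The function name `client_urls` is hypothetical and is not part of AGH; it only mirrors the pattern already shown (the client ID is a path segment for DoH and a subdomain label for DoT and DoQ).

```shell
# Hypothetical helper (not part of AGH): print the encrypted-DNS URLs
# for one client ID.
#   $1 = server name, e.g. dns.domain.xyz
#   $2 = client ID,   e.g. my-mbp
client_urls() {
    server=$1
    client_id=$2
    # DoH carries the client ID in the URL path.
    printf 'https://%s/dns-query/%s\n' "$server" "$client_id"
    # DoT and DoQ carry the client ID as an extra leftmost DNS label.
    printf 'tls://%s.%s\n' "$client_id" "$server"
    printf 'quic://%s.%s\n' "$client_id" "$server"
}

client_urls dns.domain.xyz my-mbp
```

Note that only the DoT and DoQ forms change the hostname the client connects to, which is why only those two depend on which names the certificate covers.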

According to the Wiki and what I read on various tickets here in the Issues, with a regular wildcard cert clients should be able to connect to any or all of those endpoints and experience encrypted DNS. Indeed, the cert for the current beta of AdGuard Personal DNS (adguard-dns.io) shows a 'regular' wildcard cert, with the CN showing *.adguard-dns.io and it being valid for the domains of *.adguard-dns.io and adguard-dns.io.

For me though, generating a cert like that simply won't work with client IDs over TLS, as follows:

Actual Behavior

As per some other tickets on this matter (which are closed, and don't cover my exact issue at all - hence making a new one), DoH with a clientid works fine using the https endpoint. DNS over TLS does work if you just use dot://dns.domain.xyz, but doesn't work at all if you use a clientid as well (like tls://my-mbp.dns.domain.xyz).

Various resolvers (Unbound, Knot Resolver, Stubby) being pointed to clientid.dns.domain.xyz all complain about the certificate not matching the domain. Setting the AGH log to verbose also confirms the issue, and fills up on every attempt to resolve a domain, with:

2022/07/06 21:52:36.726059 1347#38 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).handleTCPConnection(): handling tcp: started handling tls request from 1.2.3.4:53232
2022/07/06 21:52:36.760977 1347#38 [error] handling tcp: reading msg: reading len: remote error: tls: bad certificate

This is despite a valid wildcard cert being in place (tested using both LetsEncrypt and ZeroSSL certs valid for *.domain.xyz and domain.xyz). I tried to debug this last year and gave up, but this time it was annoying me so I got it fixed.

Initially, I suspected the Oracle Cloud VPS or its installed OS of having some weird config issue or corruption. So I spun up a FreeBSD 13 instance on Vultr, clean-installed AGH there (again from the GitHub release), and replicated my config. That had the same issue, so I knew it was actually my cert or AGH itself.

As it turns out, I can make clientid over TLS work if, instead of generating a 'standard' wildcard cert, I add one extra -d parameter for *.dns.domain.xyz, like this:

acme.sh --issue -d domain.xyz -d '*.domain.xyz' -d '*.dns.domain.xyz' --dns dns_cf --keylength ec-384  

Now the errors stop, the resolvers are all happy (tested on systemd-resolved, unbound, kresd (knot-resolver) and stubby), and DNS over TLS with a clientid works perfectly, as it should. My question/issue is that all the (sparse) documentation I've seen on this matter suggests a regular wildcard cert with two domains/common names (domain and *.domain) will work, but it doesn't. Effectively, wildcards only go one subdomain deep, where dns.domain.xyz would need two subdomains deep for clientid.dns.domain.xyz.

Does anyone know why?

Interestingly, if it's any use in debugging the issue, even with the previous regular wildcard cert (which caused a straight SERVFAIL in all other resolvers), I could get systemd-resolved to work over TLS by setting DNSOverTLS=opportunistic. I'm assuming that the parameter doesn't just mean 'use TLS if it's there, and use regular DNS if it isn't', but rather also '...and if a cert is there, don't look too closely at the TLS certificate and domains - just use TLS anyway'. It's the only resolver that worked with clientid over TLS even with just a standard wildcard cert. Every other resolver fails, and the AGH log likewise fills with errors as above.

If anyone has any input or ideas, I'd be interested to hear (and learn) more. If this is all standard, and one should be getting certs for the three domains (namely domain.xyz, *.domain.xyz and *.dns.domain.xyz), then can we update the wiki to be more clear on the matter? I lost literally days and nights of time to 'fixing' this! :)

Additional Information

Full debug log demonstrating the issue attached, from the FreeBSD test instance of AGH. verbose.log

RainmakerRaw avatar Jul 07 '22 18:07 RainmakerRaw

Can confirm this is an issue - in fact, I came here hoping I'd find someone who has experienced this. I am using DoT only, so it's pretty critical that I use the clientID based on subdomain lookup. I have the same configuration - familymember.subdomain.domain.com. I created certificates in the same manner, using similar arguments with certbot: -d sub.domain.com -d domain.com -d *.sub.domain.com

I'm not entirely sure something is broken here, in fact I think things are working as designed for TLS. If anything, probably just need documentation updating. I may look for a place to do it and submit a PR.

Thank you very much for your defect report here, though. Very, very helpful to me, and once I realized the problem I was up and running in 10 minutes.

ryancastro avatar Aug 10 '22 06:08 ryancastro

Thank you for the thorough report and apologies for the long silence.

Effectively, wildcards only go one subdomain deep, where dns.domain.xyz would need two subdomains deep for clientid.dns.domain.xyz.

Does anyone know why?

This is just how wildcard certificates work. For example, as explained in “Understanding Wildcard SSL & How Does a Wildcard Certificate Work?”:

Bob owns a website that he feels is possibly overloaded with information in an unsystematic manner. He is afraid that new customers might find it hard to navigate, so he wants to segregate and organize it structurally using the following subdomains:

  • login.domain.com
  • products.domain.com
  • blog.domain.com

These first-level subdomains can all be secured under the same wildcard – *.domain.com.

Now, let’s take a look at some of his second-level subdomains:

  • member.login.domain.com
  • dev.login.domain.com
  • mail.login.domain.com

These can be secured using *.login.domain.com. However, since Bob can only secure one level of subdomains per wildcard, this means that he would need to use an additional wildcard certificate to encrypt these second-level subdomains.

If anyone has any input or ideas, I'd be interested to hear (and learn) more. If this is all standard, and one should be getting certs for the three domains (namely domain.xyz, *.domain.xyz and *.dns.domain.xyz), then can we update the wiki to be more clear on the matter?

The assumption we generally have when writing documentation is that people generating certificates for their AGH installations are familiar with the fact that wildcards are single-level, but seeing how this is not the first time we have encountered such misunderstandings, I feel like we do need to state that explicitly in the Wiki and other docs. We'll try to do that once we have the time.
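
The single-level rule can be sketched as a small shell check. This is a simplified illustration, not how AGH or any TLS library actually implements verification: the function name `cert_name_covers` is hypothetical, and it only handles a whole leftmost `*` label, which is the case discussed in this thread.

```shell
# Hypothetical helper: succeed (exit status 0) if hostname $2 is covered
# by certificate name $1. Simplified: a leftmost "*" label stands for
# exactly one DNS label and never spans a dot.
cert_name_covers() {
    pattern=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    host=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
    case $pattern in
        '*.'*)
            suffix=${pattern#\*.}   # part after "*.", e.g. domain.xyz
            first=${host%%.*}       # leftmost label of the hostname
            rest=${host#*.}         # hostname without its leftmost label
            # Need a non-empty first label, at least one dot in the
            # hostname, and the remainder equal to the wildcard suffix.
            [ -n "$first" ] && [ "$host" != "$rest" ] && [ "$rest" = "$suffix" ]
            ;;
        *)
            # No wildcard: the names must be identical.
            [ "$host" = "$pattern" ]
            ;;
    esac
}

# Which of the three names from the acme.sh command cover each hostname?
for host in dns.domain.xyz my-mbp.dns.domain.xyz; do
    for name in 'domain.xyz' '*.domain.xyz' '*.dns.domain.xyz'; do
        if cert_name_covers "$name" "$host"; then
            echo "$name covers $host"
        fi
    done
done
```

Under this rule, dns.domain.xyz is covered only by *.domain.xyz, and my-mbp.dns.domain.xyz is covered only by *.dns.domain.xyz, which matches the behaviour reported above.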

ainar-g avatar Aug 30 '22 11:08 ainar-g

Thank you, @ainar-g, for taking the time to reply and for being so helpful. Yes, it was a misunderstanding on my part. The documentation only refers to wildcard certificates. I took 'wildcard' literally - if * means 'all' or 'everything', then naturally (according to my logic) *.domain.com covers all subdomains. Alas, as you say it only works per level. All working perfectly now, and hopefully the documentation updates save someone else the frustration!

RainmakerRaw avatar Sep 01 '22 00:09 RainmakerRaw

@RainmakerRaw @ainar-g I have AGH running on secure.example.com. If I want to have client IDs even when using DoT, do I need an SSL certificate not only for secure.example.com and *.secure.example.com but also for *.example.com?

However, I have other services running on example.com subdomains with SSL (on different VMs). What happens to those certs if I generate a wildcard cert for *.example.com?

hexclann avatar Oct 27 '22 12:10 hexclann

@hexclann You should be fine. I generate certs on multiple instances for domain.com, *.domain.com and *.dns.domain.com without issue (certainly with ZeroSSL and LetsEncrypt). That said, as long as your AGH instance only actually serves traffic for *.secure.example.com, you can get away with issuing the cert for just the two names (*.secure.example.com and secure.example.com).
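
As a rough sketch of the minimal case described above, the following prints the two `-d` arguments for a given AGH server name. The helper name `cert_domain_args` is hypothetical; it only builds the argument string and does not call acme.sh or certbot itself.

```shell
# Hypothetical helper: print the -d arguments that cover the server name
# itself plus any <clientid>.<server> name used for DoT/DoQ client IDs.
#   $1 = AGH server name, e.g. secure.example.com
cert_domain_args() {
    server=$1
    # The wildcard is printed in single quotes so that it can be pasted
    # into a shell command without being expanded as a glob.
    printf '%s %s %s %s\n' -d "$server" -d "'*.$server'"
}

cert_domain_args secure.example.com
```

The printed arguments would go where the `-d` options sit in the acme.sh command earlier in this thread; the remaining options (DNS provider, key length) depend on your own setup.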

RainmakerRaw avatar Oct 28 '22 13:10 RainmakerRaw

@RainmakerRaw But setting up myphone.secure.example.com as a private DNS in Android didn't work. Is there any additional config needed? I have opened an issue, #5076, with more information.

hexclann avatar Oct 29 '22 17:10 hexclann