
Unstable operation as DoH behind Nginx reverse proxy with Keenetic

Status: Open. savely-krasovsky opened this issue 3 years ago; 19 comments.

Have a question or an idea? Please search it on our forum to make sure it was not yet asked. If you cannot find what you had in mind, please submit it here.

Prerequisites

Please answer the following questions for yourself before submitting an issue. YOU MAY DELETE THE PREREQUISITES SECTION.

  • [x] I am running the latest version
  • [x] I checked the documentation and found no answer
  • [x] I checked to make sure that this issue has not already been filed

Issue Details

I am running AdGuard Home on a personal VPS behind an Nginx reverse proxy and use it from my Keenetic router. I find it unstable when using DoH: I get query timeouts, Keenetic's DoH proxy shuts down randomly, some hosts cannot be resolved at all (even with protection turned off, of course), etc. I understand that this could be related to Keenetic itself, but Keenetic works great with the Cloudflare and Google public DoH servers with zero problems.

  • Version of AdGuard Home server:
    • v0.106.3
  • How did you install AdGuard Home:
    • GitHub releases
  • How did you setup DNS configuration:
    • Router (DoH)
  • If it's a router or IoT, please write device model:
    • VPS with 2 cores and 4GB of RAM
  • CPU architecture:
    • x86
  • Operating system and version:
    • Debian 10

Expected Behavior

Works as well as any other public DNS server.

Actual Behavior

Unstable behavior: timeouts and other problems that lead to crashes of the DoH proxy on the Keenetic side.

Screenshots

A Linux machine inside the Keenetic LAN that uses the router's DNS. It always gets stuck like this, whereas with Cloudflare/Google DoH it works like a charm:

Keenetic's https-dns-module restarts every time with an error:

Additional Information

Nginx configuration:

server {
        listen 80;
        listen [::]:80;
        server_name adguard.example.com;

        return 301 https://adguard.example.com$request_uri;
}

server {
        listen 443 ssl;
        listen [::]:443 ssl;
        server_name adguard.example.com;

        access_log      /var/log/nginx/adguard.access.log;
        error_log       /var/log/nginx/adguard.error.log;

        ssl_trusted_certificate /etc/letsencrypt/live/example.com/chain.pem;
        ssl_certificate         /etc/letsencrypt/live/example.com/fullchain.pem;
        ssl_certificate_key     /etc/letsencrypt/live/example.com/privkey.pem;

        gzip off;

        location / {
                add_header X-Robots-Tag 'noindex';

                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Proto $scheme;

                proxy_pass http://127.0.0.1:3000;
        }

        location /robots.txt {
                return 200 "User-agent: *\nDisallow: /\n";
        }
}

I can also provide access to the server itself so you can test it yourselves.

savely-krasovsky commented on Jun 9, 2021

@L11R & @ainar-g, I too have been in contact with Keenetic support about this issue: https://yadi.sk/i/R_YimY5nPRWakg

But as far as I understand, this applies to any server in any configuration and is not an error. Then again, I'm not sure about that 🤔

ammnt commented on Jun 10, 2021

Hello and thank you for your report. Could you please add the following information:

  • What upstreams do you use? Does the issue persist if you use other upstreams?
  • Can you please configure AGH to collect verbose logs and send them to us at [email protected] with the subject line “AdGuard Home issue 3250”?

Thanks!

ainar-g commented on Jun 10, 2021

@ainar-g I use these ones:

  upstream_dns:
  - 1.1.1.1
  - 1.0.0.1
  - 8.8.8.8
  - 8.8.4.4

I've enabled verbose logs, but how much of them do you need? The problem appears only after a few days of running AGH (some sort of leak? buffer overflow? I don't know).

savely-krasovsky commented on Jun 10, 2021

Thanks for the info! It would be best to get the logs from the day when the problems start. If you can pinpoint the exact hour, it would be nice to have logs for one hour before and after that. Thanks!

ainar-g commented on Jun 10, 2021

@ainar-g I've captured some logs. How can I send you them privately?

savely-krasovsky commented on Jun 18, 2021

@L11R, we have an e-mail for those: [email protected].

EugeneOne1 commented on Jun 18, 2021

@L11R, we've recently committed some fixes for the DoH implementation. Could you also try the latest betas, like v0.107.0-b.4? They seem to fix issues for a lot of people who use DoH.

ainar-g commented on Jul 8, 2021

@ainar-g I was just about to write here about those improvements! I installed b3 and b4 almost a week ago and so far I don't see any issues; I will keep observing.

savely-krasovsky commented on Jul 8, 2021

Oh, I got this loop on the Keenetic again.

I sent you the latest logs.

After restarting AGH (systemctl restart AdGuardHome.service), Keenetic started resolving domains again without needing a restart itself.

savely-krasovsky commented on Jul 12, 2021

@L11R, thanks, we've received the logs, although I cannot currently tell you when we'll be able to properly scan through them. In your personal estimate, has the issue at least become less frequent after our latest fix, or is it still as frequent as it was before?

ainar-g commented on Jul 15, 2021

It's less frequent, for sure. Since the latest incident it has kept working.

savely-krasovsky commented on Jul 15, 2021

@ainar-g I found out that the domain login.live.com cannot be resolved in my setup (again: Keenetic embedded DNS server -> external AdGuard Home server -> upstream DNS servers).

My home PC behind the Keenetic DNS just gets timeouts, while on the ADH side everything seems OK (at least the logs report a successful query):

PS C:\Users\Savely> nslookup login.live.com
Server:  UnKnown
Address:  192.168.1.1

DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.

To me it seems like Keenetic (and maybe other DNS servers) cannot handle such big answers. I compared the DNS answers from AdGuard Home with answers fetched directly from something like Cloudflare DoH. Results: [screenshots] As you can see, the ADH response is much larger.
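One way to check the size claim is to read the fixed DNS header of each response. The sketch below uses only the Python standard library and is not part of AdGuard Home or Keenetic; the function name `summarize_response` is hypothetical. It reports the message size, the ANCOUNT field, and the TC (truncated) bit, and compares the size against the classic 512-byte UDP limit from RFC 1035. Whether Keenetic's LAN-side resolver actually falls back to that limit, rather than using EDNS(0) or TCP, is an assumption that this thread does not establish.

```python
import struct

# Classic DNS-over-UDP payload limit without EDNS(0) (RFC 1035, section 4.2.1).
CLASSIC_UDP_LIMIT = 512


def summarize_response(msg: bytes, udp_limit: int = CLASSIC_UDP_LIMIT) -> dict:
    """Read the 12-byte DNS header of a raw message and report size-related facts."""
    if len(msg) < 12:
        raise ValueError("DNS message is shorter than its 12-byte header")
    # Header fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (big-endian).
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!6H", msg[:12])
    return {
        "size": len(msg),
        "answers": ancount,
        # TC (truncated) is the 0x0200 bit of the flags word.
        "truncated": bool(flags & 0x0200),
        "fits_classic_udp": len(msg) <= udp_limit,
    }
```

Feed it the raw bytes of a response, for example the body of a DoH reply. As a rough bound, a compressed A record takes at least 16 bytes (2-byte name pointer, type, class, TTL, RDLENGTH, 4-byte address), so 146 answer records need at least 2336 bytes, far beyond 512.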

After decoding, I found that ADH returns 146 answer entries: [screenshot] Sometimes it returns fewer records for the same request: [screenshot] ...or even more: [screenshot] For comparison, cloudflare-dns.com returns only 12 records every time: [screenshot] You can test it yourself with this DNS message: q80BAAABAAAAAAAABWxvZ2luBGxpdmUDY29tAAABAAE
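The base64url string above is an RFC 8484 `application/dns-message` query. A minimal Python decoder shows what it asks for. It uses only the standard library; `decode_dns_query` is a hypothetical helper, and it assumes the question name contains no compression pointers, which is normal for queries.

```python
import base64
import struct


def decode_dns_query(b64url: str) -> dict:
    """Decode an unpadded base64url DNS message (RFC 8484 style) and read its question."""
    # RFC 8484 GET parameters omit '=' padding; restore it before decoding.
    padded = b64url + "=" * (-len(b64url) % 4)
    msg = base64.urlsafe_b64decode(padded)
    msg_id, flags, qdcount, ancount, _ns, _ar = struct.unpack("!6H", msg[:12])
    # QNAME: length-prefixed labels, terminated by a zero-length (root) label.
    labels, pos = [], 12
    while msg[pos] != 0:
        length = msg[pos]
        labels.append(msg[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    # QTYPE and QCLASS follow the terminating zero byte.
    qtype, qclass = struct.unpack("!HH", msg[pos + 1:pos + 5])
    return {
        "id": msg_id,
        "rd": bool(flags & 0x0100),  # recursion desired
        "qdcount": qdcount,
        "ancount": ancount,
        "name": ".".join(labels),
        "qtype": qtype,
        "qclass": qclass,
    }
```

For the string above it yields ID 0xABCD with RD set and a single question for `login.live.com`, type 1 (A), class 1 (IN), with no answers, as expected for a query.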

I'm completely new to DNS, so I may have said something stupid, but I'm tired of this setup not working properly as a daily driver :(

savely-krasovsky commented on Aug 11, 2021

Same problem with www.outlook.com (q80BAAABAAAAAAAAA3d3dwdvdXRsb29rA2NvbQAAAQAB). Again, a huge difference in response size.
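Such test strings can be built for any name. This sketch (hypothetical helper `encode_dns_query`, Python standard library only) builds a minimal query with only the RD flag set and encodes it as unpadded base64url, the form RFC 8484 uses for the `dns` parameter of GET requests. The default ID 0xABCD is chosen only because it matches the strings in this thread; RFC 8484 recommends a DNS ID of 0 to keep requests cache-friendly.

```python
import base64
import struct


def encode_dns_query(name: str, qtype: int = 1, msg_id: int = 0xABCD) -> str:
    """Build a minimal DNS query and encode it as an unpadded base64url string."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!6H", msg_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels followed by the zero-length root label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    # RFC 8484 GET requests use base64url with the '=' padding removed.
    return base64.urlsafe_b64encode(header + question).rstrip(b"=").decode("ascii")
```

`encode_dns_query("www.outlook.com")` reproduces the string above. The result can be appended to a GET URL such as `https://adguard.example.com/dns-query?dns=...`, assuming the Nginx proxy forwards `/dns-query` to AdGuard Home.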

savely-krasovsky commented on Aug 11, 2021

@L11R I cannot reproduce it with 1.1.1.1 or 8.8.8.8, but there's definitely something going on with the way CNAMEs are resolved somewhere in the chain. Looking at the logs you've sent, I can see that the large responses often come from the 127.0.0.2:53 upstream. Do the large responses that you're receiving now also come from that upstream? Can you try using one of the well-known upstreams and also set the cache size to zero to exclude any bad cached results?

ainar-g commented on Aug 12, 2021

Hm, I will try.

savely-krasovsky commented on Aug 12, 2021

@ainar-g I changed it to 1.1.1.1 and 1.0.0.1 after you pointed it out 4 days ago. I don't remember when I set it to the local resolver... The problem with timeouts is gone. But I am still getting random NXDOMAIN responses from Keenetic, while the ADH logs seem to be fine. The Keenetic logs are also clean now. After some beta update the behavior has definitely changed.

I notice that the average processing time in ADH tends to increase over time. Currently it's 43 ms; yesterday it was 29 ms. Today it's already hard to use the PC as usual: I have to refresh pages at least every 10 minutes to get them to work (or to load properly with all assets).

savely-krasovsky commented on Aug 16, 2021

Have you reset the cache size to a non-zero value after setting the proper upstreams? If not, AGH is literally querying the upstreams every time you make a request. If yes, try increasing it so that the cache is used more effectively. You could also try enabling the recently added optimistic caching mode.
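To illustrate why a cache size of zero matters and what optimistic caching changes, here is a toy Python cache. It is a hypothetical sketch, not AdGuard Home's code: the class name, the FIFO eviction, and the `run_background_refresh` stand-in for a background worker are all assumptions. With capacity 0 every lookup goes upstream; in optimistic mode an expired answer is served immediately and its refresh is deferred.

```python
import time
from typing import Callable, Dict, Set, Tuple

# Hypothetical upstream lookup: name -> (answer, ttl_seconds).
Resolver = Callable[[str], Tuple[str, float]]


class ToyDnsCache:
    """Toy TTL cache illustrating cache size 0 vs. optimistic serving (not AGH's code)."""

    def __init__(self, resolve: Resolver, capacity: int = 100,
                 optimistic: bool = False, clock: Callable[[], float] = time.monotonic):
        self._resolve = resolve
        self._capacity = capacity
        self._optimistic = optimistic
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (answer, expiry)
        self.pending: Set[str] = set()                    # names awaiting a refresh
        self.upstream_calls = 0

    def _fetch(self, name: str) -> str:
        self.upstream_calls += 1
        answer, ttl = self._resolve(name)
        if self._capacity > 0:  # capacity 0 means nothing is ever stored
            if name not in self._entries and len(self._entries) >= self._capacity:
                self._entries.pop(next(iter(self._entries)))  # evict the first-inserted name
            self._entries[name] = (answer, self._clock() + ttl)
        return answer

    def get(self, name: str) -> str:
        entry = self._entries.get(name)
        if entry is None:
            return self._fetch(name)      # miss: the client waits for the upstream
        answer, expires = entry
        if self._clock() < expires:
            return answer                 # fresh hit: no upstream traffic
        if self._optimistic:
            self.pending.add(name)        # defer the refresh instead of blocking
            return answer                 # serve the expired answer right away
        return self._fetch(name)          # expired and not optimistic: block on upstream

    def run_background_refresh(self) -> None:
        """Stand-in for a background worker: refresh every scheduled name."""
        for name in list(self.pending):
            self._fetch(name)
        self.pending.clear()
```

With `capacity=0`, three lookups of the same name cost three upstream round trips. With a non-zero capacity and `optimistic=True`, an expired entry is returned at once and only marked for refresh, so the client never waits on the upstream for a name it has seen before.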

ainar-g commented on Aug 17, 2021

@ainar-g No, I kept it at zero. But it's still strange, isn't it? Today DNS was not working at all for literally 10 minutes. ~~Totally clean logs on the Keenetic side~~ Again I'm randomly getting: Service: "DoT "System" UDP-to-TCP proxy #0": unexpectedly stopped. I will send the ADH logs by email.

savely-krasovsky commented on Aug 17, 2021

@L11R Hi! Sorry for such a long silence. Is this issue still relevant?

Birbber commented on Sep 2, 2022