HTTP3 upstream memory leak

Status: Open. malikshi opened this issue 3 years ago • 23 comments

RAM usage gets far too high after running for about 2 hours.

systemctl status adguard-dnsproxy
● adguard-dnsproxy.service - dnsproxy
     Loaded: loaded (/etc/systemd/system/adguard-dnsproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-11-02 15:16:50 WIB; 2h 43min ago
       Docs: https://github.com/AdguardTeam/dnsproxy#readme
   Main PID: 27947 (dnsproxy-helper)
      Tasks: 10 (limit: 2359)
     Memory: 1.2G
        CPU: 22min 44.463s
     CGroup: /system.slice/adguard-dnsproxy.service
             ├─27947 /bin/sh /opt/adguard/dnsproxy-helper.sh start
             └─27973 /opt/adguard/dnsproxy --listen=127.0.2.1 --config-path=/etc/adguard/dnsproxy.yml

python3 ps_mem.py
 Private  +   Shared  =  RAM used	Program

  4.0 KiB +   0.5 KiB =   4.5 KiB	dhclient
  4.0 KiB +   0.5 KiB =   4.5 KiB	unattended-upgr
  4.0 KiB +   1.5 KiB =   5.5 KiB	dnsproxy-helper
  8.0 KiB +   1.0 KiB =   9.0 KiB	agetty (2)
 28.0 KiB +   8.5 KiB =  36.5 KiB	dbus-daemon
 32.0 KiB +  16.5 KiB =  48.5 KiB	polkitd
 44.0 KiB +   8.0 KiB =  52.0 KiB	chronyd (2)
100.0 KiB +   5.5 KiB = 105.5 KiB	systemd-udevd
132.0 KiB +  32.5 KiB = 164.5 KiB	cron
108.0 KiB +  64.5 KiB = 172.5 KiB	systemd-resolved
112.0 KiB +  78.5 KiB = 190.5 KiB	systemd-logind
260.0 KiB +  18.5 KiB = 278.5 KiB	packagekitd
364.0 KiB +  63.5 KiB = 427.5 KiB	vnstatd
496.0 KiB +  30.5 KiB = 526.5 KiB	rsyslogd
560.0 KiB + 229.5 KiB = 789.5 KiB	systemd (3)
  1.2 MiB + 118.5 KiB =   1.3 MiB	bash
  4.3 MiB + 679.5 KiB =   4.9 MiB	nginx (3)
  7.9 MiB + 225.5 KiB =   8.1 MiB	systemd-journald
  3.0 MiB +   6.2 MiB =   9.2 MiB	sshd (5)
338.6 MiB +   0.5 KiB = 338.6 MiB	nginx-rc
  1.2 GiB +   0.5 KiB =   1.2 GiB	dnsproxy
---------------------------------
                          1.5 GiB
=================================

malikshi, Nov 02 '22

Huge usage! Should I set MemoryMax?

 Private  +   Shared  =  RAM used	Program

  4.0 KiB +   5.5 KiB =   9.5 KiB	dnsproxy-helper
 92.0 KiB +  26.5 KiB = 118.5 KiB	systemd-timesyncd
132.0 KiB +  26.5 KiB = 158.5 KiB	systemd-udevd
152.0 KiB +  13.5 KiB = 165.5 KiB	agetty
204.0 KiB +   7.5 KiB = 211.5 KiB	dbus-daemon
184.0 KiB +  37.5 KiB = 221.5 KiB	cron
208.0 KiB +  86.5 KiB = 294.5 KiB	systemd-logind
292.0 KiB +  29.5 KiB = 321.5 KiB	rsyslogd
368.0 KiB +  67.0 KiB = 435.0 KiB	sshd (2)
312.0 KiB + 128.5 KiB = 440.5 KiB	systemd-journald
488.0 KiB +  22.5 KiB = 510.5 KiB	vnstatd
280.0 KiB + 279.5 KiB = 559.5 KiB	systemd (3)
  1.0 MiB +  62.5 KiB =   1.1 MiB	bash
 34.5 MiB + 830.5 KiB =  35.3 MiB	nginx (5)
591.7 MiB +   6.1 MiB = 597.8 MiB	nginx-rc (7)
  7.5 GiB +   0.5 KiB =   7.5 GiB	dnsproxy
---------------------------------
                          8.1 GiB
=================================

malikshi, Nov 06 '22

We'll probably need to see the pprof dump to figure out what takes memory.

Also, your report makes it completely unclear under what conditions you're running dnsproxy, or even which version it is.

ameshkov, Nov 06 '22

Thanks for adding pprof; I will update the issue later. For now I can say it has to handle over 100 devices. The current setup is as in https://github.com/malikshi/dnsproxy-systemd

Systemd unit

[Unit]
Description=dnsproxy
Documentation=https://github.com/AdguardTeam/dnsproxy#readme
Before=network.target nss-lookup.target shutdown.target
Conflicts=shutdown.target
Wants=nss-lookup.target

[Service]
AmbientCapabilities=CAP_SETPCAP CAP_NET_RAW CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_SETPCAP CAP_NET_RAW CAP_NET_BIND_SERVICE
ExecStart=!!/opt/adguard/dnsproxy-helper.sh start
ExecStop=!!/opt/adguard/dnsproxy-helper.sh stop
ProtectProc=invisible
ProtectHome=yes
Restart=always
RestartSec=0
WorkingDirectory=/run/dnsproxy
RuntimeDirectory=dnsproxy
LimitNPROC=512000
LimitNOFILE=infinity
#MemoryAccounting=yes
#MemoryMax=200M
#WatchdogSec=3min

[Install]
WantedBy=multi-user.target
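Before (or instead of) enabling the commented-out MemoryMax=200M, it can help to log the service's memory over time. On cgroup v2, systemd's "Memory:" figure comes from the service cgroup's memory.current file, and MemoryMax= is written to memory.max. The sketch below assumes cgroup v2 (unified hierarchy) and the unit name shown in the status output above; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumption: cgroup v2, where systemd places each service under
# /sys/fs/cgroup/system.slice/<unit>. Adjust the unit name to yours.
SERVICE_CGROUP = Path("/sys/fs/cgroup/system.slice/adguard-dnsproxy.service")


def read_cgroup_memory(cgroup_dir: Path) -> Tuple[int, Optional[int]]:
    """Return (memory.current, memory.max) in bytes; max is None when unlimited."""
    current = int((cgroup_dir / "memory.current").read_text().strip())
    raw_max = (cgroup_dir / "memory.max").read_text().strip()
    # cgroup v2 writes the literal string "max" when no limit is set.
    limit = None if raw_max == "max" else int(raw_max)
    return current, limit


def format_mib(n_bytes: int) -> str:
    """Human-readable MiB with one decimal, for log lines."""
    return f"{n_bytes / (1024 * 1024):.1f} MiB"
```

For example, `current, limit = read_cgroup_memory(SERVICE_CGROUP)` run every few minutes and logged with `format_mib(current)` gives a curve that can be compared against the config changes tried later in this thread.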

dnsproxy.yml

---
http3: yes
upstream: 
  - "h3://https://freedns.controld.com/p2"
  - "https://https://freedns.controld.com/p2"
  - "quic://p2.freedns.controld.com"
fallback: 
  - "tls://p2.freedns.controld.com"
bootstrap: 
  - "1.1.1.1:53"
  - "8.8.8.8:53"
  - "9.9.9.9:53"
all-servers: yes
cache: yes
cache-optimistic: yes
edns: yes
bogus-nxdomain:
  - "0.0.0.0"
  - "::"

dnsproxy version v0.46.2.

malikshi, Nov 06 '22

What protocols do the client devices use?

ameshkov, Nov 06 '22

systemctl status adguard-dnsproxy
● adguard-dnsproxy.service - dnsproxy
     Loaded: loaded (/etc/systemd/system/adguard-dnsproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-11-07 00:34:17 WIB; 6s ago
       Docs: https://github.com/AdguardTeam/dnsproxy#readme
   Main PID: 16915 (dnsproxy-helper)
      Tasks: 11 (limit: 11628)
     Memory: 15.6M
        CPU: 1.005s
     CGroup: /system.slice/adguard-dnsproxy.service
             ├─16915 /bin/sh /opt/adguard/dnsproxy-helper.sh start
             └─16941 /opt/adguard/dnsproxy --listen=127.0.2.1 --config-path=/etc/adguard/dnsproxy.yml --pprof

The clients are forced to use plain DNS on port 53. What do I need to do to export the pprof data you need?

malikshi, Nov 06 '22

I saw in your fork that you set "rmem_max" to 26214400 bytes (25 MB) for each socket, but shouldn't it be 2621440 bytes (2.5 MB)? Could this be the reason memory grows so fast?!

sudo sh -c 'echo "net.core.rmem_max=26214400" >> /etc/sysctl.conf'
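The arithmetic checks out: 26214400 bytes is exactly 25 MiB and 2621440 bytes is 2.5 MiB, a factor of ten apart. One detail matters for the memory question, though: per socket(7), net.core.rmem_max is the ceiling on what a process may request with the SO_RCVBUF socket option, not memory reserved for every socket. The sketch below (helper names are hypothetical) shows the conversion and how a requested receive buffer is capped by that sysctl.

```python
import socket

MIB = 1024 * 1024


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB (1 MiB = 1,048,576 bytes)."""
    return n_bytes / MIB


def granted_rcvbuf(requested: int) -> int:
    """Request a UDP receive buffer and return what the kernel actually granted.

    On Linux, the requested value is capped at net.core.rmem_max and the kernel
    then reports back double that value to cover bookkeeping overhead
    (see socket(7)). Setting the sysctl by itself allocates nothing.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()
```

For example, on Linux `granted_rcvbuf(2 * MIB)` should report about twice 2 MiB when rmem_max is 25 MiB, and about twice rmem_max when rmem_max is smaller than 2 MiB.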

I also noticed that your config file above shows:

  • "h3://https://freedns.controld.com/p2"
  • "https://https://freedns.controld.com/p2"

...but shouldn't it just be:

  • "h3://freedns.controld.com/p2"
  • "https://freedns.controld.com/p2"
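Entries like the first two are easy to miss in YAML. A small lint pass over the upstream list can flag entries where a second scheme follows the first and strip it, producing the form suggested above. This is a hypothetical helper, not something dnsproxy ships, and it only knows the four schemes used in this config.

```python
from typing import List

# Schemes used in this thread's dnsproxy.yml; illustrative, not the full set
# of upstream schemes dnsproxy supports.
KNOWN_SCHEMES = {"h3", "https", "quic", "tls"}


def doubled_scheme(upstream: str) -> bool:
    """True if the text after 'scheme://' starts with another known 'scheme://'."""
    _, sep, rest = upstream.partition("://")
    if not sep:
        return False
    inner, inner_sep, _ = rest.partition("://")
    return bool(inner_sep) and inner in KNOWN_SCHEMES


def fix_doubled_scheme(upstream: str) -> str:
    """Drop the inner scheme, keeping the outer one: h3://https://x -> h3://x."""
    if not doubled_scheme(upstream):
        return upstream
    head, _, rest = upstream.partition("://")
    _, _, tail = rest.partition("://")
    return f"{head}://{tail}"


def lint_upstreams(upstreams: List[str]) -> List[str]:
    """Return the entries that look like they carry a duplicated scheme."""
    return [u for u in upstreams if doubled_scheme(u)]
```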

iJorgen, Nov 06 '22

Sorry, that's a typo: I use their premium resolver, so I changed it to the public resolver for this post.

I followed various tutorials; yes, I have it set to 25 MB at the moment. Could that be causing such high memory usage?

malikshi, Nov 06 '22

I read that DoQ wants rmem_max set to at least 2048 KiB (2 MiB) so that it doesn't print a warning, but @ameshkov probably knows best :-)

iJorgen, Nov 06 '22

It makes sense to do a quick test and remove HTTP3 upstreams just to see if it changes anything.

ameshkov, Nov 06 '22

Okay, I removed the http3: yes parameter, the h3 resolver and the DoQ resolver, keeping rmem_max at 25 MB, to see the difference.

malikshi, Nov 06 '22

● adguard-dnsproxy.service - dnsproxy
     Loaded: loaded (/etc/systemd/system/adguard-dnsproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-11-07 01:55:11 WIB; 11h ago
       Docs: https://github.com/AdguardTeam/dnsproxy#readme
   Main PID: 325 (dnsproxy-helper)
      Tasks: 12 (limit: 11628)
     Memory: 53.0M
        CPU: 23min 33.931s
     CGroup: /system.slice/adguard-dnsproxy.service
             ├─325 /bin/sh /opt/adguard/dnsproxy-helper.sh start
             └─390 /opt/adguard/dnsproxy --listen=127.0.2.1 --config-path=/etc/adguard/dnsproxy.yml --pprof

@ameshkov, do you still need pprof? How can I get the data from pprof?

python3 ps_mem

 Private  +   Shared  =  RAM used	Program

212.0 KiB +  50.5 KiB = 262.5 KiB	dnsproxy-helper
312.0 KiB +  80.5 KiB = 392.5 KiB	agetty
308.0 KiB + 236.5 KiB = 544.5 KiB	cron
  1.2 MiB + 384.5 KiB =   1.6 MiB	dbus-daemon
820.0 KiB + 853.5 KiB =   1.6 MiB	systemd-timesyncd
  1.0 MiB + 825.5 KiB =   1.8 MiB	systemd-logind
  1.7 MiB + 202.5 KiB =   1.9 MiB	vnstatd
  1.7 MiB + 265.5 KiB =   2.0 MiB	rsyslogd
  2.1 MiB + 239.5 KiB =   2.3 MiB	systemd-udevd
  2.7 MiB + 194.5 KiB =   2.9 MiB	bash
  3.1 MiB +   3.3 MiB =   6.4 MiB	sshd (2)
  3.6 MiB +   4.5 MiB =   8.1 MiB	systemd (3)
  7.7 MiB + 745.5 KiB =   8.4 MiB	systemd-journald
 27.3 MiB +   2.4 MiB =  29.6 MiB	nginx (5)
 33.0 MiB +  48.5 KiB =  33.0 MiB	dnsproxy
399.4 MiB +  15.4 MiB = 414.8 MiB	nginx-rc (7)
---------------------------------
                        515.7 MiB
=================================

It has been running for 11 hours and the usage seems normal now. @iJorgen, could you recommend a Linux tuning setup (sysctl)?

malikshi, Nov 07 '22

It's hard to recommend settings for other setups and environments, but the most important thing for dnsproxy is to give it more UDP buffer memory for DoQ/DoH3, which you already do.
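A quick way to confirm the UDP buffer setting actually took effect after editing /etc/sysctl.conf is to read it back from /proc. A minimal sketch, assuming Linux and a hypothetical target of 2 MiB; the helper name and the target are illustrative, not a recommendation from this thread.

```python
from pathlib import Path
from typing import Dict

# Hypothetical target; pick the value that fits your own setup.
TARGET_BYTES = 2 * 1024 * 1024

SYSCTLS = {
    "net.core.rmem_max": Path("/proc/sys/net/core/rmem_max"),
}


def below_target(paths: Dict[str, Path], target: int) -> Dict[str, int]:
    """Return the sysctls whose current value is smaller than target."""
    low = {}
    for name, path in paths.items():
        value = int(path.read_text().strip())
        if value < target:
            low[name] = value
    return low
```

For example, `below_target(SYSCTLS, TARGET_BYTES)` returns an empty dict when rmem_max is already 25 MB, and `{"net.core.rmem_max": <current value>}` when it is still lower than the target.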

iJorgen, Nov 07 '22

@malikshi the only thing that makes sense to try now is to keep the h3:// upstream but not bring back http3: yes.

DoH upstreams work differently in these two cases.

With h3:// it uses DoH3 right away, while with http3: yes all DoH upstreams run probe connections over TLS and QUIC to determine which protocol to use.

Please check whether memory still leaks in this configuration.
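The difference ameshkov describes can be modeled roughly as "use QUIC directly" versus "probe TLS and QUIC concurrently and keep whichever answers first". This is a simplified sketch, not dnsproxy's implementation: it assumes only the two schemes from this thread and treats each probe as a function that either returns (success) or raises (failure); all names are hypothetical.

```python
import concurrent.futures
from typing import Callable, Dict, List


def transports_for(upstream: str, http3: bool) -> List[str]:
    """Simplified model of which transports a DoH upstream would try."""
    if upstream.startswith("h3://"):
        return ["quic"]  # DoH3 right away, no probing
    if upstream.startswith("https://"):
        # Only with http3: yes are both transports probed.
        return ["tls", "quic"] if http3 else ["tls"]
    raise ValueError(f"not a DoH upstream: {upstream}")


def first_successful(probes: Dict[str, Callable[[], None]]) -> str:
    """Run every probe concurrently and return the name of the first to succeed.

    Leaving the 'with' block waits for the slower probes to finish; a real
    implementation would also close the connection opened by the losing probe.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(probes)) as pool:
        futures = {pool.submit(probe): name for name, probe in probes.items()}
        for future in concurrent.futures.as_completed(futures):
            if future.exception() is None:
                return futures[future]
    raise RuntimeError("all probes failed")
```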

ameshkov, Nov 07 '22

I updated the config; here is the result.

systemctl status adguard-dnsproxy
● adguard-dnsproxy.service - dnsproxy
     Loaded: loaded (/etc/systemd/system/adguard-dnsproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-11-07 14:32:24 WIB; 4h 21min ago
       Docs: https://github.com/AdguardTeam/dnsproxy#readme
   Main PID: 5831 (dnsproxy-helper)
      Tasks: 17 (limit: 11628)
     Memory: 778.3M
        CPU: 40min 8.910s
     CGroup: /system.slice/adguard-dnsproxy.service
             ├─5831 /bin/sh /opt/adguard/dnsproxy-helper.sh start
             └─5858 /opt/adguard/dnsproxy --listen=127.0.2.1 --config-path=/etc/adguard/dnsproxy.yml --pprof

Quite high usage compared to the previous tests.

systemctl status adguard-dnsproxy
● adguard-dnsproxy.service - dnsproxy
     Loaded: loaded (/etc/systemd/system/adguard-dnsproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-11-07 14:32:24 WIB; 6h ago
       Docs: https://github.com/AdguardTeam/dnsproxy#readme
   Main PID: 5831 (dnsproxy-helper)
      Tasks: 17 (limit: 11628)
     Memory: 1.7G
        CPU: 1h 6min 640ms
     CGroup: /system.slice/adguard-dnsproxy.service
             ├─5831 /bin/sh /opt/adguard/dnsproxy-helper.sh start
             └─5858 /opt/adguard/dnsproxy --listen=127.0.2.1 --config-path=/etc/adguard/dnsproxy.yml --pprof

It keeps increasing over time.
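To put "increasing over time" into numbers, the two status snapshots above can be turned into a growth rate. This sketch assumes systemd's base-1024 size suffixes and reads "4h 21min" as 261 minutes and "6h" as 360 minutes; systemd rounds both the uptime and the memory figure, so the result (roughly 580 MiB per hour) is approximate. The helper names are hypothetical.

```python
from typing import Tuple

# systemd prints memory with base-1024 suffixes.
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_systemd_size(text: str) -> float:
    """Convert a value like '778.3M' or '1.7G' to bytes."""
    text = text.strip()
    if text[-1].isdigit():
        return float(text)
    return float(text[:-1]) * UNITS[text[-1].upper()]


def growth_mib_per_hour(earlier: Tuple[float, str], later: Tuple[float, str]) -> float:
    """Each snapshot is (uptime in minutes, value of the 'Memory:' line)."""
    (t0, m0), (t1, m1) = earlier, later
    delta_bytes = parse_systemd_size(m1) - parse_systemd_size(m0)
    return delta_bytes / (1024 ** 2) / ((t1 - t0) / 60)
```

For example, `growth_mib_per_hour((261, "778.3M"), (360, "1.7G"))` gives about 583 MiB per hour for the snapshots above.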

malikshi, Nov 07 '22

Well, so it's confirmed that the problem is in the DoH3 upstream implementation.

We'll need a pprof dump then.

Can you build dnsproxy from my branch https://github.com/AdguardTeam/dnsproxy/tree/pprof ?

If you can, then please do the following:

  1. Run it for some time with --pprof flag
  2. Then when the memory usage is high enough, download http://localhost:6060/debug/pprof/heap and share with me
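Step 2 can be scripted so that dumps are taken at intervals and kept side by side for comparison. A minimal sketch using only the Python standard library; the URL is the one given above, while the helper names and the file-naming scheme are hypothetical.

```python
import shutil
import urllib.request
from datetime import datetime

# Default address from the steps above; change it if your build listens
# elsewhere (for example, a different port per dnsproxy instance).
PPROF_HEAP_URL = "http://localhost:6060/debug/pprof/heap"


def heap_filename(prefix: str, when: datetime) -> str:
    """Timestamped name so successive dumps can be compared later."""
    return f"{prefix}-{when:%Y%m%d-%H%M%S}.pb.gz"


def fetch_heap(url: str, dest: str, timeout: float = 30.0) -> int:
    """Download a heap profile to dest and return the number of bytes written."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
        return out.tell()
```

For example, `fetch_heap(PPROF_HEAP_URL, heap_filename("heap", datetime.now()))`. A saved file can then be inspected with `go tool pprof -top <file>`, and two dumps compared with `go tool pprof -base <older> <newer>`.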

ameshkov, Nov 08 '22

Can pprof listen on other ports? I have three dnsproxy instances on one machine.

Potterli20, Nov 08 '22

@Potterli20 you can change the port here before building dnsproxy: https://github.com/AdguardTeam/dnsproxy/blob/master/main.go#L294

ameshkov, Nov 08 '22

It's a little tricky to compile one by one.😂

Potterli20, Nov 08 '22

I compiled the latest commit and am waiting for the RAM usage to spike. I will follow up with the pprof data later.

malikshi, Nov 08 '22

I'd suggest changing the pprof listen address from 127.0.0.1 to 0.0.0.0.

This is the pprof build I compiled based on 1.19. My upstream DNS is based on the DNS shunt (split-routing) file.

Potterli20, Nov 09 '22

@Potterli20 I think I need to compile it like yours. I can't access pprof when it listens on localhost and dnsproxy runs on a remote server.

I found a way to get the heap data, but the usage is still under 300 MB. Should I share it now, @ameshkov, or wait for a bigger jump?

curl -s http://localhost:6060/debug/pprof/heap > heap.out

malikshi, Nov 09 '22

If there's a leak, we'll see it in that heap dump just fine.

ameshkov, Nov 09 '22

My dnsproxy setup updates the upstream file and restarts the program at 10:30 a.m. China time. Previously the program could grow past 1 GB while running; now it stays smaller, around 400 to 500 MB, so I don't understand the problem. It seems to me that it has a lot to do with the dependencies.

Potterli20, Nov 09 '22