HTTP3 upstream memory leak
RAM usage is too high after running for 2 hours.
systemctl status adguard-dnsproxy
● adguard-dnsproxy.service - dnsproxy
Loaded: loaded (/etc/systemd/system/adguard-dnsproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2022-11-02 15:16:50 WIB; 2h 43min ago
Docs: https://github.com/AdguardTeam/dnsproxy#readme
Main PID: 27947 (dnsproxy-helper)
Tasks: 10 (limit: 2359)
Memory: 1.2G
CPU: 22min 44.463s
CGroup: /system.slice/adguard-dnsproxy.service
├─27947 /bin/sh /opt/adguard/dnsproxy-helper.sh start
└─27973 /opt/adguard/dnsproxy --listen=127.0.2.1 --config-path=/etc/adguard/dnsproxy.yml
python3 ps_mem.py
Private + Shared = RAM used Program
4.0 KiB + 0.5 KiB = 4.5 KiB dhclient
4.0 KiB + 0.5 KiB = 4.5 KiB unattended-upgr
4.0 KiB + 1.5 KiB = 5.5 KiB dnsproxy-helper
8.0 KiB + 1.0 KiB = 9.0 KiB agetty (2)
28.0 KiB + 8.5 KiB = 36.5 KiB dbus-daemon
32.0 KiB + 16.5 KiB = 48.5 KiB polkitd
44.0 KiB + 8.0 KiB = 52.0 KiB chronyd (2)
100.0 KiB + 5.5 KiB = 105.5 KiB systemd-udevd
132.0 KiB + 32.5 KiB = 164.5 KiB cron
108.0 KiB + 64.5 KiB = 172.5 KiB systemd-resolved
112.0 KiB + 78.5 KiB = 190.5 KiB systemd-logind
260.0 KiB + 18.5 KiB = 278.5 KiB packagekitd
364.0 KiB + 63.5 KiB = 427.5 KiB vnstatd
496.0 KiB + 30.5 KiB = 526.5 KiB rsyslogd
560.0 KiB + 229.5 KiB = 789.5 KiB systemd (3)
1.2 MiB + 118.5 KiB = 1.3 MiB bash
4.3 MiB + 679.5 KiB = 4.9 MiB nginx (3)
7.9 MiB + 225.5 KiB = 8.1 MiB systemd-journald
3.0 MiB + 6.2 MiB = 9.2 MiB sshd (5)
338.6 MiB + 0.5 KiB = 338.6 MiB nginx-rc
1.2 GiB + 0.5 KiB = 1.2 GiB dnsproxy
---------------------------------
1.5 GiB
=================================
Huge usage! Should I set it up with MemoryMax?
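For reference, if a hard cap is wanted while debugging, a systemd drop-in is one way to do it; the values below are just examples (the unit file posted later in this thread already has these directives commented out):

```ini
# /etc/systemd/system/adguard-dnsproxy.service.d/memory.conf  (example drop-in)
[Service]
MemoryAccounting=yes
MemoryMax=200M
```

After adding the drop-in, run `systemctl daemon-reload` and restart the service. Keep in mind a hard MemoryMax only masks a leak by letting the kernel kill the service when it hits the cap; it doesn't fix the underlying growth.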
Private + Shared = RAM used Program
4.0 KiB + 5.5 KiB = 9.5 KiB dnsproxy-helper
92.0 KiB + 26.5 KiB = 118.5 KiB systemd-timesyncd
132.0 KiB + 26.5 KiB = 158.5 KiB systemd-udevd
152.0 KiB + 13.5 KiB = 165.5 KiB agetty
204.0 KiB + 7.5 KiB = 211.5 KiB dbus-daemon
184.0 KiB + 37.5 KiB = 221.5 KiB cron
208.0 KiB + 86.5 KiB = 294.5 KiB systemd-logind
292.0 KiB + 29.5 KiB = 321.5 KiB rsyslogd
368.0 KiB + 67.0 KiB = 435.0 KiB sshd (2)
312.0 KiB + 128.5 KiB = 440.5 KiB systemd-journald
488.0 KiB + 22.5 KiB = 510.5 KiB vnstatd
280.0 KiB + 279.5 KiB = 559.5 KiB systemd (3)
1.0 MiB + 62.5 KiB = 1.1 MiB bash
34.5 MiB + 830.5 KiB = 35.3 MiB nginx (5)
591.7 MiB + 6.1 MiB = 597.8 MiB nginx-rc (7)
7.5 GiB + 0.5 KiB = 7.5 GiB dnsproxy
---------------------------------
8.1 GiB
=================================
We'll probably need to see the pprof dump to figure out what takes memory.
Also, from your report it is completely unclear what conditions you're running dnsproxy under, or even what version it is.
Thanks for adding pprof; I will update the issue later. For now: it needs to handle over 100 devices. The current setup is like the one in https://github.com/malikshi/dnsproxy-systemd
Systemd
[Unit]
Description=dnsproxy
Documentation=https://github.com/AdguardTeam/dnsproxy#readme
Before=network.target nss-lookup.target shutdown.target
Conflicts=shutdown.target
Wants=nss-lookup.target
[Service]
AmbientCapabilities=CAP_SETPCAP CAP_NET_RAW CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_SETPCAP CAP_NET_RAW CAP_NET_BIND_SERVICE
ExecStart=!!/opt/adguard/dnsproxy-helper.sh start
ExecStop=!!/opt/adguard/dnsproxy-helper.sh stop
ProtectProc=invisible
ProtectHome=yes
Restart=always
RestartSec=0
WorkingDirectory=/run/dnsproxy
RuntimeDirectory=dnsproxy
LimitNPROC=512000
LimitNOFILE=infinity
#MemoryAccounting=yes
#MemoryMax=200M
#WatchdogSec=3min
[Install]
WantedBy=multi-user.target
dnsproxy.yml
---
http3: yes
upstream:
- "h3://https://freedns.controld.com/p2"
- "https://https://freedns.controld.com/p2"
- "quic://p2.freedns.controld.com"
fallback:
- "tls://p2.freedns.controld.com"
bootstrap:
- "1.1.1.1:53"
- "8.8.8.8:53"
- "9.9.9.9:53"
all-servers: yes
cache: yes
cache-optimistic: yes
edns: yes
bogus-nxdomain:
- "0.0.0.0"
- "::"
dnsproxy version v0.46.2.
What protocols do the client devices use?
systemctl status adguard-dnsproxy
● adguard-dnsproxy.service - dnsproxy
Loaded: loaded (/etc/systemd/system/adguard-dnsproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-11-07 00:34:17 WIB; 6s ago
Docs: https://github.com/AdguardTeam/dnsproxy#readme
Main PID: 16915 (dnsproxy-helper)
Tasks: 11 (limit: 11628)
Memory: 15.6M
CPU: 1.005s
CGroup: /system.slice/adguard-dnsproxy.service
├─16915 /bin/sh /opt/adguard/dnsproxy-helper.sh start
└─16941 /opt/adguard/dnsproxy --listen=127.0.2.1 --config-path=/etc/adguard/dnsproxy.yml --pprof
The clients are forced to use DNS on port 53. What do I need to do to export the pprof data you need?
I saw in your fork that you set "rmem_max" to 26214400 bytes (25 MB) for each socket, but shouldn't it be 2621440 bytes (2.5 MB)? Could this be a reason for the memory growing so fast?
sudo sh -c 'echo "net.core.rmem_max=26214400" >> /etc/sysctl.conf'
Also noticed your setup-file above shows:
- "h3://https://freedns.controld.com/p2"
- "https://https://freedns.controld.com/p2"
...but shouldn't it just be:
- "h3://freedns.controld.com/p2"
- "https://freedns.controld.com/p2"
Sorry, that was a typo; since I use their premium resolver, I changed it to the public resolver here.
I was following various tutorials; indeed I have it set to 25 MB at the moment. Is that causing the memory usage to be so high?
I read that DoQ wants at least 2,048 kB set in rmem_max to not show an error message, but @ameshkov probably knows best :-)
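For reference, a persistent form of that setting can also live in a sysctl.d fragment instead of being appended to /etc/sysctl.conf (the path below is an example); 2621440 bytes is the 2.5 MB figure discussed here, versus the 25 MB actually set:

```ini
# /etc/sysctl.d/99-udp-buffers.conf  (example path)
# 2621440 bytes = 2.5 MiB receive-buffer ceiling per UDP socket
net.core.rmem_max=2621440
```

Apply it with `sysctl --system` (or reboot).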
It makes sense to do a quick test and remove HTTP3 upstreams just to see if it changes anything.
Okay, I removed the http3: yes parameter and the h3 resolver, as well as DoQ, still with the 25 MB rmem_max, to see the difference.
● adguard-dnsproxy.service - dnsproxy
Loaded: loaded (/etc/systemd/system/adguard-dnsproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-11-07 01:55:11 WIB; 11h ago
Docs: https://github.com/AdguardTeam/dnsproxy#readme
Main PID: 325 (dnsproxy-helper)
Tasks: 12 (limit: 11628)
Memory: 53.0M
CPU: 23min 33.931s
CGroup: /system.slice/adguard-dnsproxy.service
├─325 /bin/sh /opt/adguard/dnsproxy-helper.sh start
└─390 /opt/adguard/dnsproxy --listen=127.0.2.1 --config-path=/etc/adguard/dnsproxy.yml --pprof
@ameshkov do you still need pprof? How can I get the data from pprof?
python3 ps_mem.py
Private + Shared = RAM used Program
212.0 KiB + 50.5 KiB = 262.5 KiB dnsproxy-helper
312.0 KiB + 80.5 KiB = 392.5 KiB agetty
308.0 KiB + 236.5 KiB = 544.5 KiB cron
1.2 MiB + 384.5 KiB = 1.6 MiB dbus-daemon
820.0 KiB + 853.5 KiB = 1.6 MiB systemd-timesyncd
1.0 MiB + 825.5 KiB = 1.8 MiB systemd-logind
1.7 MiB + 202.5 KiB = 1.9 MiB vnstatd
1.7 MiB + 265.5 KiB = 2.0 MiB rsyslogd
2.1 MiB + 239.5 KiB = 2.3 MiB systemd-udevd
2.7 MiB + 194.5 KiB = 2.9 MiB bash
3.1 MiB + 3.3 MiB = 6.4 MiB sshd (2)
3.6 MiB + 4.5 MiB = 8.1 MiB systemd (3)
7.7 MiB + 745.5 KiB = 8.4 MiB systemd-journald
27.3 MiB + 2.4 MiB = 29.6 MiB nginx (5)
33.0 MiB + 48.5 KiB = 33.0 MiB dnsproxy
399.4 MiB + 15.4 MiB = 414.8 MiB nginx-rc (7)
---------------------------------
515.7 MiB
=================================
It has been running for 11 hours and the usage seems normal now. @iJorgen can you please recommend a Linux tuning setup (sysctl)?
Hard to recommend settings for other setups/environments, but the most important thing for dnsproxy is to give it some more UDP memory for DoQ/DoH3, which you already do.
@malikshi the only thing that makes sense to try is to keep the h3:// upstream, but don't bring http3: yes back.
There's a difference in how DoH upstreams work in these cases.
With h3:// it will use DoH3 right away, while with http3: yes all DoH upstreams will run probe connections over TLS and QUIC in order to determine which protocol to use.
Please check if the memory leaks in this configuration.
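Based on the config posted earlier, the suggested test variant might look roughly like this (upstream URLs taken from the corrected list above; a sketch, not a verified config):

```yaml
---
# keep the explicit DoH3 upstream, but do not re-enable the global http3 switch
upstream:
  - "h3://freedns.controld.com/p2"
  - "https://freedns.controld.com/p2"
fallback:
  - "tls://p2.freedns.controld.com"
```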
Updated the config; here is the result.
systemctl status adguard-dnsproxy
● adguard-dnsproxy.service - dnsproxy
Loaded: loaded (/etc/systemd/system/adguard-dnsproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-11-07 14:32:24 WIB; 4h 21min ago
Docs: https://github.com/AdguardTeam/dnsproxy#readme
Main PID: 5831 (dnsproxy-helper)
Tasks: 17 (limit: 11628)
Memory: 778.3M
CPU: 40min 8.910s
CGroup: /system.slice/adguard-dnsproxy.service
├─5831 /bin/sh /opt/adguard/dnsproxy-helper.sh start
└─5858 /opt/adguard/dnsproxy --listen=127.0.2.1 --config-path=/etc/adguard/dnsproxy.yml --pprof
Quite high usage compared to the previous tests.
systemctl status adguard-dnsproxy
● adguard-dnsproxy.service - dnsproxy
Loaded: loaded (/etc/systemd/system/adguard-dnsproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-11-07 14:32:24 WIB; 6h ago
Docs: https://github.com/AdguardTeam/dnsproxy#readme
Main PID: 5831 (dnsproxy-helper)
Tasks: 17 (limit: 11628)
Memory: 1.7G
CPU: 1h 6min 640ms
CGroup: /system.slice/adguard-dnsproxy.service
├─5831 /bin/sh /opt/adguard/dnsproxy-helper.sh start
└─5858 /opt/adguard/dnsproxy --listen=127.0.2.1 --config-path=/etc/adguard/dnsproxy.yml --pprof
It increased over time.
Well, so it's confirmed that the problem is in the DoH3 upstream implementation.
We'll need a pprof dump then.
Can you build dnsproxy from my branch https://github.com/AdguardTeam/dnsproxy/tree/pprof ?
If you can, then please do the following:
- Run it for some time with the --pprof flag
- Then, when the memory usage is high enough, download http://localhost:6060/debug/pprof/heap and share it with me
Can pprof open more ports? I have three dnsproxies on one machine.
@Potterli20 you can change the port here before building dnsproxy: https://github.com/AdguardTeam/dnsproxy/blob/master/main.go#L294
It's a little tricky to compile one by one.😂
I compiled the latest commit and am waiting for the RAM usage to spike. I will follow up with pprof data later.
You are advised to change the pprof listen address from 127.0.0.1 to 0.0.0.0.
This is the pprof build I compiled with Go 1.19. My upstream DNS is based on the DNS shunt file.
@Potterli20 I think I need to compile it like yours; I can't access pprof if it listens on localhost while dnsproxy runs on a remote server.
I found a way to get the heap data, but the usage is still under 300 MB. Should I share it, @ameshkov, or wait for a bigger spike?
curl -s -v http://localhost:6060/debug/pprof/heap > heap.out
If there's a leak, we'll see it from that heap dump, so that's fine.
My dnsproxy setup updates the upstream file and restarts the program at 10:30 a.m. China time. Before, it could grow past 1 GB while running; now it's smaller, around 400-500 MB, so I don't understand the problem. It strikes me as having a lot to do with the dependencies.