Very high memory usage

FallingSnow opened this issue 2 years ago · 8 comments

I'm running inadyn inside a Docker container and am seeing some unusually high memory usage. It's configured for only 6 Cloudflare domains (3 using IPv6).

$ ps aux | grep inadyn
root     2532483  0.2  4.8 4022684 997664 ?      Ss   Aug18  78:30 /usr/sbin/inadyn --foreground
$ grep Vm /proc/2532483/status
VmPeak:	 4022004 kB
VmSize:	 4021896 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	 1001644 kB
VmRSS:	  997320 kB
VmData:	 4015544 kB
VmStk:	     132 kB
VmExe:	      48 kB
VmLib:	    2456 kB
VmPTE:	    7916 kB
VmSwap:	 1759296 kB

From within the container:

/ # inadyn --version
2.9.1
/ # uptime
 05:17:50 up 43 days,  7:42,  load average: 0.42, 0.69, 0.90

FallingSnow avatar Sep 12 '22 12:09 FallingSnow

Before and after stopping inadyn. ~1.3 GB released.

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            19Gi        14Gi       961Mi       247Mi       4.2Gi       4.6Gi
Swap:          9.7Gi       6.1Gi       3.6Gi
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            19Gi        12Gi       2.3Gi       247Mi       4.2Gi       5.9Gi
Swap:          9.7Gi       4.4Gi       5.3Gi

FallingSnow avatar Sep 12 '22 12:09 FallingSnow

Yeah, that doesn't look right ... even though it's sometimes a bit hard to see what's actually being used and what's buffered by the kernel.

Unfortunately I'm not using the Cloudflare plugin myself (I don't even have a login for the automatic tests; anyone able to donate a login would be much appreciated), so I can't test or verify this. Maybe @SimonPilkington can shed some light on this apparent leak?

The Cloudflare plugin is possibly the most advanced one in In-a-Dyn. It does a lot of malloc/free and even juggles JSON to figure things out. With that complexity it's not unlikely there's a leak in there.

troglobit avatar Sep 12 '22 18:09 troglobit

I'm afraid I have no insight to offer. I run inadyn on an embedded device, and after 19 days of uptime its VSZ is less than 4 MiB. Admittedly my setup is much simpler, with only one IPv4 domain, but I have no good way to investigate this further.

SimonPilkington avatar Sep 13 '22 10:09 SimonPilkington

@FallingSnow could consider running inadyn outside of the container, with the same config and under Valgrind (assuming they are technically savvy enough), to see if it finds any leaks.

SimonPilkington avatar Sep 13 '22 10:09 SimonPilkington

I don't currently have the time to look into this, but if I do in the future I'll probably do as you suggested: run inadyn under Valgrind for a few days and see if memory usage keeps climbing.

@SimonPilkington are you using the cloudflare plugin?

FallingSnow avatar Sep 13 '22 17:09 FallingSnow

That sounds like a great plan. Someone who has the ability to run and test it is, IMO, best suited to run valgrind --leak-check=yes --leak-check=full inadyn ... to diagnose this.
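
For reference, a fuller invocation could look something like the line below; the extra options and the log-file path are just suggestions and not from the thread itself:

$ valgrind --leak-check=full --show-leak-kinds=all --log-file=/tmp/inadyn-valgrind.log \
      inadyn --foreground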

@FallingSnow yeah Simon is the main author of that plugin :)

troglobit avatar Sep 13 '22 19:09 troglobit

Yes, I've been using it basically every day since my initial version made it into the RMerlin firmware. If there were leaks along any common path, I believe I'd have noticed them blowing up my router. I did also run Valgrind with my config and cleaned up everything it found, but that was some time ago. If you do find anything, feel free to report back.

SimonPilkington avatar Sep 14 '22 04:09 SimonPilkington

I just checked and inadyn was using 180 MB of RAM, so I've started it up using valgrind --leak-check=yes --leak-check=full inadyn --foreground inside the container. Will report back in about 1-2 weeks.

FallingSnow avatar Sep 23 '22 17:09 FallingSnow

Okay, it's been about 2 weeks.

$ ps aux | grep inadyn
root      505883  2.6  5.6 1455296 1143348 ?     Rs   Sep23 446:26 valgrind --leak-check=yes --leak-check=full inadyn --foreground
$ grep Vm /proc/505883/status 
VmPeak:	 1455328 kB
VmSize:	 1455328 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	 1143612 kB
VmRSS:	 1143600 kB
VmData:	 1446484 kB
VmStk:	     132 kB
VmExe:	    1444 kB
VmLib:	    2628 kB
VmPTE:	    2860 kB
VmSwap:	   50808 kB

And finally, the good stuff.

...
inadyn[1]: STOP command received, exiting.
==1== 
==1== HEAP SUMMARY:
==1==     in use at exit: 1,011,273,497 bytes in 556,964 blocks
==1==   total heap usage: 1,113,636,376 allocs, 1,113,079,412 frees, 231,191,705,460 bytes allocated
==1== 
==1== 184 bytes in 1 blocks are possibly lost in loss record 3 of 36
==1==    at 0x48A6BEE: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1==    by 0x49DE0C9: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x494C181: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x4975657: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x494C4AD: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493EF27: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493F301: gnutls_handshake (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x114837: ssl_open (gnutls.c:237)
==1==    by 0x1128B6: http_init (http.c:68)
==1==    by 0x10F7B2: server_transaction (ddns.c:139)
==1==    by 0x10F7B2: get_address_remote.constprop.0 (ddns.c:341)
==1==    by 0x10FD67: get_address_backend.constprop.0 (ddns.c:501)
==1==    by 0x10FFE0: get_address (ddns.c:533)
==1==    by 0x10FFE0: check_address (ddns.c:876)
==1==    by 0x10FFE0: ddns_main_loop (ddns.c:984)
==1== 
==1== 16,895 bytes in 1 blocks are possibly lost in loss record 4 of 36
==1==    at 0x48A26D5: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1==    by 0x4938090: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493087F: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x4939B9F: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493BB39: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493EF80: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493F301: gnutls_handshake (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x114837: ssl_open (gnutls.c:237)
==1==    by 0x1128B6: http_init (http.c:68)
==1==    by 0x10F7B2: server_transaction (ddns.c:139)
==1==    by 0x10F7B2: get_address_remote.constprop.0 (ddns.c:341)
==1==    by 0x10FA4F: get_address_backend.constprop.0 (ddns.c:481)
==1==    by 0x10FFE0: get_address (ddns.c:533)
==1==    by 0x10FFE0: check_address (ddns.c:876)
==1==    by 0x10FFE0: ddns_main_loop (ddns.c:984)
==1== 
==1== 33,790 bytes in 2 blocks are possibly lost in loss record 5 of 36
==1==    at 0x48A26D5: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1==    by 0x4938090: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493087F: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x4939B9F: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493BB39: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493EF80: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493F301: gnutls_handshake (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x114837: ssl_open (gnutls.c:237)
==1==    by 0x1128B6: http_init (http.c:68)
==1==    by 0x10F7B2: server_transaction (ddns.c:139)
==1==    by 0x10F7B2: get_address_remote.constprop.0 (ddns.c:341)
==1==    by 0x10FD67: get_address_backend.constprop.0 (ddns.c:501)
==1==    by 0x10FFE0: get_address (ddns.c:533)
==1==    by 0x10FFE0: check_address (ddns.c:876)
==1==    by 0x10FFE0: ddns_main_loop (ddns.c:984)
==1== 
==1== 502,028,604 (89,152,624 direct, 412,875,980 indirect) bytes in 12,206 blocks are definitely lost in loss record 35 of 36
==1==    at 0x48A6BEE: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1==    by 0x495E460: gnutls_init (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x114725: ssl_open (gnutls.c:205)
==1==    by 0x1128B6: http_init (http.c:68)
==1==    by 0x10F7B2: server_transaction (ddns.c:139)
==1==    by 0x10F7B2: get_address_remote.constprop.0 (ddns.c:341)
==1==    by 0x10FD67: get_address_backend.constprop.0 (ddns.c:501)
==1==    by 0x10FFE0: get_address (ddns.c:533)
==1==    by 0x10FFE0: check_address (ddns.c:876)
==1==    by 0x10FFE0: ddns_main_loop (ddns.c:984)
==1==    by 0x10E133: main (main.c:557)
==1== 
==1== 509,193,951 (89,934,152 direct, 419,259,799 indirect) bytes in 12,313 blocks are definitely lost in loss record 36 of 36
==1==    at 0x48A6BEE: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1==    by 0x495E460: gnutls_init (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x114725: ssl_open (gnutls.c:205)
==1==    by 0x1128B6: http_init (http.c:68)
==1==    by 0x10F7B2: server_transaction (ddns.c:139)
==1==    by 0x10F7B2: get_address_remote.constprop.0 (ddns.c:341)
==1==    by 0x10FA4F: get_address_backend.constprop.0 (ddns.c:481)
==1==    by 0x10FFE0: get_address (ddns.c:533)
==1==    by 0x10FFE0: check_address (ddns.c:876)
==1==    by 0x10FFE0: ddns_main_loop (ddns.c:984)
==1==    by 0x10E133: main (main.c:557)
==1== 
==1== LEAK SUMMARY:
==1==    definitely lost: 179,086,776 bytes in 24,519 blocks
==1==    indirectly lost: 832,135,779 bytes in 532,439 blocks
==1==      possibly lost: 50,869 bytes in 4 blocks
==1==    still reachable: 73 bytes in 2 blocks
==1==         suppressed: 0 bytes in 0 blocks
==1== Reachable blocks (those to which a pointer was found) are not shown.
==1== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1== 
==1== For lists of detected and suppressed errors, rerun with: -s
==1== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)

FallingSnow avatar Oct 05 '22 05:10 FallingSnow

Ah, great stuff, thanks!

I see you're using the GnuTLS backend. Maybe @SimonPilkington is using OpenSSL? That could explain why you see such different behavior.

This is so much easier to debug and gives a great idea of where the root cause of this lies, awesome!
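
As a quick side note for anyone comparing the two backends: you can check which TLS library a given inadyn binary is actually linked against with ldd; the exact grep pattern below is just an example:

$ ldd $(command -v inadyn) | grep -Ei 'gnutls|ssl'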

troglobit avatar Oct 05 '22 07:10 troglobit

Indeed. The RMerlin firmware builds inadyn with OpenSSL, and I also used OpenSSL when I was doing testing on x86. Will you be handling this then, @troglobit?

SimonPilkington avatar Oct 05 '22 11:10 SimonPilkington

Yeah, I'll have a look at it, @SimonPilkington. From the outside it looks like another case of different behavior between the two backends. Shouldn't be impossible to find, but I can't make any promises on timelines.

troglobit avatar Oct 05 '22 13:10 troglobit

@troglobit Just so you know, I'm using https://hub.docker.com/r/troglobit/inadyn, in case you want to try to reproduce the issue. Otherwise I can test.

FallingSnow avatar Oct 06 '22 01:10 FallingSnow

@FallingSnow is it the latest tag, and if so from which date, or is it the versioned tag, v2.9.1?

I've got no login at Cloudflare to test with, so I've done some long runs against FreeDNS with GnuTLS using the latest mainline inadyn. So far, no leaks in sight :-/

troglobit avatar Oct 08 '22 06:10 troglobit

Latest tag, created 2022-09-23 10:20:36.

That is the creation date of the container the leak report above comes from.

I have this for the image:

Image details
ID 	sha256:a5bde0c73041823d2a4f36326b10be45eb49916d34283e623f9ea280f7142dc2
Size 	12.5 MB
Created 	2022-09-04 15:09:07
Build 	Docker on linux, amd64

FallingSnow avatar Oct 08 '22 17:10 FallingSnow

Sorry for the close, misclicked.

FallingSnow avatar Oct 08 '22 17:10 FallingSnow

There, I'm pretty sure I've found the root cause of the leak(s). We had a similar issue with the OpenSSL backend a year ago, and like this one, it was caused by missing calls to close the session on various types of errors, e.g. DDNS server timeouts.
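
To illustrate the pattern being described, here is a minimal sketch (not the actual inadyn code; everything except the GnuTLS API calls is made up) of how a TLS session leaks when an error path skips the cleanup:

/* Every gnutls_init() must be paired with a gnutls_deinit(), even when
 * the handshake or a later step fails. */
#include <gnutls/gnutls.h>

static int tls_connect(int sd)
{
	gnutls_session_t session;
	int rc;

	rc = gnutls_init(&session, GNUTLS_CLIENT);
	if (rc != GNUTLS_E_SUCCESS)
		return -1;

	gnutls_transport_set_int(session, sd);

	do {
		rc = gnutls_handshake(session);
	} while (rc < 0 && gnutls_error_is_fatal(rc) == 0);

	if (rc < 0) {
		/* A leaky variant would simply return here; the session
		 * allocated by gnutls_init() is then lost on every failed
		 * attempt, e.g. a DDNS server timeout. */
		gnutls_deinit(session);	/* the cleanup that must not be skipped */
		return -1;
	}

	/* ... perform the HTTP transaction, then gnutls_bye() and
	 * gnutls_deinit() on the normal path as well ... */
	return 0;
}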

troglobit avatar Oct 09 '22 08:10 troglobit

Docker Hub container images updated.

If you have a chance to verify it, @FallingSnow, I'd be very thankful, because I think it's time to cut a new official release.

troglobit avatar Oct 09 '22 08:10 troglobit

Testing now.

FallingSnow avatar Oct 09 '22 23:10 FallingSnow

Any updates on the memory usage, @FallingSnow?

troglobit avatar Oct 21 '22 01:10 troglobit

Looks like the issue has been solved. Thanks for your work, @troglobit.

$ ps aux | grep inadyn
root     2261324  0.6  0.0  13852  8272 ?        Ss   Oct09 109:24 inadyn --foreground
$ grep Vm /proc/2261324/status
VmPeak:	   14064 kB
VmSize:	   13852 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	   10980 kB
VmRSS:	    8272 kB
VmData:	    7180 kB
VmStk:	     132 kB
VmExe:	      48 kB
VmLib:	    2544 kB
VmPTE:	      64 kB
VmSwap:	     204 kB

FallingSnow avatar Oct 21 '22 20:10 FallingSnow

Awesome, thanks for reporting back! :heart:

Well then it appears it's time for a release! :sunglasses:

troglobit avatar Oct 22 '22 08:10 troglobit