Memory leak in Debian distro

Open pr0phe opened this issue 1 year ago • 43 comments

PID    PPID CMD                         %MEM
 85       1 /usr/local/bin/neolink rtsp 21.3


neolink_linux_x86_64_bullseye.zip

neolink.service - Neolink service
     Loaded: loaded (/etc/systemd/system/neolink.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-03-01 08:46:23 MSK; 29min ago
   Main PID: 85 (neolink)
      Tasks: 17 (limit: 9176)
     Memory: 563.4M
        CPU: 25.310s
     CGroup: /system.slice/neolink.service
             `-85 /usr/local/bin/neolink rtsp --config /usr/local/etc/neolink_config.toml

Mar 01 08:46:23 debian neolink[85]: [2024-03-01T05:46:23Z INFO neolink::utils] ha-cam: Connecting to camera at Add>
Mar 01 08:46:23 debian neolink[85]: [2024-03-01T05:46:23Z INFO neolink_core::bc_protocol] ha-cam: Trying TCP disco>
Mar 01 08:46:23 debian neolink[85]: [2024-03-01T05:46:23Z INFO neolink_core::bc_protocol] ha-cam: TCP Discovery su>
Mar 01 08:46:23 debian neolink[85]: [2024-03-01T05:46:23Z INFO neolink::utils] ha-cam: Logging in
Mar 01 08:46:23 debian neolink[85]: [2024-03-01T05:46:23Z INFO neolink::utils] ha-cam: Connected and logged in
Mar 01 08:46:24 debian neolink[85]: [2024-03-01T05:46:24Z INFO neolink::common::camthread] ha-cam: Camera time is >
Mar 01 08:46:24 debian neolink[85]: [2024-03-01T05:46:24Z INFO neolink::common::neocam] ha-cam: Model E1
Mar 01 08:46:24 debian neolink[85]: [2024-03-01T05:46:24Z INFO neolink::common::neocam] ha-cam: Firmware Version v>
Mar 01 08:46:24 debian neolink[85]: [2024-03-01T05:46:24Z INFO neolink::rtsp] ha-cam: Avaliable at /ha-cam/main, />
Mar 01 08:46:24 debian neolink[85]: [2024-03-01T05:46:24Z INFO neolink::rtsp] ha-cam: Avaliable at /ha-cam/sub

pr0phe avatar Mar 01 '24 06:03 pr0phe

Debian GNU/Linux 12 \n \l

pr0phe avatar Mar 01 '24 06:03 pr0phe

I've also tried 0.6.3 rc1 for bookworm and Ubuntu. It starts, but in the end I get: unable to open MRL 'rtsp://192.168.31.2:8554/ha-cam/mainStream/'

Config:

bind = "0.0.0.0"
tls_client_auth = "none"

[[cameras]]
name = "name"
username = "name"
password = "pass"
address = "192.168.31.83:9000"
UID = "952700026MM7HB87"
stream = "mainStream"
format = "h264"
discovery = "local"

I've also tried the 0.6.2 Ubuntu release. ~~For an hour I haven't detected a memory leak; it seems stable on any Debian.~~ Same problem.

pr0phe avatar Mar 01 '24 08:03 pr0phe

Happens for me too, Debian 12.5 on Proxmox.

tonytaylor85 avatar Mar 03 '24 16:03 tonytaylor85

In any case, the Ubuntu release has the same problem after two days.

root@debian:/mnt/share# ps -eo pid,ppid,cmd,%mem --sort=-%mem | head
  PID  PPID CMD                         %MEM
10719     1 /usr/local/bin/neolink mqtt 93.1
10704   234 /usr/sbin/smbd --foreground  0.2
10871   353 ps -eo pid,ppid,cmd,%mem --  0.1
  234     1 /usr/sbin/smbd --foreground  0.1
    1     0 /sbin/init                   0.1
10703     1 /lib/systemd/systemd-journa  0.1
  102     1 dhclient -4 -v -i -pf /run/  0.1
  107     1 /lib/systemd/systemd-networ  0.1
  195     1 /usr/sbin/nmbd --foreground  0.1

@QuantumEntangledAndy do you have the original releases made by thirtythreeforty? I'd like to find the version from which this problem starts. The problem occurs not only on Proxmox instances.

It seems the 0.5.17 Ubuntu build has no memory leakage problem.

pr0phe avatar Mar 04 '24 05:03 pr0phe

I have the same problem. My workaround is to run Neolink in an LXC, set the memory limit as low as possible and set Neolink to autostart at LXC boot. That means that PVE auto kills the LXC as soon as the memleak happens and Neolink is restarted. Not great, but it works. I have had the exact same issue with the original (thirtythreeforty) version as well.
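For reference, that setup can be sketched with PVE's pct tool; the container ID (101) and the 256 MB cap here are placeholders, adjust to taste:

pct set 101 --memory 256 --swap 0 --onboot 1    # hard memory cap + start the LXC at PVE boot

Enabling the neolink systemd unit inside the container then handles the autostart after each kill.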

tomaspre avatar Mar 14 '24 10:03 tomaspre

> I have the same problem. My workaround is to run Neolink in an LXC, set the memory limit as low as possible and set Neolink to autostart at LXC boot. That means that PVE auto kills the LXC as soon as the memleak happens and Neolink is restarted. Not great, but it works. I have had the exact same issue with the original (thirtythreeforty) version as well.

I used the 0.5.17 version; it works stably without memory leakage, more than a week of uptime.

pr0phe avatar Mar 14 '24 10:03 pr0phe

Anyone good at using valgrind? I can do memory profiling on my MacBook and I'm not seeing this there, so I think I'm going to have to do the memory profiling on Linux with valgrind instead.

I'm wondering where this is happening. Since this is Rust code, I can't do the usual forgetting-to-free-memory, since all memory is managed. I might have an unbounded vector or something, but I'm not sure where it would be. Will need digging.

QuantumEntangledAndy avatar Mar 19 '24 11:03 QuantumEntangledAndy

Since I'm on a Debian machine, running Neolink in Valgrind should not be a problem for me. I'm using Valgrind extensively for my C(++) projects but I had no clue you can use it for Rust. I'll give it a go ASAP.

tomaspre avatar Mar 19 '24 11:03 tomaspre

Thanks. Perhaps you can send me your valgrind command too, so I can try it with a debug build with the symbols loaded.
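For what it's worth, a typical invocation for this kind of leak hunt might look like the following; the flag choices and the debug binary path are my assumptions, not the exact command used in this thread:

valgrind --leak-check=full --show-leak-kinds=definite,indirect \
    --log-file=neolink-valgrind.log \
    ./target/debug/neolink rtsp --config /usr/local/etc/neolink_config.toml

A plain cargo build (debug profile) keeps the symbols valgrind needs for readable stack traces.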

QuantumEntangledAndy avatar Mar 19 '24 12:03 QuantumEntangledAndy

I managed to get a look at this in valgrind in a Debian bookworm docker image. All the memory leaks seem to be happening upstream in gstreamer. I wonder if this is why I didn't see them on my macOS, since the mac is using a more up-to-date gstreamer than what is provided in bookworm. I'm going to try and run valgrind on my Arch Linux with a more recent gstreamer. This might also explain why it worked before, since I updated the Debian docker images to bookworm in a recent update.

QuantumEntangledAndy avatar Apr 07 '24 11:04 QuantumEntangledAndy

More developments, but perhaps not the news you wanted. I worked out how to get valgrind to visualise the memory better, and it doesn't seem to be leaking for me.

Camera and Client Setup

  • E1
  • Substream
  • Connected over RTSP with ffmpeg for 27 minutes

Debian Bookworm gstreamer1.0-x (1.22.0-3+deb12u1)

[screenshot: valgrind memory graph]

Debian Sid gstreamer1.0-x (1.24.1-1 and others)

[screenshot: valgrind memory graph]

It seems to be stable at 4 MB. Are there any more setup details that could help me find out what is causing this? How are you connecting, etc.?

QuantumEntangledAndy avatar Apr 08 '24 08:04 QuantumEntangledAndy

I've installed Proxmox VE, then used the Debian LXC from https://tteck.github.io/Proxmox/, then installed neolink as per your instructions with the following config:

bind = "0.0.0.0"

[[cameras]]
name = "ha-cam"
username = ""
password = ""
address = "192.168..:9000"
UID = "9527000****"

pr0phe avatar Apr 09 '24 17:04 pr0phe

Can anyone test the latest? We found a potential place that could cause the leak and I have tried to address it, but it needs testing.

QuantumEntangledAndy avatar Apr 26 '24 07:04 QuantumEntangledAndy

I just built e8aca0ee105a2869740f93cfb238d19baf9a3362 and got it running. I have 3 B800s and 2 B400s now, so it gets a workout.

tonytaylor85 avatar Apr 26 '24 10:04 tonytaylor85

Glad you got it to build. We also build on GitHub in Actions, so you can download test builds. The build for the commit you referenced, for example, is here: https://github.com/QuantumEntangledAndy/neolink/actions/runs/8846439437

QuantumEntangledAndy avatar Apr 26 '24 11:04 QuantumEntangledAndy

Oh, nice. I'll download that and run it, since it is still leaking pretty badly.

tonytaylor85 avatar Apr 26 '24 11:04 tonytaylor85

No luck; the build you linked also leaks at the same rate. I suppose that was expected, but I'm new at this.

tonytaylor85 avatar Apr 26 '24 12:04 tonytaylor85

Interesting. Could you try this build? https://github.com/QuantumEntangledAndy/neolink/actions/runs/8844853438 It should have some heavy logging that reports the sizes of all the buffers. Can you see if any of the buffers have a silly size?

QuantumEntangledAndy avatar Apr 26 '24 14:04 QuantumEntangledAndy

I'd seen some notes in journalctl with e8aca0ee105a2869740f93cfb238d19baf9a3362. Two minutes into a boot:

Apr 26 11:47:05 neolink62 neolink[469]: [2024-04-26T15:47:05Z INFO  neolink::rtsp::factory] buffer_size: 1572864, bitrate: 6291456
Apr 26 11:47:05 neolink62 neolink[469]: [2024-04-26T15:47:05Z INFO  neolink::rtsp::stream] Buffer full on audsrc
Apr 26 11:47:05 neolink62 neolink[469]: [2024-04-26T15:47:05Z INFO  neolink::rtsp::stream] Buffer full on vidsrc
Apr 26 11:47:05 neolink62 neolink[469]: [2024-04-26T15:47:05Z INFO  neolink::rtsp::factory] buffer_size: 1572864, bitrate: 6291456
Apr 26 11:47:05 neolink62 neolink[469]: [2024-04-26T15:47:05Z INFO  neolink::rtsp::stream] Buffer full on vidsrc
Apr 26 11:47:05 neolink62 neolink[469]: [2024-04-26T15:47:05Z INFO  neolink::rtsp::stream] Buffer full on audsrc
Apr 26 11:47:06 neolink62 neolink[469]: [2024-04-26T15:47:06Z INFO  neolink::rtsp::stream] Buffer full on audsrc
Apr 26 11:47:06 neolink62 neolink[469]: [2024-04-26T15:47:06Z INFO  neolink::rtsp::stream] Buffer full on audsrc
Apr 26 11:47:07 neolink62 neolink[469]: [2024-04-26T15:47:07Z INFO  neolink::rtsp::stream] Buffer full on vidsrc

and on and on.

I've attached journalctl -u neolink.service -b -e > logs.txt from a8b93a4ef2b3bd9c65fc3b33e7e1bf22f5bb5988

logs.txt

tonytaylor85 avatar Apr 26 '24 16:04 tonytaylor85

And starting it from the terminal: termrun.txt

tonytaylor85 avatar Apr 26 '24 20:04 tonytaylor85

Also seeing leaks here that seem to be caused when a person-detection event occurs. I don't have MQTT or anything else enabled; I'm just streaming the video.

Using the latest docker image on arm64:

quantumentangledandy/neolink   latest   c67dc8fc51fc   25 hours ago   706MB


JulesAU avatar May 03 '24 03:05 JulesAU

Any chance I can get access to a camera to check this out? In the other threads on the memory leaks it seems to be fixed already.

I am not sure what you mean by person detection events, since we are not listening to those messages. Can you be more specific: is this something from post-processing you added, the PIR, or something else?

QuantumEntangledAndy avatar May 03 '24 03:05 QuantumEntangledAndy

It's a Reolink Lumus v2. The camera triggers alerts to the Reolink app on my phone when it detects someone. Neolink, as you've pointed out, doesn't/shouldn't have anything to do with that. But it does seem to correspond with the jumps in RSS on the neolink process.

I'll message you with VPN credentials if you want to take a look

JulesAU avatar May 03 '24 03:05 JulesAU

Ah, the push notifications. We do listen to those as part of the wake-up on motion (since we cannot stay connected to battery cameras to listen for it without draining the battery). Perhaps you can add push_notifications = false to every [[cameras]] and see if it still happens. That would let me narrow it down.

QuantumEntangledAndy avatar May 03 '24 04:05 QuantumEntangledAndy

In the push notifications we have a vector of received push IDs; perhaps the IDs are not being filtered out and the list is filling endlessly with the same ID. I'll make it a HashSet instead. A rough sketch of the idea is below.
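A minimal sketch of that change; the type and field names here are illustrative, not the actual neolink code:

use std::collections::HashSet;

// Hypothetical tracker for push notification IDs that have already been
// handled. With a Vec, repeatedly pushing the same ID grows memory without
// bound; HashSet::insert is a no-op for duplicates, so memory stays bounded
// by the number of distinct IDs.
struct SeenPushIds {
    ids: HashSet<String>,
}

impl SeenPushIds {
    fn new() -> Self {
        Self { ids: HashSet::new() }
    }

    // Returns true only the first time a given ID is seen.
    fn mark_seen(&mut self, id: String) -> bool {
        self.ids.insert(id)
    }
}

fn main() {
    let mut seen = SeenPushIds::new();
    assert!(seen.mark_seen("id-1".into()));
    assert!(!seen.mark_seen("id-1".into())); // duplicate is filtered, no growth
}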

QuantumEntangledAndy avatar May 03 '24 04:05 QuantumEntangledAndy

Ok I've added push_notifications = false.

I also had idle_disconnect = true in there (which didn't appear to do anything; this is not a battery-powered cam). I've disabled that flag also.

JulesAU avatar May 03 '24 04:05 JulesAU

Also just noticed in the logs quite a few of these:

(neolink:8): GStreamer-CRITICAL **: 04:09:03.917: gst_poll_write_control: assertion 'set != NULL' failed
(neolink:8): GStreamer-CRITICAL **: 04:09:03.917: gst_poll_write_control: assertion 'set != NULL' failed
(neolink:8): GStreamer-CRITICAL **: 04:09:03.983: gst_poll_write_control: assertion 'set != NULL' failed
(neolink:8): GStreamer-CRITICAL **: 04:09:03.983: gst_poll_write_control: assertion 'set != NULL' failed
(neolink:8): GStreamer-CRITICAL **: 04:09:03.983: gst_poll_write_control: assertion 'set != NULL' failed

JulesAU avatar May 03 '24 04:05 JulesAU

Not sure about those; they are happening in the gstreamer part of the code, so it's always hard to diagnose. A quick google of that suggests too many open file handles. I set the file handle limit in Docker to 1024; maybe that's too low? A sketch for raising it is below.
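If someone wants to test that theory, something along these lines should raise the fd limit for the container; the limit value is arbitrary and the volume/port mappings are my assumption of the usual neolink docker invocation, so adjust to your setup:

docker run --ulimit nofile=65536:65536 \
    --volume=$PWD/neolink.toml:/etc/neolink.toml \
    -p 8554:8554 \
    quantumentangledandy/neolink:latest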

QuantumEntangledAndy avatar May 03 '24 04:05 QuantumEntangledAndy

Sent you an email with VPN details.

JulesAU avatar May 03 '24 04:05 JulesAU

Yeah, just logged in. I am not seeing those gstreamer errors outside of Docker, so perhaps it's the limit on fds.

QuantumEntangledAndy avatar May 03 '24 04:05 QuantumEntangledAndy