
Neolink Memory Leak / Errors

Open KSti56 opened this issue 1 year ago • 18 comments

Describe the bug This software is really amazing, but unfortunately I've been having a lot of issues with it, specifically with memory leaks. Memory usage is at 3.3G after just an hour. I have read a couple of the issues pertaining to memory leaks (like this one) and it seems like the main recommendation was to update. I updated a couple weeks ago (currently on b37b3ee4b753e7bd0c28ec8ca04c16e395e76b21 release). Unfortunately, memory leaks continue and the logs are full of errors. I'm no expert at this stuff, so any help would be greatly appreciated.

Additionally, I checked today and saw a bunch of these errors:

gst_poll_read_control: assertion 'set != NULL' failed
INFO  neolink::rtsp::stream] Buffer full on audsrc
INFO  neolink::rtsp::stream] Buffer full on vidsrc

To Reproduce No specific steps to reproduce. I believe it has been happening since installation.

Expected behavior No memory leaks (I would expect maybe 200-300MB of usage?).

Versions NVR software: BlueIris v5.9.3.4 Neolink software: b37b3ee4b753e7bd0c28ec8ca04c16e395e76b21 Reolink camera model and firmware: 3 B800s -- Firmware on 2 of them is v3.0.0.82_20080600 and the other one is v3.0.0.183_21012800

Service Info

KSti56 avatar Jul 22 '24 19:07 KSti56

I'm well aware of this issue (there are multiple open issues for it too). I've used valgrind, massif, and all sorts of other tools to track it, but I can't seem to squash this one.

Also, Buffer full on audsrc means that your client (VLC or whatever) is not pulling the frames and the gstreamer buffer is full. This is mostly harmless, as it just means we stop sending frames to gstreamer for a while.

The program kind of works like this

  • Neolink
    • Gets frames from camera
    • Buffers for reorganising packets (maximum of about 200 frames)
    • Buffers for paused playback (15s of frames)
    • Hands frames to gstreamer
  • Gstreamer
    • Gets frames from neolink
    • Has its own buffer to hold the frames
    • Waits for right time/request from rtsp client to deliver frames

Most of the app's memory allocations happen in the gstreamer part, which I don't have direct control over. I'll keep looking when I can, but nothing obvious has turned up.

QuantumEntangledAndy avatar Aug 03 '24 08:08 QuantumEntangledAndy

Thanks for the reply! I really appreciate your work on this project. Sorry about the duplicate issue; I didn't see any other ones relating to this exact problem (now that I've gone through the older issues, I see what you are referring to). I'm glad to hear that you're aware of the issue and trying to fix it. I know it's probably frustrating that you can't find the root cause. I'll keep monitoring the issues in hopes of an eventual fix.

As for the Buffer full on audsrc, I'm using BlueIris. Is there a certain configuration parameter that I can adjust to fix this? I have noticed that the video feed (and therefore recordings) cut out somewhat often, which I assume is related to this.

And as for the memory leak issue, I assume the only temporary fixes are just either allocating more memory and/or setting up automatic restarts every x hours?

Thanks again for the help and work on this project! Let me know if there is any information that would be helpful for the debugging/troubleshooting process.

KSti56 avatar Aug 03 '24 20:08 KSti56

I've seen memory leaks when running with 2 x RLC-CX810. If it helps troubleshooting, I've reverted back to image: quantumentangledandy/neolink:v0.5.17, which has been running consistently at about 120MB for several days now. So it would suggest that changes after this version have caused the leak. I didn't test every image in between, but this was the first one that worked consistently.

RutgerDiehard avatar Aug 18 '24 06:08 RutgerDiehard

For those using neolink in a Docker configuration, here is a docker compose workaround for the memory leak:

...
  neolink:
    container_name: neolink
    restart: always
    deploy:
      resources:
        limits:
          memory: 512M
    image: quantumentangledandy/neolink
...
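To see whether the container is actually bumping against that 512M cap, `docker stats` can be queried (shown as a comment here since it needs Docker running; the container name matches the compose file above, and the MiB units in the parsed example are an assumption about what your daemon reports):

```shell
# Live usage against the cap (run on the Docker host):
#   docker stats --no-stream --format '{{.Name}} {{.MemUsage}}' neolink
# A typical output line looks like "neolink 498MiB / 512MiB"; the used
# figure can be extracted with awk for scripting:
used=$(echo "neolink 498MiB / 512MiB" | awk '{gsub(/MiB/,"",$2); print $2}')
echo "$used"
```

If the used figure sits right at the limit, the kernel is OOM-killing the process and `restart: always` is bringing it back, which is exactly the intended behaviour of this workaround.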

Also, I came up with this quick bash script for restarting the container when the process itself takes up too much memory:

#!/bin/bash

# The name of the Docker container
CONTAINER_NAME="neolink"

# Define the memory threshold (in percentage)
THRESHOLD=20

# Get the total system memory in kilobytes
TOTAL_MEM=$(grep MemTotal /proc/meminfo | awk '{print $2}')

# Function to calculate the total memory usage of processes with the word "neolink"
get_neolink_mem_usage() {
    ps aux | grep -i "neolink" | grep -v "grep" | awk '{mem+=$6} END {print mem}'
}

# Get the current memory usage of neolink processes
NEOLINK_MEM=$(get_neolink_mem_usage)

# Convert the memory usage to percentage
MEM_PERCENT=$(echo "$NEOLINK_MEM $TOTAL_MEM" | awk '{printf "%.2f", ($1/$2)*100}')

# Restart the Docker container if memory usage exceeds the threshold
if (( $(echo "$MEM_PERCENT > $THRESHOLD" | bc -l) )); then
    echo "Memory usage of neolink processes is $MEM_PERCENT%, which exceeds the threshold of $THRESHOLD%"
    echo "Restarting the Docker container $CONTAINER_NAME..."
    docker restart $CONTAINER_NAME
else
    echo "Memory usage of neolink processes is $MEM_PERCENT%, which is within the safe limit."
fi

You could probably use a cron job to run this script every minute, and that should also work.
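A matching crontab entry might look like this (the script path and log file location are assumptions; adjust them to wherever you saved the script):

```
* * * * * /usr/local/bin/neolink-memcheck.sh >> /var/log/neolink-memcheck.log 2>&1
```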

Pytonballoon810 avatar Sep 18 '24 18:09 Pytonballoon810

> I've seen memory leaks when running with 2 x RLC-CX810. If it helps troubleshooting, I've reverted back to image: quantumentangledandy/neolink:v0.5.17 which has been running consistently at about 120MB for several days now. So it would suggest changes after this version has caused the leak. I didn't test every image between but this was the first one that worked consistently.

I switched to this Docker version, and while it did seem to be running pretty smoothly, it eventually crashed after 7-8 hours (I think it was a memory leak issue, but I wasn't monitoring it). Interestingly, I also received these errors right when the recording stopped. I haven't seen these messages before, so if anyone has any ideas of what they mean, I would appreciate the help.

neolink_core::bc_protocol::connection::bcconn] Reaching limit of channel
neolink_core::bc_protocol::connection::bcconn] Remaining: 0 of 100 message space for 4 (ID: 3)
neolink::rtsp] cam19: Join Pause
neolink::rtsp] cam19: Retryable error: Timed out waiting to send Media Frame (Caused by: deadline has elapsed)

KSti56 avatar Sep 20 '24 15:09 KSti56

> I've seen memory leaks when running with 2 x RLC-CX810. If it helps troubleshooting, I've reverted back to image: quantumentangledandy/neolink:v0.5.17 which has been running consistently at about 120MB for several days now. So it would suggest changes after this version has caused the leak. I didn't test every image between but this was the first one that worked consistently.
>
> I switched to this Docker version, and while it did seem to be running pretty smoothly, it eventually crashed after 7-8 hours (I think it was a memory leak issue, but I wasn't monitoring it). Interestingly, I also received these errors right when the recording stopped. I haven't seen these messages before, so if anyone has any ideas of what they mean, I would appreciate the help.
>
> neolink_core::bc_protocol::connection::bcconn] Reaching limit of channel
> neolink_core::bc_protocol::connection::bcconn] Remaining: 0 of 100 message space for 4 (ID: 3)
> neolink::rtsp] cam19: Join Pause
> neolink::rtsp] cam19: Retryable error: Timed out waiting to send Media Frame (Caused by: deadline has elapsed)

Although quantumentangledandy/neolink:0.5.17 ran with no memory issues and I didn't have to babysit it, I wasn't able to run Frigate with clean logs. I would get regular stream errors and FFMPEG crashes from multiple cameras. So, I set about testing newer versions of neolink to see if one would solve the Frigate errors. quantumentangledandy/neolink:0.5.18 spammed the container logs with messages to do with encryption, so I tried quantumentangledandy/neolink:0.6.0. This, for the last 12 hours, has been perfectly stable: Frigate shows no stream errors and no FFMPEG crashes, and it runs with the resources shown below. This is with 4 x 4K cameras (2 x Reolink RLC-811a, 2 x RLC-CX810) and an RLC-410 via a Reolink NVR.

[resource-usage screenshot]

I have limited the resources available to neolink and Frigate in the Portainer stack (which runs on bare-metal Ubuntu Server 24.04) just so they don't impact other containers I use on the system. So far, I'm very happy how it's running.

RutgerDiehard avatar Oct 02 '24 09:10 RutgerDiehard

Testing with 0.5.17, 0.6.0, 0.6.3rc2, using 15x D800 Reolink cameras:

[memory-usage graph]

Hosted on a generic Ubuntu 22.04 LXC on Proxmox. Forced to use a watchdog process to terminate and restart the service at 75% total memory consumption.

jmoney7823956789378 avatar Oct 16 '24 19:10 jmoney7823956789378

Adding myself to the users who have this buffer problem. I run neolink with 5 cameras (B800 and D800) and can't get it to work properly with Frigate. The main reason I want neolink is that the NVR doesn't provide the main stream for the cameras via RTSP/HTTPS.

federicotravaini avatar Dec 28 '24 14:12 federicotravaini

Edit: I did some more digging and realized that my PR just reverts an attempted fix from August 17 (so, after 0.6.3rc2). So it's unlikely to fix the underlying issues above, but if someone here (@federicotravaini?) happens to be running master, it might at least improve things.

cincodenada avatar Dec 29 '24 09:12 cincodenada

Having similar issues with a standard Debian Docker install on an RPi4.

[resource-usage screenshot]

My Lumus (firmware v2.0.0.705) is remotely mounted and the wifi is poor. Resource usage is pretty stable with a good connection, but it will eat memory and CPU if the camera repeatedly disconnects or the user account gets locked.

I'm testing out an automation in HA to remotely restart the Docker container, triggered when memory usage gets too high.

I would try to help, but it looks like this is a known issue and my Rust is "rusty" ;)

Update 1/28/25: Noticed that network traffic is also increasing. This may indicate that failed sessions are still active and continue receiving data.

wizmo2 avatar Jan 02 '25 13:01 wizmo2

@QuantumEntangledAndy Andy -- could there be some sort of band-aid applied for this until the leak is resolved? Or, come at it from a different angle?

For example --

  • Internally manage gstreamer subprocess instances, killing them and transitioning to new ones in a way that doesn't end user sessions.
  • Or, switch from gstreamer to go2RTC or another solution

This is a great project, but this problem is a show-stopper. Countless hours of good work have gone into this project by you and others, only to have this one thing make it nearly unusable! I mean, look at the graph jmoney shared a few posts above: he's having to restart the process every hour. I have a server with 512GB of RAM, and Neolink will consume darn near all of it within 24 hours. It's real serious!!

keithkmyers avatar Jan 29 '25 16:01 keithkmyers

go2RTC will not work in the way that you seem to suggest: 1) it's written in Go, so it can't directly integrate with this Rust project; 2) it still requires the source to be in a somewhat usable format (RTSP or RTP or something), so gstreamer would still be required to handle it.

Internally managing it is also not something I'd want to integrate; there are dedicated watchdog programs for such things.
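For bare-metal installs, systemd itself can serve as that external watchdog. A sketch of a unit follows (the binary path, config path, and 512M cap are assumptions, not project defaults):

```
# /etc/systemd/system/neolink.service -- sketch, adjust paths to your install
[Unit]
Description=Neolink RTSP bridge
After=network-online.target

[Service]
ExecStart=/usr/local/bin/neolink rtsp --config /etc/neolink.toml
Restart=always
RestartSec=5
# With cgroup v2, systemd OOM-kills the service past this cap and
# Restart=always brings it back up automatically.
MemoryMax=512M

[Install]
WantedBy=multi-user.target
```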

QuantumEntangledAndy avatar Jan 30 '25 04:01 QuantumEntangledAndy

I've been having similar problems with a Proxmox LXC docker container connected to one Lumus camera. Reading the comment by @RutgerDiehard above I thought I would give v0.6.0 a go. It's been running continuously for 12 hours with consistent output throughout. With the newer versions it was getting to 1GB within a couple of hours and continuing to rise.

[resource-usage screenshot]

mrspouse avatar Feb 01 '25 10:02 mrspouse

I'm having this issue with 2 Argus Pros. The memory steadily climbs: starting at 125MB, after just 10 minutes it's at 612MB and climbing. Using the Docker latest version tag, with descriptor "version=0.6.3-rc.3".

Nothing in the logs; the last log entry was INFO neolink::rtsp::factory] Buffer full on vidsrc pausing stream until client consumes frames, but that was right when Blue Iris connected.

Am open to testing potential fixes and reporting back.

TheOneOgre avatar Mar 04 '25 03:03 TheOneOgre

If you need someone to do quick testing on this, throw me a highlight; I'm happy to mess with this and report back fast, as I have 5 cameras going through it and the memory leak happens rapidly on latest. v0.6.2 does indeed seem to solve the problem; 0.6.0 and 0.6.1 have other issues.

jaddie avatar Apr 23 '25 17:04 jaddie

Just a quick check on usage for those having issues. I use both RTSP streaming and Snapshot function.

wizmo2 avatar Apr 23 '25 23:04 wizmo2

Hi

I have had some findings on this matter. I also had the issue of massive memory leaks within the Docker container with just one camera connected. After reading the above comments, my suspicion was that gstreamer might be responsible for the memory leaks, so I went to the release page on their website and saw that release 1.26 fixed several memory leaks.

The Dockerfile, however, uses a Debian bookworm base image, which only offers 1.22 as the latest version; version 1.26 comes packaged for trixie. So I simply changed the Dockerfile:

  • For the build image, change rust:slim-bookworm to rust:slim-trixie
  • For the user image, change debian:bookworm-slim to debian:trixie-slim
--- a/Dockerfile
+++ b/Dockerfile
@@ -4,7 +4,7 @@
 #                    Miroslav Šedivý
 # SPDX-License-Identifier: AGPL-3.0-only

-FROM docker.io/rust:slim-bookworm AS build
+FROM docker.io/rust:slim-trixie AS build
 ARG TARGETPLATFORM

 ENV DEBIAN_FRONTEND=noninteractive
@@ -41,12 +41,12 @@ RUN  echo "TARGETPLATFORM: ${TARGETPLATFORM}"; \
           libgtk2.0-dev \
           protobuf-compiler \
           libglib2.0-dev && \
-        apt-get clean -y && rm -rf /var/lib/apt/lists/* ; \
+        apt-get clean -y ; \
     cargo build --release; \
   fi

 # Create the release container. Match the base OS used to build
-FROM debian:bookworm-slim
+FROM debian:trixie-slim
 ARG TARGETPLATFORM
 ARG REPO
 ARG VERSION
@@ -73,7 +73,7 @@ RUN apt-get update && \
         gstreamer1.0-plugins-good \
         gstreamer1.0-plugins-bad \
         gstreamer1.0-libav && \
-    apt-get clean -y && rm -rf /var/lib/apt/lists/*
+    apt-get clean -y

 COPY --from=build \
   /usr/local/src/neolink/target/release/neolink \

Then I rebuilt the Docker image. I have been using it for a couple of weeks now without issue, so it seems to fix the leak for me. Maybe this can be of help for someone else.
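To confirm a rebuilt image actually picked up the newer GStreamer, the version can be checked inside the container (the docker command is shown as a comment since it needs the container running; the container name and reported version string are assumptions):

```shell
# Inside the rebuilt container, the version should report 1.26.x:
#   docker exec neolink gst-launch-1.0 --version
# Comparing a reported version against the 1.26 minimum with sort -V:
ver="1.26.0"      # substitute whatever the container reports
minimum="1.26"
lowest=$(printf '%s\n' "$minimum" "$ver" | sort -V | head -n1)
[ "$lowest" = "$minimum" ] && echo "GStreamer >= $minimum"
```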

If someone else can confirm that this fixes the problem, I'm happy to supply a PR.

sonntam avatar Sep 15 '25 21:09 sonntam

If someone else can confirm that this fixes the problem, I'm happy to supply a PR.

I am testing now on an arm64 build (RPi4). Initially it looks more stable, but I will report back in due course.

Edit 10/15/24: Initially the memory leak in my application appears to be more severe on trixie but, that said, with my existing HA automation it seems to provide a more stable feed.

[memory-usage graph]

wizmo2 avatar Oct 13 '25 01:10 wizmo2