Mesh Agent Memory Leak

Open punacoder opened this issue 2 years ago • 15 comments

I am currently using the latest build of MeshCentral and am still seeing memory leaks in the Mesh agent on my Windows servers. Is there a fix for this? I see mentions of turning off the Plugins option (which I do not have turned on).

punacoder avatar Jul 12 '22 21:07 punacoder

Can you provide some numbers on the leak? Indeed, there was a specific plug-in that would store lots of data in the MeshAgent database and cause issues. However, except for that plug-in, the agent should not leak.

Ylianst avatar Jul 12 '22 21:07 Ylianst

It starts at 27 MB and is now at 300 MB.

punacoder avatar Jul 13 '22 05:07 punacoder

By the way, why are there 2 MeshAgent instances in memory?

punacoder avatar Jul 13 '22 06:07 punacoder

Can you please provide a bit more detail about your setup? Please follow the new bug report template: server OS, MeshCentral version, WAN mode, proxy, etc.

si458 avatar Jul 13 '22 07:07 si458

There should be one instance running, but when you open an agent KVM session, a second instance will show up. The first instance is managed by the Windows service manager. The second instance will run under the logged-in user account. If you don't have a KVM session running, there are rare cases where a second agent could be running, but this is not typical.
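
One way to see which instance is which is to list the agent processes with their session IDs. This is only a sketch: it assumes the process is named `MeshAgent` (as in the scripts later in this thread) and relies on Windows services running in session 0. The helper name `Get-AgentInstanceInfo` is made up for illustration.

```powershell
# Hypothetical helper: summarizes process objects piped into it.
function Get-AgentInstanceInfo {
    param([Parameter(ValueFromPipeline = $true)]$Process)
    process {
        # Session 0 hosts Windows services; other sessions belong to logged-in users.
        $role = "user session"
        if ($Process.SessionId -eq 0) { $role = "service" }
        [pscustomobject]@{
            Id        = $Process.Id
            SessionId = $Process.SessionId
            Role      = $role
            PrivateMB = [math]::Round($Process.PrivateMemorySize64 / 1MB, 1)
        }
    }
}

# Lists every running MeshAgent process; prints nothing if none is running.
Get-Process -Name "MeshAgent" -ErrorAction SilentlyContinue | Get-AgentInstanceInfo
```

With a KVM session open you would expect two rows, one per instance, which also shows which of the two is growing.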

Ylianst avatar Jul 13 '22 18:07 Ylianst

Did the latest fix for Windows Handle memory leak fix this issue? If so, what release was this applied to?

punacoder avatar Aug 02 '22 19:08 punacoder

Host server: Ubuntu 20.0.4 Docker image: Meshcentral release: 1.0.60

Issue: Multiple Windows-based hosts (Windows Server 2016, Windows 10 Pro, Windows Server 2008) with the MeshCentral agent installed show MeshAgent with more than 1 GB (worst case more than 5 GB) of allocated memory. After ending the task in Task Manager, the agent creeps back up to that level after about 2 weeks.

punacoder avatar Aug 02 '22 19:08 punacoder

Ylian hasn't posted the agent update yet. Soon tho.

krayon007 avatar Aug 02 '22 19:08 krayon007

Thank you...just did a reset on all clients...was at 29MB initially...2 hours later already at 70MB.

punacoder avatar Aug 02 '22 22:08 punacoder

krayon007, Any update on the new agent to fix this issue?

punacoder avatar Aug 15 '22 23:08 punacoder

[image] Here is the latest Task Manager info on the agent after about a week.

punacoder avatar Aug 15 '22 23:08 punacoder

Yes, I fixed a whole bunch of leaks. I'm tracking one right now that isn't a leak per se, as the GC collects it, but only during a mark and sweep, which occurs infrequently. I'm working on a fix for it now. This is the last gating issue for release.

krayon007 avatar Aug 15 '22 23:08 krayon007

Thank you. I am surprised there are not more users complaining about the issue. It has crashed a couple of my Windows application servers due to low-memory conditions.

punacoder avatar Aug 15 '22 23:08 punacoder

Any update?

punacoder avatar Aug 31 '22 18:08 punacoder

There has been a new update, 1.0.75, which includes a new agent and possibly the memory leak fix. Give it a try: run agentupdate from the device's Console tab.

si458 avatar Aug 31 '22 18:08 si458

I installed a MeshCentral server on Amazon AWS: Debian bullseye 11.5, MC version 1.0.93 (started with 1.0.85) in PM2, behind an AWS Application Load Balancer. It works fine, but the client is leaking memory on Windows Server 2019 and Windows 10 Pro (version 19043). On Windows Server 2019 memory usage grows pretty fast, about 300-500 MB a day; on Windows 10 less, only 20-30 MB a day. The AWS Application Load Balancer has an idle timeout of 60 seconds. I changed agentpong and browserpong in config.json to 50 seconds and set plugins enabled to false. Any suggestions for what I can try?

ouddorp avatar Nov 01 '22 07:11 ouddorp

I'm seeing the same memory leak issue on Windows Server 2019. I upgraded to Windows Server 2022 and still have the same issue. After the upgrade, I removed the agent and added it again, but the memory leak is still there. I have to kill the agent every two weeks or so to reduce its memory footprint.

markhuynh avatar Nov 21 '22 21:11 markhuynh

@markhuynh I do it with a small PowerShell script that runs daily from the scheduler. It restarts the agent when it's using more than 100 MB of RAM.

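# Restart the Mesh Agent service when a MeshAgent process exceeds 100 MB of private memory.
$mesha = @(Get-Process "MeshAgent" -ErrorAction SilentlyContinue)
if ($mesha.Count -eq 0) {
    write-host "MeshAgent is not running."
    return
}
# More than one MeshAgent process can exist (e.g. during a KVM session), so use the largest.
$mem = ($mesha | Measure-Object -Property PrivateMemorySize64 -Maximum).Maximum / 1MB
write-host $mem
if ($mem -gt 100) {
    write-host "Restart MeshAgent, Memory usage > 100Mb"
    restart-service -name "Mesh Agent" -force
}
else {
    write-host "MeshAgent memory usage is fine."
}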

ouddorp avatar Nov 22 '22 09:11 ouddorp

I need to do the same thing for my Linux servers.

r4lix avatar Nov 24 '22 09:11 r4lix

It looks like it's working fine now. The agentPong value is really important. I somehow had it twice in my config.json file: the first entry with 300 seconds and the second with 50 seconds. So the first one, 300 seconds, was actually active, and since the AWS load balancer has a timeout of 60 seconds, that resulted in memory issues in the agent. I think it's smart to have an agent restart script on the scheduler as a precaution anyway.
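
A quick text-based check can catch this kind of duplicate entry, since a JSON parser will typically keep one of the duplicates without complaint. The sketch below is an illustration only: the config path, the 60-second timeout variable, and the function name `Get-PongValue` are assumptions, not part of MeshCentral.

```powershell
# Hypothetical path; adjust to where your MeshCentral config.json lives.
$ConfigPath = "C:\meshcentral\meshcentral-data\config.json"
# Assumed idle timeout of the load balancer in seconds (60 in the AWS setup above).
$IdleTimeoutSeconds = 60

# Returns every numeric value assigned to the given key in the raw JSON text.
function Get-PongValue {
    param([string]$JsonText, [string]$Key = "agentPong")
    $pattern = '"' + [regex]::Escape($Key) + '"\s*:\s*(\d+)'
    $options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
    foreach ($m in [regex]::Matches($JsonText, $pattern, $options)) {
        [int]$m.Groups[1].Value
    }
}

if (Test-Path $ConfigPath) {
    $values = @(Get-PongValue -JsonText (Get-Content -Raw -Path $ConfigPath))
    if ($values.Count -gt 1) {
        write-host "agentPong is defined $($values.Count) times: $($values -join ', ')"
    }
    foreach ($v in $values) {
        if ($v -ge $IdleTimeoutSeconds) {
            write-host "agentPong $v is not below the idle timeout of $IdleTimeoutSeconds seconds"
        }
    }
}
```

The check works on the raw text rather than on the parsed object because the duplicate is no longer visible once the file has been parsed.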

ouddorp avatar Jan 09 '23 14:01 ouddorp

Updated example of a PowerShell script for restarting the Mesh Agent service. When there are active desktop connections, there is more than one Mesh Agent process running. The example writes to the console with write-host, but in production I write to a log file through a function.

$MeshService = "Mesh Agent"
$MeshProcess = "MeshAgent"
$ThresholdMB = 50
$MeshServiceStatus = Get-Service -Name $MeshService -ErrorAction SilentlyContinue
if ($MeshServiceStatus) {
    write-host "Mesh Agent service exists"
    if ($MeshServiceStatus.Status -eq "Running") {
        # Several MeshAgent processes can exist during desktop sessions.
        # Check each one, but restart the service only once.
        $MeshProcessStatus = @(Get-Process $MeshProcess -ErrorAction SilentlyContinue)
        $needsRestart = $false
        foreach ($process in $MeshProcessStatus) {
            $mem = $process.PrivateMemorySize64 / 1MB
            write-host $mem
            if ($mem -gt $ThresholdMB) {
                $needsRestart = $true
            }
        }
        if ($needsRestart) {
            write-host "Restart MeshAgent, Memory usage > $ThresholdMB Mb"
            try {
                Restart-Service -Name $MeshService -ErrorAction 'Stop'
            }
            catch {
                write-host "failed to restart service Mesh Agent"
            }
        }
        else {
            write-host "MeshAgent memory usage is fine."
        }
    }
    else {
        write-host "Mesh Agent is not running"
    }
}
else {
    write-host "Mesh Agent doesn't exist"
}
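
For the log-file variant mentioned above, a small helper could replace the write-host calls. This is a minimal sketch, not the production script from this thread: the function name `Write-MeshLog` and the log file location are hypothetical.

```powershell
# Hypothetical log location; any writable path works.
$LogFile = Join-Path ([System.IO.Path]::GetTempPath()) "meshagent-watchdog.log"

# Appends one timestamped line to the log file.
function Write-MeshLog {
    param(
        [Parameter(Mandatory = $true)][string]$Message,
        [string]$Path = $LogFile
    )
    $stamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    Add-Content -Path $Path -Value "$stamp $Message"
}

# Example call, in place of a write-host line in the script.
Write-MeshLog "MeshAgent memory usage is fine."
```

Because the helper only appends, the file grows over time, so a scheduled run would eventually need some form of rotation or cleanup.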

ouddorp avatar Jan 10 '23 14:01 ouddorp

[image] From one of my Windows Server 2019 DCs; it seems to top out at just under 2 GB.

Server version: 1.1.5
Agent version: 12:12:34, Dec 9 2022

Vista2003 avatar May 22 '23 16:05 Vista2003

On my hosts (200+) the problem is almost gone after setting the agent pong value in the JSON file lower than the timeout of the web server / load balancer. I only see the problem on servers where the internet connection is down, and with a script like the one mentioned above the problem is under control.

ouddorp avatar May 23 '23 05:05 ouddorp

I wonder if it's a leak in the retry/connection timeout and re-establish of the session.

silversword411 avatar May 24 '23 00:05 silversword411

Yes, I think there is a leak in the retry mechanism. I recommend a daily script. Apart from the memory leak, MC is perfect. We currently have 454 agents running, and around 5 agents a day have memory usage greater than 50 MB. We know that because I extended the above-mentioned script with an email log function.

ouddorp avatar May 24 '23 09:05 ouddorp

Closing as stale. Please try again with the latest version, 1.1.20, and use Node 18 or above. If the issue still persists, please reply back.

si458 avatar Feb 17 '24 17:02 si458

Hello, I have the same problem with all my agents (Windows and Debian).

     Loaded: loaded (/lib/systemd/system/meshagent.service; bad; preset: enabled)
     Active: active (running) since Wed 2024-02-14 23:11:12 CET; 3 days ago
   Main PID: 392 (meshagent)
      Tasks: 1 (limit: 76903)
     Memory: 840.6M
        CPU: 18min 43.536s
     CGroup: /system.slice/meshagent.service
             └─392 /usr/local/mesh_services/meshagent/meshagent --installedByUser=0

févr. 14 23:11:12 WEB systemd[1]: Started meshagent.service - meshagent background service.

And after a reboot (it's an LXC):

     Loaded: loaded (/lib/systemd/system/meshagent.service; bad; preset: enabled)
     Active: active (running) since Sun 2024-02-18 10:44:08 CET; 17s ago
   Main PID: 278554 (meshagent)
      Tasks: 1 (limit: 76903)
     Memory: 6.5M
        CPU: 510ms
     CGroup: /system.slice/meshagent.service
             └─278554 /usr/local/mesh_services/meshagent/meshagent --installedByUser=0

févr. 18 10:44:08 WEB systemd[1]: Stopped meshagent.service - meshagent background service.
févr. 18 10:44:08 WEB systemd[1]: meshagent.service: Consumed 18min 46.184s CPU time.
févr. 18 10:44:08 WEB systemd[1]: Started meshagent.service - meshagent background service.
root@WEB:~#

r4yzs avatar Feb 18 '24 09:02 r4yzs