PM2 v5.1.0: God Daemon taking up huge amounts of memory

Open victor-ono opened this issue 3 years ago • 27 comments

https://github.com/Unitech/pm2/issues/1126

I started experiencing this issue again after updating to 5.1.

It takes 230MB (~50%) of RAM on a 512MB machine

[Screenshot attached: Screen Shot 2021-08-06 at 1 10 24 PM]

victor-ono avatar Aug 06 '21 17:08 victor-ono

same issue, any updates?

yorickshan avatar Feb 22 '22 02:02 yorickshan

I have experienced the same problem with 5.2. The pm2 daemon takes up about 2.6 GB of memory while the process it controls uses much less:

15 1 root S 2645m 34% 1 0% PM2 v5.2.0: God Daemon (/root/.pm2)

This has lasted for more than 2 days so far.

The following is the output of pm2 report:

--- PM2 report ----------------------------------------------------------------
Date                 : Fri May 13 2022 14:33:40 GMT+0800 (China Standard Time)
===============================================================================
--- Daemon -------------------------------------------------
pm2d version         : 5.2.0
node version         : 14.18.1
node path            : not found
argv                 : /usr/bin/node,/usr/lib/node_modules/pm2/lib/Daemon.js
argv0                : node
user                 : undefined
uid                  : 0
gid                  : 0
uptime               : 37450min
===============================================================================
--- CLI ----------------------------------------------------
local pm2            : 5.2.0
node version         : 14.18.1
node path            : /usr/bin/pm2
argv                 : /usr/bin/node,/usr/bin/pm2,report
argv0                : node
user                 : undefined
uid                  : 0
gid                  : 0
===============================================================================
--- System info --------------------------------------------
arch                 : x64
platform             : linux
type                 : Linux
cpus                 : Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
cpus nb              : 4
freemem              : 227586048
totalmem             : 8201084928
home                 : /root
===============================================================================
--- PM2 list -----------------------------------------------
┌─────┬──────────────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id  │ name             │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
├─────┼──────────────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 1   │ ws_redis         │ default     │ 2.0.0   │ fork    │ 9154     │ 24D    │ 5    │ online    │ 0.3%     │ 49.8mb   │ root     │ disabled │
└─────┴──────────────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘
Module
┌────┬──────────────────────────────┬───────────────┬──────────┬──────────┬──────┬──────────┬──────────┬──────────┐
│ id │ module                       │ version       │ pid      │ status   │ ↺    │ cpu      │ mem      │ user     │
├────┼──────────────────────────────┼───────────────┼──────────┼──────────┼──────┼──────────┼──────────┼──────────┤
│ 0  │ pm2-logrotate                │ 2.7.0         │ 40       │ online   │ 1    │ 0.2%     │ 27.3mb   │ root     │
└────┴──────────────────────────────┴───────────────┴──────────┴──────────┴──────┴──────────┴──────────┴──────────┘
===============================================================================
--- Daemon logs --------------------------------------------
/root/.pm2/pm2.log last 20 lines:
PM2        | 2022-04-19T10:34:07: PM2 log: Stopping app:ws_redis id:1
PM2        | 2022-04-19T10:34:07: PM2 log: App [ws_redis:1] exited with code [0] via signal [SIGINT]
PM2        | 2022-04-19T10:34:07: PM2 log: pid=31052 msg=process killed
PM2        | 2022-04-19T10:34:07: PM2 log: App [ws_redis:1] starting in -fork mode-
PM2        | 2022-04-19T10:34:07: PM2 log: App [ws_redis:1] online
PM2        | 2022-04-19T10:38:26: PM2 log: Stopping app:ws_redis id:1
PM2        | 2022-04-19T10:38:26: PM2 log: App [ws_redis:1] exited with code [0] via signal [SIGINT]
PM2        | 2022-04-19T10:38:26: PM2 log: pid=31208 msg=process killed
PM2        | 2022-04-19T10:38:26: PM2 log: App [ws_redis:1] starting in -fork mode-
PM2        | 2022-04-19T10:38:26: PM2 log: App [ws_redis:1] online
PM2        | 2022-04-19T10:52:09: PM2 log: Stopping app:ws_redis id:1
PM2        | 2022-04-19T10:52:09: PM2 log: App [ws_redis:1] exited with code [0] via signal [SIGINT]
PM2        | 2022-04-19T10:52:09: PM2 log: pid=31497 msg=process killed
PM2        | 2022-04-19T10:52:09: PM2 log: App [ws_redis:1] starting in -fork mode-
PM2        | 2022-04-19T10:52:09: PM2 log: App [ws_redis:1] online
PM2        | 2022-04-19T13:22:44: PM2 log: Stopping app:ws_redis id:1
PM2        | 2022-04-19T13:22:44: PM2 log: App [ws_redis:1] exited with code [0] via signal [SIGINT]
PM2        | 2022-04-19T13:22:44: PM2 log: pid=32384 msg=process killed
PM2        | 2022-04-19T13:22:44: PM2 log: App [ws_redis:1] starting in -fork mode-
PM2        | 2022-04-19T13:22:44: PM2 log: App [ws_redis:1] online

qiulang avatar May 13 '22 06:05 qiulang

The CPU usage is almost always less than 1%, which is why I suspect there is a memory leak somewhere:

  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
   15     1 root     S    2645m  34%   0   0% PM2 v5.2.0: God Daemon (/root/.pm2)
 9154    15 root     S     312m   4%   2   0% node /home/express/build/index.js
   40    15 root     S     279m   4%   0   0% node /root/.pm2/modules/pm2-logrotate/node_modules/pm2-log
 9175  8997 root     S     261m   3%   2   0% node /usr/bin/pm2 logs
32405 31926 root     T     261m   3%   2   0% node /usr/bin/pm2 logs
31584 31569 root     T     261m   3%   3   0% node /usr/bin/pm2 log 1
  497 31926 root     T     261m   3%   1   0% node /usr/bin/pm2 log 1
  639 31926 root     T     261m   3%   0   0% node /usr/bin/pm2 log 1
  953 31926 root     T     260m   3%   3   0% node /usr/bin/pm2 log 1

qiulang avatar May 13 '22 06:05 qiulang

@qiulang This is not a problem with PM2; changing the memory allocator to jemalloc will fix the memory issue.

TheAndroidGuy avatar May 13 '22 10:05 TheAndroidGuy

No, I don't think so, and https://github.com/nodejs/node/issues/21973 was closed.

qiulang avatar May 15 '22 12:05 qiulang

I'm still having the same issue.

victor-ono avatar May 31 '22 18:05 victor-ono

I am facing the same issue: when I run my Node application with pm2 it takes a huge amount of memory and crashes, but when I simply run "node app.js" it runs without any memory issue or crash.

sarojmhrzn avatar Jun 03 '22 06:06 sarojmhrzn

@sarojmhrzn which Node version did you use? I am now using Node 16.4 to see if that fixes the problem.

qiulang avatar Jun 06 '22 01:06 qiulang

I'm still having the same issue. I used v5.2.0.

EHyang avatar Jul 01 '22 03:07 EHyang

Same here, on pm2 v5.2.0, and it seems it's not my code: when running with node file.js --inspect the heap snapshot stays at about 7 MB, while with pm2 it shows 47.2 MB and keeps going up over time.

rodrigograca31 avatar Aug 03 '22 16:08 rodrigograca31

Also, I had 2 similar pieces of code and one didn't have this memory leak problem. Then I noticed that one was using 4.x and updated it to 5.x, and now in both of them the memory keeps increasing over time every time I run pm2 list.

rodrigograca31 avatar Aug 03 '22 23:08 rodrigograca31

The daemon memory leak is insane: on a 256 GB RAM VPS, pm2 is managing about 50 Puppeteer + Chrome instances, and yet the pm2 daemon is taking up 150 GB+ of RAM!

HOW can I run the daemon with --inspect? I've fixed memory leaks before, but I just need to inspect the process in order to make some headway on this stupid memory leak.

pm2 report
--- PM2 report ----------------------------------------------------------------
Date                 : Fri Dec 16 2022 19:17:07 GMT+0800 (Singapore Standard Time)
===============================================================================
--- Daemon -------------------------------------------------
pm2d version         : 5.2.2
node version         : 18.12.1
node path            : /home/admin/orch/node_modules/.bin/pm2
argv                 : /usr/bin/node,/home/admin/orch/node_modules/pm2/lib/Daemon.js
argv0                : node
user                 : admin
uid                  : 1001
gid                  : 1001
uptime               : 4min
===============================================================================
--- CLI ----------------------------------------------------
local pm2            : 5.2.2
node version         : 18.12.1
node path            : /home/admin/orch/node_modules/.bin/pm2
argv                 : /usr/bin/node,/home/admin/orch/node_modules/.bin/pm2,report
argv0                : node
user                 : admin
uid                  : 1001
gid                  : 1001
===============================================================================
--- System info --------------------------------------------
arch                 : x64
platform             : linux
type                 : Linux
cpus                 : AMD EPYC 7282 16-Core Processor
cpus nb              : 64
freemem              : 242047713280
totalmem             : 270254182400
home                 : /home/admin
===============================================================================

Investigation

Update 1:

This is how you can turn on inspect on the daemon:

> PM2_NODE_OPTIONS='--inspect' pm2 update
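
If you want to confirm the inspector actually came up before attaching, a quick check (assuming the default inspector port of 9229; Node prints a "Debugger listening on ws://..." line to stderr when the inspector starts):

> ss -ltnp | grep 9229   # confirm something is listening on the inspector port

You can then attach Chrome DevTools via chrome://inspect (add localhost:9229 as a target) and take heap snapshots of the daemon process.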

Update 2:

From initial inspection, it looks like the fact that pidusage can take some time (a few seconds, sometimes) results in the callback(?) hanging around and hogging a lot of memory. Or perhaps the result/buffer of pidusage is leaking somewhere?

Reference:

https://github.com/soyuka/pidusage/commit/ff04d9e3775c6a454ffe987354f53b534228a26c
https://github.com/soyuka/pidusage/issues/17

Update 3:

After hooking into the process on the machine with the issue, it looks like the heap is fine but RSS continually increases. Apparently this is an issue with glibc malloc. After switching to jemalloc, the daemon RAM issue has been alleviated. But now we're hitting a new ENFILE issue:

2022-12-17T17:35:27: PM2 error: Trace: [Error: ENFILE: file table overflow, open '/home/admin/.pm2/logs/api-639abec721066650a072aded-1-out.log'] {
  errno: -23,
  code: 'ENFILE',
  syscall: 'open',
  path: '/home/admin/.pm2/logs/api-639abec721066650a072aded-1-out.log'
}
    at God.logAndGenerateError (/home/admin/orch/node_modules/pm2/lib/God/Methods.js:34:15)
    at /home/admin/orch/node_modules/pm2/lib/God/ForkMode.js:87:13
    at wrapper (/home/admin/orch/node_modules/async/internal/once.js:12:16)
    at WriteStream.next (/home/admin/orch/node_modules/async/waterfall.js:96:20)
    at WriteStream.<anonymous> (/home/admin/orch/node_modules/async/internal/onlyOnce.js:12:16)
    at Object.onceWrapper (node:events:628:26)
    at WriteStream.emit (node:events:513:28)
    at WriteStream.emit (node:domain:489:12)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)

Update 4:

The ENFILE issue was directly due to me adding pidusage.clear() to the end of the pidusage callbacks. This caused too many accesses to ps or /proc files, resulting in the ENFILE error. I reverted that change and the ENFILE error disappeared completely.

The solution to this issue is to let jemalloc manage the memory for the daemon process. At first I set jemalloc as the default globally, but that caused a regression in my processes managed by pm2 (https://github.com/puppeteer/puppeteer/issues/8246#issuecomment-1356463832). So make sure you only run the daemon with jemalloc to avoid regressions in your orchestrated processes.

# as root
> apt-get install libjemalloc-dev

Then, to start your pm2 daemon with jemalloc, run:

> LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so pm2 update

You can then confirm that the pm2 daemon is running with jemalloc by using:

> ps aux | grep pm2  # use this to find the PID of the daemon
> cat /proc/PM2_DAEMON_PID/smaps | grep jemalloc

Before this, the daemon would balloon, take up all available memory on the server (256 GB RAM), and eventually crash. Now the daemon is hovering around 150-300 MB of RAM.
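
Note that LD_PRELOAD set this way only applies to the daemon launched by that pm2 update. If your daemon comes back via pm2 startup (systemd) after a reboot, one way to keep the preload is an override on the generated unit. A rough sketch, assuming the unit is named pm2-admin (pm2 startup normally creates pm2-<user>.service) and the jemalloc path from above:

> sudo systemctl edit pm2-admin
# in the override file that opens, add:
#   [Service]
#   Environment=LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
> sudo systemctl restart pm2-admin

Check the unit name on your machine first (e.g. systemctl list-units | grep pm2) before applying this.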

smashah avatar Dec 16 '22 11:12 smashah

tldr: Before, the daemon would balloon, take up all available memory on the server (256 GB RAM), and eventually crash. Now the daemon is hovering around 150-300 MB of RAM after forcing it to use jemalloc.

@rodrigograca31 can you check the above solution and report back whether it works for you, please? Maybe then I will make a PR to automatically detect and use jemalloc for the daemon.

smashah avatar Dec 17 '22 21:12 smashah

I can't anymore... it fixed itself over time, if you know what I mean 😂

rodrigograca31 avatar Dec 19 '22 11:12 rodrigograca31

@smashah how do I install jemalloc on Amazon Linux 2? Facing the same memory spike issue.

prithvisharma avatar Dec 30 '22 09:12 prithvisharma

@smashah how do I install jemalloc on Amazon Linux 2? Facing the same memory spike issue.

Try this: sudo yum install -y jemalloc-devel

Then use npx jemalloc-check to find the path for jemalloc.
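
Once you have the path, the same trick as above should apply; the path below is just an example of where it may land, so substitute whatever npx jemalloc-check (or find / -name 'libjemalloc*') reports:

# example path only, adjust to your system
> LD_PRELOAD=/usr/lib64/libjemalloc.so.2 pm2 update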

smashah avatar Dec 30 '22 19:12 smashah

I stopped using PM2. Instead I'm using Linux native systemd without any memory or other issues.

victor-ono avatar Dec 30 '22 20:12 victor-ono

Indeed, we determined that pmx (e.g. via @pm2/io) is the culprit for the memory leaks. Typically our app runs at 500 MB, but without explicitly disabling pmx, the app grew to over 3-4 GB in memory.

To fix this, simply disable pmx in your ecosystem.json file:

{
  "apps": [
    {
      "name": "your-app-name",
      "script": "app.js",
      "exec_mode": "cluster_mode",
      "wait_ready": true,
      "instances": "max",
+      "pmx": false,
      "env_production": {
        "NODE_ENV": "production"
      }
    }
  ]
}

Then delete and restart all via pm2 delete all and pm2 start ecosystem.json --env production.

References:

  • https://github.com/Unitech/pm2/blob/2573516e9321a78fb10474ea58c2cb487a663de6/lib/ProcessUtils.js#L5-L6
  • https://github.com/Unitech/pm2/issues/4510#issuecomment-1736359090
  • https://github.com/forwardemail/forwardemail.net/commit/748b21304ec5224d14ab5b98e7dc0e51ad96b72f

titanism avatar Sep 27 '23 16:09 titanism

Possibly related https://github.com/Unitech/pm2/issues/5216

titanism avatar Sep 30 '23 17:09 titanism

I'm running PM2 5.3.0 with Node 18.15.0 on a Raspberry Pi... and top shows

812 vic       20   0  188412  71332  33324 R 100.7   3.8   2:56.91 PM2 v5.3.0: God

After a little while, the memory utilization takes over all available resources and the system crashes. How exactly do I fix this? It's not clear to me from reviewing this thread. I don't know what ecosystem.json is; does it apply to my configuration?

vicatcu avatar Oct 11 '23 19:10 vicatcu

@vicatcu set the pmx: false option; it's described here: https://github.com/Unitech/pm2/issues/5145#issuecomment-1737764214. Yes, if you're using PM2 then you should use an ecosystem.json file as its configuration. Please see the PM2 docs for how to use it.
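
If you started your apps from the CLI rather than from a config file, a minimal ecosystem file that only turns pmx off could look like this (a sketch; the name and script are placeholders, adjust them to your app):

> cat > ecosystem.json <<'EOF'
{
  "apps": [
    {
      "name": "your-app-name",
      "script": "app.js",
      "pmx": false
    }
  ]
}
EOF
> pm2 delete your-app-name && pm2 start ecosystem.json && pm2 save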

titanism avatar Oct 11 '23 19:10 titanism

Thanks, I've been using PM2 for a long time and ecosystem.json is new to me... I usually just do pm2 start run.js --name somename, then pm2 save, and then pm2 startup when everything is running the way I want it to. So my follow-up questions are:

  1. Is there a solution for people that use pm2 like me, without an ecosystem.json file, or
  2. Is there a 'migration guide' to how to convert a situation like what I described to using ecosystem.json instead?

vicatcu avatar Oct 11 '23 20:10 vicatcu

Any update on this issue?

rachitbucha avatar Oct 19 '23 03:10 rachitbucha

Just set pmx: false in the interim; it doesn't seem like PM2 is as actively maintained as it used to be.

titanism avatar Oct 19 '23 19:10 titanism

For me, the problem was an incompatibility between the Node.js and PM2 versions.

Before: Node.js 21.6.1, pm2 5.3.1

After: Node.js 18.9.0, pm2 5.3.1

And everything worked as it should.

tintobrazov avatar Feb 06 '24 09:02 tintobrazov

For me, the problem was an incompatibility between the Node.js and PM2 versions.

Before: Node.js 21.6.1, pm2 5.3.1

After: Node.js 18.9.0, pm2 5.3.1

And everything worked as it should.

it works

zyhnbyyds avatar Feb 22 '24 14:02 zyhnbyyds