memory leak in PMX where Meter class retains object ExponentiallyMovingWeightedAverage and grows to substantial size (heapdump screens included)
See screenshots. This is a continued, sharp rise in memory use over an 8-hour period. The box has a consistent processing flow.
The leak appears to be from the global include and extension of var Probe = require('./Probe.js');
See heap snapshots. They cover an 8-hour period.
Thanks for inspecting,
https://github.com/keymetrics/pmx/blob/master/lib/utils/EWMA.js does not look to retain any reference
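For reference, an EWMA of this shape usually only holds a few numbers. A rough sketch of the general pattern (illustrative only, not the actual EWMA.js code) looks like this:
// Illustrative EWMA: it keeps only scalars, so by itself it should not retain references.
function EWMA(tickIntervalMs, alpha) {
  this._alpha = alpha;             // smoothing factor for the moving average
  this._interval = tickIntervalMs; // how often tick() is expected to run
  this._count = 0;                 // events seen since the last tick
  this._rate = 0;                  // smoothed events per millisecond
}
EWMA.prototype.update = function (n) {
  this._count += (n || 1);
};
EWMA.prototype.tick = function () {
  var instantRate = this._count / this._interval;
  this._count = 0;
  this._rate += this._alpha * (instantRate - this._rate);
};
EWMA.prototype.rate = function (timeUnitMs) {
  return this._rate * timeUnitMs;
};
So the growth looks like it comes from whatever creates or holds the Meter instances, not from the EWMA math itself.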
How do you know that it comes from EWMA?
Thanks for the follow-up. I know that it is PMX because Meter is the retainer object. The retainer of Meter is the system / Context. So this is occurring from the require statement and not some custom probe I am applying.
Also, when I turn off pmx, the memory leak goes away.
So, you aren't using any custom probes? This is just pmx running on the default configuration?
Yes. No custom probes. I do have one action to take a heapdump. The other odd thing is that Keymetrics shows me more memory usage than the box has in total. See the screenshot with 36GB.
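For what it's worth, the heapdump action is just a small custom action along these lines (the action name and dump path here are just what I happen to use):
var pmx = require('pmx');
var heapdump = require('heapdump'); // module that writes V8 heap snapshots to disk
// Remote action, triggerable from Keymetrics, that writes a heap snapshot.
pmx.action('heapdump', function (reply) {
  var file = '/tmp/' + process.pid + '-' + Date.now() + '.heapsnapshot';
  heapdump.writeSnapshot(file, function (err, filename) {
    reply(err ? { error: err.message } : { dumped: filename });
  });
});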
[Collaborator Edit] => removed screenshot link because it contains your keys
Thanks for the removal. I didn't know keys were in there. Let me know if I need to post again or if you were able to get it. I trashed mine, so I will probably have to recreate it. The gist was that Keymetrics added up to 36GB, but my box only has 30GB total.
I know you guys are swamped. I am more than happy to help or provide more testing. I love PM2. We are beginning to rely on it heavily. Just point me in the right direction.
I did more testing on this using node-inspector. I can reproduce the leak each time. I was able to resolve the leak by adding pmx to my client library and setting the below options.
require('pmx').init({
  http          : false, // (Default: true)
  errors        : false,
  custom_probes : false,
  network       : false, // Traffic usage monitoring
  ports         : false  // Shows which ports your app is listening on
});
I bet the leak is on http; however, I disabled all of them. I will update once I have more time to narrow it down.
Hey, can you tell for sure if it's http or custom_probes or errors? (These are the only flags true by default; network and ports are false by default.) I also think it's http, but just to be sure.
And can you try with [email protected]? I changed a bit how http is overloaded, but I can't tell for sure if it fixes the memory leak.
Thanks again for your contribution :)
It's for sure http. I have some more node-inspector dumps. I think it doesn't back off or batch enough under high load. This box is a message processor: long-running, with a hundred thousand plus messages, all outbound connections.
I tried to see where to edit but didn't have enough time. New Relic has some good batching code in their Node SDK for reference.
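To illustrate the kind of batching I mean, here is a made-up sketch (not pmx or New Relic code) that aggregates samples in plain variables and only touches the metric sink on a timer instead of on every request:
// Hypothetical batched meter: record() stays cheap under load, and the
// aggregate is flushed once per interval via the onFlush callback.
function createBatchedMeter(flushIntervalMs, onFlush) {
  var pending = 0;
  var totalLatency = 0;
  var timer = setInterval(function flush() {
    if (pending === 0) return;
    onFlush({ count: pending, meanLatency: totalLatency / pending });
    pending = 0;
    totalLatency = 0;
  }, flushIntervalMs);
  timer.unref(); // don't keep the process alive just for metrics
  return {
    record: function (latencyMs) {
      pending += 1;
      totalLatency += latencyMs;
    }
  };
}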
Will give it a shot later today with new version.
I've been testing a bit with this sample code:
var pmx = require('pmx').init({
  network : true,
  ports   : true
});
var http = require('http');
// Force a GC every 500ms so real heap growth is easy to spot.
setInterval(function keepAlive() {
  global.gc();
}, 500);
http.createServer(function(req, res) {
  res.end('hello world');
}).listen(8080);
(launch it with --node-args="--expose-gc")
And I flooded the server with requests via ab -c 200 -n 1000000 localhost:8080/
I wasn't able to reproduce the memory leak :/ Memory usage stabilized at around 50MB and never went above that.
Can you give me a sample code which triggers the leak please ?
Sure. It relies on some internal dependencies. Give me some time to clean it up.
Did you use Node 0.12.x? What version of pmx did you use? I was on 0.3.11 when I ran that.
Environment-wise, the first screenshots I gave you (7 days ago), which started this ticket, were from production. I was running pm2 in cluster mode with no pmx inside. I was able to get rid of the memory increase by setting pmx.init({http: false}) inside my production code. Can you explain a little more how pm2 and pmx interact? It looks to me from the code that pm2 has default options to communicate with Keymetrics if pmx is not defined, and then, if pmx is included, it sends an RPC command to pm2 to modify those default options.
The second set of screenshots I gave you (2 days ago) were from running through node-inspector, so I wasn't able to run through pm2. I explicitly set options via pmx to imitate what pm2 would do.
I just want to make sure I am comparing apples to apples. Will code that is not using pm2 and just using pmx act the same as code that is being run by pm2 (after Keymetrics initialization)?
pmx is injected automatically when an app is launched in cluster mode with default options https://github.com/keymetrics/pmx/blob/master/lib/index.js#L32-L42
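Written out explicitly, those defaults amount to roughly this (a sketch for comparison, not the exact injected call):
// Roughly what the automatic injection enables by default:
require('pmx').init({
  http          : true,   // overload http to meter requests and latency
  errors        : true,   // catch uncaught exceptions
  custom_probes : true,   // probe instrumentation
  network       : false,  // traffic usage monitoring
  ports         : false   // which ports the app listens on
});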
Did you use a framework like express, koa, or sails? I ran my tests with a basic http server and wasn't able to make it leak.
I used [email protected] and [email protected]
I used this sample : https://github.com/jshkurti/benchmark
I did 3 heapsnapshots within 1 hour interval each.
The Meter object stayed the same size.
Can you give it a try with [email protected] please? :)
And/or give us a sample code which triggers the leak.
@askhogan :)
Sorry man. I got swamped over the last few days. I will try this, this week.
Thanks :D
Installed 0.3.16
New settings
require('pmx').init({
  http          : true,
  errors        : true,
  custom_probes : false,
  network       : true,
  ports         : true
});
You can look at my metrics in Keymetrics to see the difference. With all of it off, memory usage has been stable at around 2.4GB over the last few weeks.
I hope it levels out, but you can see that once it's on, it just goes up.
Memory grew from a consistent 2.4GB with pmx off to 7GB (< 200MB -> around 500MB per worker) within a few hours of turning it on. The only code change was the additions I noted above. I've got to shut this off for now and just use Keymetrics without the http tracking turned on (still super awesome without it, BTW!!). This server processes lots and lots of messages. When the heap gets big, garbage collection starts eating a lot of CPU and I get weird latency on batches of messages.
I know you are having trouble re-creating it. Let me know how I can help.
Pause on my comments for a bit. I just realized the npm module didn't update. Really weird. Let me re-test. That was all done on 0.3.11.
I had a local version at 0.3.11 and a global version inside pm2 at 0.3.16. I updated the global pm2 version earlier, and I have now updated the local version as well. Re-testing...
Same issue. I had to turn off http.
Thanks for reporting,
We have some questions:
- Which Node.js version do you use?
- Which Node.js modules do you use? (if we could get a copy of your dependencies it would be helpful)
- Do you use NewRelic at the same time?
- Are you doing anything uncommon with HTTP?
If you find the time, the best would be to have a small subset of your application so we can reproduce the leak ourselves.
Thanks! Best, Alex