cloud-debug-nodejs
Potential Memory Leak
Environment details
- OS: Google App Engine
- Node.js version: 12.14.x
- npm version: 6.13
- @google-cloud/debug-agent version: 4.2.1
Steps to reproduce
We found that the package causes a memory leak. The image below shows our application's memory usage with the package and without it.

We are running on Google App Engine with 4 nodes. Any help is appreciated.
Thanks!
It looks like Node.js issue #28420 affects the version of Node you are using. I suspect it could be the root cause.
@DominicKramer the Node version upgrade didn't resolve the issue.
We are observing a similar issue on the upgraded Node version. Memory usage gradually climbs until an OOM kill when the debug agent is enabled; after it is disabled, memory usage is stable.

Running on GKE (Google Kubernetes Engine) pod
- OS: official node docker image
- Node.js version: 12.16.0
- npm version: 6.13.4
- @google-cloud/debug-agent version: 4.2.2
Thank you for the update. Have you seen similar problems on GCP services other than GKE?
That's the only GCP service we use; all our services run under Kubernetes.
Ours is on App Engine.
@legopin and @tbadlov thanks for the update.
If anyone has a repro gist or repository we can run that leaks this way, that would help us get to the bottom of this a little more quickly. It also helps confirm that we're solving the problem you're hitting.
@soldair Here is a reproducible code snippet. It was run in GKE as a single pod, and we continued to observe the memory leak even when no requests were received.

index.js
require("@google-cloud/debug-agent").start({
projectId: 'ID',
allowExpressions: true,
serviceContext: {
service: 'debug-leak',
},
});
const Koa = require("koa");
const app = new Koa();
// response
app.use(ctx => {
ctx.body = "Hello Koa";
});
app.listen(3000, ()=>{console.log('server started')});
Dockerfile
FROM node:12.16.0
COPY package.json package-lock.json /tmp/
RUN mkdir /opt/app \
  && mv /tmp/package.json /tmp/package-lock.json /opt/app \
  && cd /opt/app && npm ci
WORKDIR /opt/app
COPY src /opt/app/src
EXPOSE 3000
ENV NODE_ENV production
CMD ["node", "src"]
Hello, are there any updates on this issue? Were you able to reproduce it?
We are experiencing this as well and had to shut down the debugger. The image below shows the same app deployed to two clusters, one with the debugger on and one with the debugger off.

The debugger is a big piece of our workflow so this is pretty dramatic :)
I just tested the same code with 12.16 and got the same results. The library in its current state is not usable. How is this not a P0 on an LTS version of Node?
@kilianc If I understand correctly this issue isn't happening on 10.x?
@bcoe I can't confirm; we're running on 12.x. It's very easy to repro with the example provided (just change the FROM to 10.x).
How can I help?
Are there any updates on the issue? We are running on GCE:
- node v14.1.0
- @google-cloud/debug-agent: ^4.2.2
Memory leaks occur when we have any logpoints. Within 2 hours our VM exceeds its memory with only one logpoint set and dies; only a stop and start helps (SSH doesn't work either).
I know this has already been reported well enough. Just wanted to pitch in that we are experiencing the same on GAE Flex in all services that utilise this library. We've tested this by removing npm libraries one by one and stress-testing our application to watch the memory consumption. Only the versions with the debug-agent are affected by this leak.
I think it is also important to note that the fewer log lines there are, the slower the memory leak, so this seems to be somehow related to logging.
I would even say that this is a critical issue as it makes it impossible to utilise this library.
Our services on GKE that use this library also appear to be affected by this issue. It seems to be a problem on any Node version from 12.11.0 onwards with debug-agent 5.1.0.
Deployed a change to one of our services today. That change was to drop @google-cloud/debug-agent. We went from erroring with an OOM after every ~10-20 requests to having done a million requests without any OOM. Definitely an issue with this package or one of its deps.
I'm looking into this issue. I've been able to reproduce a memory leak on GKE using the snippet from above, though I'm not convinced that I've covered the problematic codepath as the leak is slow (~5MB/hour) and seems to plateau eventually.
Unfortunately, I am currently unable to track down the actual memory leak when running it locally with node 12.18.3 and using the memory allocation timeline. I'll continue to investigate.
@mctavish Our particular app loads and executes JS dynamically on each request. I think that consumed memory very quickly. You might try recreating that dynamic load of additional JS to increase the memory burden and demonstrate the leak more quickly.
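To illustrate the idea (a hypothetical tweak to the Koa repro above, not code from our app), compiling and running a throwaway script on each request with the built-in vm module should increase the memory burden per hit:

const vm = require('vm');

// Replace the repro's handler with one that compiles fresh JS on every request,
// mimicking an app that loads and executes JS dynamically.
app.use(ctx => {
  const script = new vm.Script(`1 + ${Math.random()}`); // new code object each time
  script.runInNewContext({});
  ctx.body = "Hello Koa";
});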
@mctavish the charts that I posted above are from a staging service with almost no traffic (< 10 rps).
I can confirm that we are seeing this exact issue in production. We were able to turn off the cloud debugger and resolve the memory leak we detected. Based on our heapdumps we suspect the memory leak has something to do with how access_tokens are stored.
I can also confirm that this caused a memory leak in our production environment. We missed it in staging because of how often we deploy. This library is effectively unusable on Node 12.x (< 12.16).
I was able to reproduce the memory leak on both GAE standard and a locally deployed application. The key to reproducing it is to have an active logpoint set at a place that is reached periodically. According to the local application's heap usage, every time the logpoint is triggered the memory usage increases by around 300 KB. That is, if each request triggers one logpoint, memory usage will reach 256 MB (the default memory limit of a GAE standard instance) before serving 1000 requests (256 MB / 300 KB ≈ 870 hits).
Further investigation shows that the "memory leak" is not caused by code in the cloud debugger; rather, it is caused by the fact that the V8 inspector session stores the context of every breakpoint hit (the cloud debugger uses V8 inspector breakpoints to implement snapshots/logpoints).
This is verified in my local environment, where an application still suffers from the same "memory leak" when it does not have the cloud debugger but does have the V8 inspector setting a breakpoint at a busy place.
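For anyone who wants to check this without the agent, here is a minimal standalone sketch of that verification (my own illustration, not debug-agent code; the breakpoint line number and the reporting interval are assumptions to adjust). It sets a breakpoint through the in-process inspector session and resumes immediately on every pause, which is the same pattern the agent uses, yet heap usage keeps climbing:

leak-repro.js
const inspector = require('inspector');

const session = new inspector.Session();
session.connect();

// Resume immediately on every pause, mimicking what the agent does once it
// has captured a snapshot or evaluated a logpoint.
session.on('Debugger.paused', () => {
  session.post('Debugger.resume');
});

session.post('Debugger.enable', () => {
  session.post('Debugger.setBreakpointByUrl', {
    lineNumber: 20,                 // 0-based; adjust to the line of `counter += 1` below
    urlRegex: 'leak-repro\\.js',
  });
});

let counter = 0;
function busyFunction() {
  counter += 1;                     // breakpoint target
}

// Hit the breakpoint continuously and report heap usage.
setInterval(() => {
  busyFunction();
  if (counter % 500 === 0) {
    console.log('heapUsed MB:', (process.memoryUsage().heapUsed / 1e6).toFixed(1));
  }
}, 1);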
Storing the context of each breakpoint hit might be a feature provided by the V8 inspector, but it is not what the cloud debugger needs: the debugger no longer needs the context once it has caught the paused event and taken the snapshot / evaluated the logpoint. So, to fix the memory leak, the cloud debugger should clean up those contexts. However, according to the V8 inspector documentation, there seems to be no quick way to achieve this.
One possible way for the cloud debugger to fix this problem is to monitor the number of breakpoint hits and, when a certain threshold is reached, close the current V8 inspector session, reconnect, and re-set all the existing snapshots/logpoints (see the sketch below). I have verified locally that closing the session clears all the breakpoint-hit contexts and thus releases the memory, but it also clears all previously set breakpoints. The only concern here is the performance hit of such a reset (which needs to be measured).
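A rough sketch of that reset approach (hypothetical illustration only; the threshold value, the breakpoint bookkeeping, and the deferral via setImmediate are my assumptions, not the actual fix):

const inspector = require('inspector');

const RESET_THRESHOLD = 500;       // assumed value; the right number needs measurement
const knownBreakpoints = [];       // { lineNumber, urlRegex, condition } for each active snapshot/logpoint
let session;
let pauseCount = 0;

function openSession() {
  session = new inspector.Session();
  session.connect();

  session.on('Debugger.paused', () => {
    // ...capture the snapshot / evaluate the logpoint here...
    session.post('Debugger.resume');
    if (++pauseCount >= RESET_THRESHOLD) {
      setImmediate(resetSession);  // defer so the session isn't torn down mid-dispatch
    }
  });

  session.post('Debugger.enable', () => {
    // Re-register every breakpoint the agent knows about.
    for (const bp of knownBreakpoints) {
      session.post('Debugger.setBreakpointByUrl', bp);
    }
  });
}

function resetSession() {
  pauseCount = 0;
  // Disconnecting drops the per-pause contexts the inspector has retained,
  // but it also clears all breakpoints, so they must be set again.
  session.disconnect();
  openSession();
}

openSession();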
@Louis-Ye in my case I had no active logpoints nor breakpoints, unless they were not visible in the UI and cached in the system.
Thanks for looking into this.
A bug has been opened against the V8 engine for the related issue: https://bugs.chromium.org/p/chromium/issues/detail?id=1208482
Just to echo the above comment, we're also getting this issue deployed to Cloud Run using Node 14, with no logpoints or breakpoints ever being set. After removing the debug agent, memory usage is constant.
Hi @kilianc and @rhodgkins, I wasn't able to reproduce the memory leak without setting active breakpoints. The fix (for the memory leak when there are active breakpoints) was just merged in; would you be able to try the latest version to see if it helps solve your problem? If not, please re-open this issue. It would also be much appreciated if you could provide more environment information (e.g., Node.js version / V8 version, debugger agent version, deployment environment, the way you measure memory usage / detect the memory leak, the speed/rate of the leak, etc.).
Thanks @Louis-Ye! When this makes it into a release, I can also try it out on our service that was exhibiting this problem.