cratedb-prometheus-adapter
Possible memory leak
A user at https://community.cratedb.com/t/disk-space-issues-on-prometheus-integration/1848/3 reports 85% memory usage. I checked my own long-running container and it was at 35% (docker stats output below; the columns are container ID, name, CPU %, mem usage / limit, mem %, net I/O, block I/O, PIDs):
5fd5f000593a crate-prometheus-exporter 0.00% 5.462GiB / 15.51GiB 35.22% 4.26GB / 1.62GB 0B / 0B 12
Another long-running container was at 1.6/6 GB, roughly 25%.
These last two findings are less than the user's 85%, but still too much in my opinion, and they make me think something is going on.
I just checked my machine, and after turning it off and on it's now stable at:
5fd5f000593a crate-prometheus-exporter 0.00% 43.88MiB / 15.51GiB 0.28% 72.1MB / 154MB 0B / 0B 11
Hello!
I had it running for 12 hours, and the memory usage right now is at 11.6% of system memory:
0227e05def58 cratedb-prometheus-adapter 215.22% 14.55GiB / 125.5GiB 11.60% 0B / 0B 5.13MB / 0B 53
I have the /metrics output from half an hour ago and from now. Is it useful? I can upload it here if it helps with debugging the issue.
I have to shut everything down to continue with my work, so I'm dumping everything here in the hope that it will help.
I stopped prometheus-server, and after a while memory usage decreased a bit but then stayed there:
0227e05def58 cratedb-prometheus-adapter 53.34% 11.24GiB / 125.5GiB 8.96% 0B / 0B 5.18MB / 0B 53
And 20+ minutes later:
0227e05def58 cratedb-prometheus-adapter 60.95% 11.19GiB / 125.5GiB 8.92% 0B / 0B 5.18MB / 0B 53
I'm also attaching the metrics from different moments (a small sketch for watching the relevant gauges follows the list):
- 14:39 - adapter-metrics-1.txt
- 15:08 - adapter-metrics-2.txt
- 15:24 - Stopped prometheus-server container - adapter-metrics-3-prometheus-server-just-stopped.txt
- 15:44 - prometheus-server still stopped - adapter-metrics-4-prometheus-server-stopped-20-minutes-ago.txt
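For reference, a minimal polling sketch, not from this thread, for comparing such snapshots over time: it watches the Go runtime gauges that the adapter's /metrics endpoint should expose via the standard Prometheus Go client. The listen address localhost:9268 is assumed to be the adapter's default; adjust it to your configuration.

// Hypothetical helper: polls the adapter's /metrics endpoint and prints the
// Go runtime gauges most relevant to a leak, to correlate with `docker stats`.
// The address and the presence of the go_* collectors are assumptions.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
	"time"
)

func main() {
	for {
		resp, err := http.Get("http://localhost:9268/metrics")
		if err != nil {
			fmt.Println("scrape failed:", err)
		} else {
			scanner := bufio.NewScanner(resp.Body)
			for scanner.Scan() {
				line := scanner.Text()
				if strings.HasPrefix(line, "go_memstats_heap_inuse_bytes") ||
					strings.HasPrefix(line, "go_memstats_heap_idle_bytes") ||
					strings.HasPrefix(line, "go_goroutines") {
					fmt.Println(time.Now().Format(time.RFC3339), line)
				}
			}
			resp.Body.Close()
		}
		time.Sleep(30 * time.Second)
	}
}

A steadily growing go_memstats_heap_inuse_bytes points at retained objects, while a large gap between container RSS and heap in use rather suggests memory the runtime has not yet returned to the OS.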
--
I noticed that memory usage is going down, but very slowly. Maybe we need a more aggressive GC (a sketch of what that could mean follows below)? Each command was run manually over a span of a few minutes:
0227e05def58 cratedb-prometheus-adapter 56.56% 11.13GiB / 125.5GiB 8.87% 0B / 0B 5.18MB / 0B 53
0227e05def58 cratedb-prometheus-adapter 3092.44% 11.13GiB / 125.5GiB 8.87% 0B / 0B 5.18MB / 0B 53
0227e05def58 cratedb-prometheus-adapter 34.82% 11.1GiB / 125.5GiB 8.85% 0B / 0B 5.18MB / 0B 53
0227e05def58 cratedb-prometheus-adapter 52.53% 11.08GiB / 125.5GiB 8.83% 0B / 0B 5.18MB / 0B 53
And now, run in a loop to get more accurate stats:
Note that prometheus-server is stopped, nothing is connecting to the adapter, and CPU usage is still somewhat high (with some peaks, maybe GC?). I need to stop everything and move along; I'm sharing the last metrics output here before shutting it down. Hope it helps!!
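To make the "more aggressive GC" idea above concrete, here is a minimal sketch, not the adapter's actual code, of what that could look like in a Go program: lower the GC target and periodically hand freed pages back to the OS, so that RSS (what docker stats reports) tracks the live heap more closely. The 50% target and the 5-minute interval are made up.

// Sketch only: a more aggressive GC policy for a long-running Go service.
package main

import (
	"runtime/debug"
	"time"
)

func main() {
	// Default GOGC is 100; a lower target triggers collections more often,
	// trading CPU for a smaller heap.
	debug.SetGCPercent(50)

	// Periodically return freed memory to the OS so container RSS
	// reflects the live heap rather than retained pages.
	go func() {
		for range time.Tick(5 * time.Minute) {
			debug.FreeOSMemory()
		}
	}()

	// ... the adapter's normal serving loop would run here ...
	select {}
}

Roughly the same effect is available without code changes through the GOGC environment variable, and on Go 1.19+ via the GOMEMLIMIT soft limit; neither fixes a real leak, they only slow or cap the growth.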
Updating all dependencies and the Go version made it so that I could no longer quickly reproduce the memory leak, but let's be careful; you never know with these things.
I released 0.5.2 and will deploy it on a couple of long-running machines that I have and let it run for a few days; let's see if it manifests.
All good on my end
Hi. @WalBeh reported the memory leak is still present and going strong. Thank you.
After the process is restarted and has been running for about 12 h, memory consumption is approx. 420 MB; 20 minutes after restarting, the process consumes about 30 MB. It looks like, in our setup, it leaks about 30 MB/h.
@WalBeh suggested limiting resources via systemd resource-control directives in order to work around the problem. Thanks!
MemoryAccounting=true
MemoryHigh=512M
MemoryMax=768M
CPUAccounting=true
CPUWeight=50
CPUQuota=100%
Still, I would like to emphasize that we need to resolve the root cause here. 🍀
@widmogrod: Thanks a stack for submitting GH-203. 💯
@widmogrod: Tomorrow morning, you will be able to use an image that includes your improvement via the nightly tag.
docker pull ghcr.io/crate/cratedb-prometheus-adapter:nightly
-- https://github.com/crate/cratedb-prometheus-adapter/pkgs/container/cratedb-prometheus-adapter
I would appreciate it if someone could test whether the memory leak has disappeared with this change, or provide me with instructions on how to simulate it. I didn't manage to replicate it in my local environment (and on macOS I had to use /_/crate instead of crate/crate:nightly).
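One way to try to simulate it, offered as a rough sketch rather than a known reproducer: hammer the adapter's write path with many distinct time series. It assumes the adapter's default listen address localhost:9268 and the standard snappy-compressed remote-write protobuf on /write; adjust both to your setup.

// Hypothetical load generator for probing the leak; address and endpoint
// are assumptions, adjust them to your configuration.
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"time"

	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	for i := 0; ; i++ {
		// One sample per request, cycling through many label values to
		// create lots of distinct series.
		wr := &prompb.WriteRequest{
			Timeseries: []prompb.TimeSeries{{
				Labels: []prompb.Label{
					{Name: "__name__", Value: "leak_probe"},
					{Name: "instance", Value: fmt.Sprintf("host-%d", i%10000)},
				},
				Samples: []prompb.Sample{{
					Value:     float64(i),
					Timestamp: time.Now().UnixMilli(),
				}},
			}},
		}
		raw, err := wr.Marshal()
		if err != nil {
			log.Fatal(err)
		}
		req, err := http.NewRequest(http.MethodPost,
			"http://localhost:9268/write", bytes.NewReader(snappy.Encode(nil, raw)))
		if err != nil {
			log.Fatal(err)
		}
		req.Header.Set("Content-Type", "application/x-protobuf")
		req.Header.Set("Content-Encoding", "snappy")
		req.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
		resp, err := client.Do(req)
		if err != nil {
			log.Fatal(err)
		}
		resp.Body.Close()
	}
}

Running this for a while and then stopping it, while watching docker stats (or the go_memstats_* gauges from the earlier sketch), should show whether memory keeps climbing or is released once the load ends.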
Hi. Maybe @goat-ssh can whip it up for probing properly?
@widmogrod: v0.5.3 has just been released, including your fix. Release artefacts are available as an OCI image and as standalone builds. Thanks again.