cratedb-prometheus-adapter
Possible memory leak
A user at https://community.cratedb.com/t/disk-space-issues-on-prometheus-integration/1848/3 reports 85% memory usage. I checked my own long-running container and it was at 35% (docker stats output below; the columns are container ID, name, CPU %, mem usage / limit, mem %, net I/O, block I/O, PIDs):
5fd5f000593a crate-prometheus-exporter 0.00% 5.462GiB / 15.51GiB 35.22% 4.26GB / 1.62GB 0B / 0B 12
Another long-running container was at 1.6/6 GB, roughly 25%.
These last two findings are less than the user's 85%, but still too much in my opinion, and they make me think something is going on.
I just checked my machine, and after turning it off and on it's now stable at:
5fd5f000593a crate-prometheus-exporter 0.00% 43.88MiB / 15.51GiB 0.28% 72.1MB / 154MB 0B / 0B 11
Hello!
I had it running for 12 hours, and the memory usage right now is at 11.6% of system memory:
0227e05def58 cratedb-prometheus-adapter 215.22% 14.55GiB / 125.5GiB 11.60% 0B / 0B 5.13MB / 0B 53
I have the /metrics output from half an hour ago and from now. Is it useful? I can upload it here if it helps with debugging the issue.
I have to shut everything down to continue with my work, so I'm dumping everything here in the hope that it will help.
I stopped prometheus-server, and after a while memory usage decreased a bit but then stayed there:
0227e05def58 cratedb-prometheus-adapter 53.34% 11.24GiB / 125.5GiB 8.96% 0B / 0B 5.18MB / 0B 53
And 20+ minutes later:
0227e05def58 cratedb-prometheus-adapter 60.95% 11.19GiB / 125.5GiB 8.92% 0B / 0B 5.18MB / 0B 53
I'm also attaching the metrics from different moments (a small sketch for watching the relevant gauges follows the list):
- 14:39 - adapter-metrics-1.txt
- 15:08 - adapter-metrics-2.txt
- 15:24 - Stopped prometheus-server container - adapter-metrics-3-prometheus-server-just-stopped.txt
- 15:44 - prometheus-server still stopped - adapter-metrics-4-prometheus-server-stopped-20-minutes-ago.txt
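For reference, a minimal polling sketch, not from this thread, for comparing such snapshots over time: it watches the Go runtime gauges that the adapter's /metrics endpoint should expose via the standard Prometheus Go client. The listen address localhost:9268 is assumed to be the adapter's default; adjust it to your configuration.

// Hypothetical helper: polls the adapter's /metrics endpoint and prints the
// Go runtime gauges most relevant to a leak, to correlate with `docker stats`.
// The address and the presence of the go_* collectors are assumptions.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
	"time"
)

func main() {
	for {
		resp, err := http.Get("http://localhost:9268/metrics")
		if err != nil {
			fmt.Println("scrape failed:", err)
		} else {
			scanner := bufio.NewScanner(resp.Body)
			for scanner.Scan() {
				line := scanner.Text()
				if strings.HasPrefix(line, "go_memstats_heap_inuse_bytes") ||
					strings.HasPrefix(line, "go_memstats_heap_idle_bytes") ||
					strings.HasPrefix(line, "go_goroutines") {
					fmt.Println(time.Now().Format(time.RFC3339), line)
				}
			}
			resp.Body.Close()
		}
		time.Sleep(30 * time.Second)
	}
}

A steadily growing go_memstats_heap_inuse_bytes points at retained objects, while a large gap between container RSS and heap in use rather suggests memory the runtime has not yet returned to the OS.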
--
I noticed that memory usage is going down, but very slowly. Maybe we need a more aggressive GC (a sketch of what that could mean follows below)? Each command was run manually over a span of a few minutes:
0227e05def58 cratedb-prometheus-adapter 56.56% 11.13GiB / 125.5GiB 8.87% 0B / 0B 5.18MB / 0B 53
0227e05def58 cratedb-prometheus-adapter 3092.44% 11.13GiB / 125.5GiB 8.87% 0B / 0B 5.18MB / 0B 53
0227e05def58 cratedb-prometheus-adapter 34.82% 11.1GiB / 125.5GiB 8.85% 0B / 0B 5.18MB / 0B 53
0227e05def58 cratedb-prometheus-adapter 52.53% 11.08GiB / 125.5GiB 8.83% 0B / 0B 5.18MB / 0B 53
And now, run in a loop to get more accurate stats:
Note that prometheus-server is stopped, nothing is connecting to the adapter, and CPU usage is still somewhat high (with some peaks, maybe GC?). I need to stop everything and move along; I'm sharing the last metrics output here before shutting it down. Hope it helps!!
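To make the "more aggressive GC" idea above concrete, here is a minimal sketch, not the adapter's actual code, of what that could look like in a Go program: lower the GC target and periodically hand freed pages back to the OS, so that RSS (what docker stats reports) tracks the live heap more closely. The 50% target and the 5-minute interval are made up.

// Sketch only: a more aggressive GC policy for a long-running Go service.
package main

import (
	"runtime/debug"
	"time"
)

func main() {
	// Default GOGC is 100; a lower target triggers collections more often,
	// trading CPU for a smaller heap.
	debug.SetGCPercent(50)

	// Periodically return freed memory to the OS so container RSS
	// reflects the live heap rather than retained pages.
	go func() {
		for range time.Tick(5 * time.Minute) {
			debug.FreeOSMemory()
		}
	}()

	// ... the adapter's normal serving loop would run here ...
	select {}
}

Roughly the same effect is available without code changes through the GOGC environment variable, and on Go 1.19+ via the GOMEMLIMIT soft limit; neither fixes a real leak, they only slow or cap the growth.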
Updating all dependencies and the Go version made it so that I could no longer quickly reproduce the memory leak, but let's be careful; you never know with these things.
I released 0.5.2 and will deploy it on a couple of long-running machines that I have and let it run for a few days; let's see if it manifests.
All good on my end
Hi. @WalBeh reported the memory leak is still present and going strong. Thank you.
After the process is restarted and has been running for about 12 h, memory consumption is approx. 420 MB; 20 minutes after restarting, the process consumes about 30 MB. It looks like, in our setup, it leaks about 30 MB/h.
@WalBeh suggested limiting resources via systemd resource-control directives in order to work around the problem. Thanks!
MemoryAccounting=true
MemoryHigh=512M
MemoryMax=768M
CPUAccounting=true
CPUWeight=50
CPUQuota=100%
Still, I would like to emphasize that we need to resolve the root cause here. 🍀
@widmogrod: Thanks a stack for submitting GH-203. 💯
@widmogrod: Tomorrow morning, you will be able to use an image that includes your improvement via the nightly tag.
docker pull ghcr.io/crate/cratedb-prometheus-adapter:nightly
-- https://github.com/crate/cratedb-prometheus-adapter/pkgs/container/cratedb-prometheus-adapter
I would appreciate it if someone could test whether the memory leak has disappeared with this change, or provide me with instructions on how to simulate it. I didn't manage to replicate it in my local environment (and on macOS I had to use /_/crate instead of crate/crate:nightly).
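One way to try to simulate it, offered as a rough sketch rather than a known reproducer: hammer the adapter's write path with many distinct time series. It assumes the adapter's default listen address localhost:9268 and the standard snappy-compressed remote-write protobuf on /write; adjust both to your setup.

// Hypothetical load generator for probing the leak; address and endpoint
// are assumptions, adjust them to your configuration.
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"time"

	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	for i := 0; ; i++ {
		// One sample per request, cycling through many label values to
		// create lots of distinct series.
		wr := &prompb.WriteRequest{
			Timeseries: []prompb.TimeSeries{{
				Labels: []prompb.Label{
					{Name: "__name__", Value: "leak_probe"},
					{Name: "instance", Value: fmt.Sprintf("host-%d", i%10000)},
				},
				Samples: []prompb.Sample{{
					Value:     float64(i),
					Timestamp: time.Now().UnixMilli(),
				}},
			}},
		}
		raw, err := wr.Marshal()
		if err != nil {
			log.Fatal(err)
		}
		req, err := http.NewRequest(http.MethodPost,
			"http://localhost:9268/write", bytes.NewReader(snappy.Encode(nil, raw)))
		if err != nil {
			log.Fatal(err)
		}
		req.Header.Set("Content-Type", "application/x-protobuf")
		req.Header.Set("Content-Encoding", "snappy")
		req.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
		resp, err := client.Do(req)
		if err != nil {
			log.Fatal(err)
		}
		resp.Body.Close()
	}
}

Running this for a while and then stopping it, while watching docker stats (or the go_memstats_* gauges from the earlier sketch), should show whether memory keeps climbing or is released once the load ends.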
Hi. Maybe @goat-ssh can whip it up for probing properly?
@widmogrod: v0.5.3 has just been released, including your fix. Release artefacts are available as an OCI image and as standalone builds. Thanks again.