High memory usage
I'm seeing quite high memory usage from parca (~45GB) and parca-agent (~1GB). This is on a single-node setup without any additional scrape_configs; the node has just 44 pods running.
parca pprof from heap: https://share.polarsignals.com/f8b0beb/
parca-agent pprof from heap: https://share.polarsignals.com/89d4e84/
Might be related to https://github.com/parca-dev/parca/issues/283 but I did not have time to investigate.
Prometheus output:
As for Parca itself, this looks "good", as the majority of the heap is used by the chunkenc.Pool.
Just today I started working on removing the need to store cumulative values and only storing the flat (leaf) values, from which the cumulative values can be calculated by simply adding the flat values together.
Then, in the future, once we get to persistent storage, we'll be able to mmap older chunks so they don't constantly have to be in memory (just like Prometheus).
Bear with us until then. Thanks for reporting!
For now, I'd recommend running Parca with a retention of a few hours (3h, 6h, or 12h), depending on what you're willing to pay in terms of memory. Chunks older than the retention are emptied and reused for newer chunks. This should make things more predictable for now.
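As a rough sketch of what that could look like (the retention flag name also appears later in this thread; the config path here is just a placeholder):

```sh
# example only: run Parca with a 6h in-memory retention; pick 3h/6h/12h to match your memory budget
/parca --config-path=parca.yaml --storage-tsdb-retention-time=6h
```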
Update after redeploying with a 3h storage retention: memory is still increasing, but at a slower rate. After 17h, Parca is consuming ~30GB of memory.
This may still be an issue. I've set storage-tsdb-retention-time=30m (the orange line) and memory consumption appears unaffected. The blue line is with 1h retention, so I'd have expected the memory usage to be lower for the server running with 30m. I'm running v0.7.1.
The one thing that definitely still keeps increasing, with no vacuuming happening, is metadata. So if you have a lot of churn (many deployments happening, etc.), this is currently increasing ever so slightly. Other than that, it should be running mostly stable; at least that's the case for https://demo.parca.dev
Could you check the /metrics endpoint of Parca and see if you can observe an increase in truncated chunks over time? The metric to look for is parca_tsdb_head_truncated_chunks_total. Another one would be parca_tsdb_head_min_time; see whether that increases ever so slightly after the oldest chunks are truncated, too. Let us know!
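For anyone else following along, a quick way to watch those two metrics is a sketch like the following (the :7070 address is Parca's default HTTP port and the 60s interval is arbitrary; adjust both to your setup):

```sh
# prints the truncation counter and the head's minimum timestamp once a minute
watch -n 60 'curl -s http://localhost:7070/metrics | grep -E "parca_tsdb_head_(truncated_chunks_total|min_time)"'
```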
This is a graph from this morning
We've also observed some very large spikes in memory usage from the Parca agents themselves.
Hey all. We've landed the new and improved storage in main. You can try a recent image (like ghcr.io/parca-dev/parca:main-ddad21cc) and enable the new storage with --storage=columnstore. Would love to see how it performs for you! Note that currently the column store will accumulate 512MB of memory and then throw away all data and start over. The amount can be controlled with --storage-active-memory=536870912. We're still working on persisting the data.
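For anyone who wants to try it, a minimal sketch of such an invocation (the flags are taken from the comment above; the config path and the way you run the ghcr.io/parca-dev/parca:main-ddad21cc image are assumptions about your own setup):

```sh
# illustrative only: enable the new column store with a 512MB in-memory limit
/parca --config-path=parca.yaml \
  --storage=columnstore \
  --storage-active-memory=536870912   # data is rotated once this amount accumulates
```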
@brancz just weighing in on this with what I have observed running v0.12.0, as it may be helpful.
Here's the relevant config:
```yaml
containers:
  - resources:
      limits:
        cpu: '8'
        memory: 32Gi
      requests:
        cpu: '2'
        memory: 8Gi
    image: 'quay.io/observatorium/parca:v0.12.0'
    args:
      - /parca
      - '--config-path=/var/parca/parca.yaml'
      - '--log-level=info'
      - '--storage-active-memory=20000000000'
```
And a graph of the memory usage:

It didn't OOM (although eerily close to the limit 🤔), so I expect there was some rotation when storage-active-memory was reached.
I just wanted to clarify whether the behaviour looks OK up until that point, though. Memory usage seems to increase steadily, and we are scraping ~20 static (no churn) Pods as targets.
Churn doesn't make a difference for Parca, but that's beside the point. --storage-active-memory accounts for the buffers held by FrostDB, plus some metadata around that, but that's negligible. What that means is that you always have to account for the badger-based metastore on top of that as well. And then all of that is garbage-collected by the Go runtime, meaning a fair amount of garbage needs to be on the heap before GC runs and frees it, but that memory will also be freed by the Go runtime if another process wants to allocate it. I realize that doesn't necessarily make it a lot easier to reason about, but it puts things into perspective, and given all of that, this doesn't seem entirely unreasonable (at least in terms of the pattern).
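To put rough numbers on that accounting (my back-of-the-envelope, not an official formula): with Go's default GOGC=100 the heap can grow to roughly twice the live set before a collection runs, so ~20 GB of live FrostDB buffers plus the badger metastore can legitimately show up as 30 GB+ of resident memory without anything leaking.

```sh
# rough worst-case heap target under GOGC=100 (heap grows to ~2x the live set before GC)
echo $(( 2 * 20000000000 / 1024 / 1024 / 1024 )) GiB   # ≈ 37 GiB, before the metastore
```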
All of that said, we do know about a bunch of low-hanging fruit that can easily be optimized in the way we store data physically (in memory and on disk): https://github.com/parca-dev/parca/issues/1309.
Does that somewhat answer the question?
Actually, one more thing: could you share a memory profile? Then we could see whether my suspicion about where the memory is spent is correct.
That makes sense, and answers the question, thanks for providing the details.
There was nothing that looked particularly off to me either but since I had the data from the latest release and had been following this issue I felt it would be useful to provide it.
I'll follow up with the profile.
Hi @brancz. I just downloaded Parca for the first time from the release page and ran it with the default configuration file from the documentation, without any additional switches (which implies the default of 512MB for FrostDB), and I'm seeing far more memory usage than what Parca itself reports while monitoring itself.
EDIT: I'm using only one parca-agent (using a systemd target) running on the localhost.
What do you guys think? I've attached the screenshot.
If I let it run for another few minutes, it'll just continue to grow unbounded until it's OOM-killed.
https://pprof.me/96f41cb
You can see that most of the memory is being used by splitRowsByGranule and not by the actual storage blocks. We get a little over 6GiB before the block is rotated and memory usage drops.
https://github.com/polarsignals/frostdb/pull/189 should fix the high memory usage
Closing this issue as https://github.com/parca-dev/parca/pull/1682 has merged.
@thorfour I don't believe this is fully resolved. I rebuilt Parca from master, and while it grows a little more slowly, it's still getting OOM-killed eventually.
I let it grow for a while, and it looks like the biggest consumer of memory is in the scrapeLoop? And further down below that, most of the memory still appears to be leaking in FrostDB.
Here is a screenshot:
NOTE: This is on a vanilla Ubuntu 22.04 system, running a single parca-agent on the localhost. Pretty basic setup.
Ping?
Thanks! Sorry I missed the previous message, I'll dig into this more.
Do you have a profile you can share with this high memory usage? Can upload to pprof.me
Also, could you send the commit sha you're running on, and the flags you're using to run it?
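For reference, one way to capture such a heap profile is via Go's pprof endpoints, assuming they're exposed on Parca's HTTP address (:7070 by default; adjust if yours differs):

```sh
# grab a heap snapshot that can be uploaded to pprof.me
curl -s http://localhost:7070/debug/pprof/heap -o parca-heap.pb.gz
# or inspect it locally first
go tool pprof -http=:8081 parca-heap.pb.gz
```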
Closing this issue, as I believe much of the memory pressure has been resolved by various improvements since it was opened. Please re-open with additional information if you're able to reproduce this on the latest versions of Parca.