store: Thanos consumes 40G at startup
Hello,
I'm seeing uncontrolled memory consumption with thanos store.
What happened:
At startup, memory peaks at ~40GiB and later decreases over time to ~26GiB. However, depending on the load, it climbs back to the peak value and even beyond, which results in an OOM kill.

This happens even though I'm using memcached precisely to avoid this situation.
I currently have more than 145k blocks in my S3 storage and more than 80 Prometheus instances (plus sidecars).
What I expected:
Since a cache is in use, I expected memory to stay low.
At least it seems the cache is being used.

current configuration
thanos store \
--data-dir="/var/thanos/store" \
--objstore.config-file="/etc/thanos/bucket.yml" \
--http-address="0.0.0.0:10902" \
--grpc-address="0.0.0.0:10901" \
--log.format="json" \
--log.level="info" \
--store.index-header-posting-offsets-in-mem-sampling=50 \
--index-cache.config-file="/etc/thanos/thanos-store-cache-config.yml"
This is my cache configuration:
type: MEMCACHED
config:
  addresses: ["memcached:11211"]
  timeout: 3s
  max_idle_connections: 200
  max_async_concurrency: 20
  max_async_buffer_size: 10000000
  max_item_size: 300MiB
  max_get_multi_concurrency: 5
  max_get_multi_batch_size: 20
  dns_provider_update_interval: 10s
Environment:
- thanos 0.25.1
- prometheus 2.28.1
What have I tested so far?
1 - sharding by date using flags
I created several stores, each covering a 3-month period, using this pattern:
thanos store \
...
--max-time=-12w \
--min-time=-24w \
However, all stores ended up consuming the same amount of memory.
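For readers who want to reproduce the sharding setup, here is a minimal sketch of how the per-shard time flags could be generated. The helper name `shard_flags` and the choice to leave `--max-time` unset on the newest shard are assumptions for illustration; only the `--min-time=-24w` / `--max-time=-12w` pair comes from the command above.

```shell
#!/bin/sh
# Hypothetical helper: print the time-range flags for shard $1 (0 = newest),
# where every shard spans $2 weeks.
shard_flags() {
    shard=$1
    weeks=$2
    max=$((shard * weeks))
    min=$(((shard + 1) * weeks))
    if [ "$max" -eq 0 ]; then
        # Newest shard: --max-time is left at its default (an assumption),
        # so the most recent blocks are still served.
        printf '%s\n' "--min-time=-${min}w"
    else
        printf '%s\n' "--min-time=-${min}w --max-time=-${max}w"
    fi
}

# Print the flags for three consecutive 12-week shards.
for i in 0 1 2; do
    echo "shard $i: thanos store ... $(shard_flags "$i" 12)"
done
```

Shard 1 yields the same flag pair as the pattern shown above; the remaining flags (data dir, object store config, cache config) would stay identical across shards.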
2 - changing the compact/level
I basically followed this issue https://github.com/thanos-io/thanos/issues/325
Anything else
I don't know if this information is useful, but the number of samples varies greatly from one Prometheus instance to another.

Do you have persistent storage on your Thanos Store pods (I assume k8s here)? The RAM usage probably comes from building binary index headers. Maybe there is some opportunity here to use sync.Pool to have a constant RAM usage. With persistent storage, you wouldn't have to rebuild them each time. Could you upload a profile of the memory usage of Thanos Store just after the start of the process?
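A rough sketch of how such a heap profile could be captured, assuming the store exposes the standard Go pprof endpoints under `/debug/pprof` on its HTTP port (10902 in the configuration above). The helper names, `localhost`, and the output file name are placeholders, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: build the heap-profile URL for host $1, HTTP port $2.
heap_profile_url() {
    printf 'http://%s:%s/debug/pprof/heap\n' "$1" "$2"
}

# Hypothetical helper: download the heap profile from host $1, port $2 into file $3.
fetch_heap_profile() {
    curl -sS --fail -o "$3" "$(heap_profile_url "$1" "$2")"
}

# Usage (not executed here; run it shortly after the store starts):
#   fetch_heap_profile localhost 10902 heap.pprof
#   go tool pprof -top heap.pprof
```

The resulting `heap.pprof` file can be attached to the issue as is.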
Hello, thanks a lot for your answer!
We are actually using Docker Swarm with persistent volumes. I do not think there is a problem at this level because we have hundreds of services (not related to Prometheus and Thanos) running without any issue.
As for memory, here is the usage of the store using the bucket with 145k blocks after a reboot
and here is the similar behavior of one of the stores sharded over three months (using min-time and max-time) after a reboot
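One way to check that the index headers really survive a restart is to count them under the data directory before and after restarting the store. This is only a sketch: it assumes the store keeps one file named `index-header` per block directory under `--data-dir`, and the helper name `count_index_headers` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count index-header files below directory $1.
# Assumes a <data-dir>/<block-id>/index-header layout.
count_index_headers() {
    find "$1" -type f -name index-header | wc -l | tr -d ' '
}

# Usage (not executed here), with the data dir from the configuration above:
#   count_index_headers /var/thanos/store
```

If the count is close to zero right after a restart, the headers are being rebuilt from scratch, which would suggest the volume is not actually persisting the data directory.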
Note:
- This might be unrelated, but thanos compact is down because there are some overlaps.
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.