Store-gateway: high memory allocations caused by per-tenant Prometheus registry

Open • pracucci opened this issue 3 years ago • 1 comment

Describe the bug

To use the Thanos BucketStore while supporting Cortex multi-tenancy, we need to create a BucketStore for each tenant, pass a dedicated Prometheus registry to each one, and then aggregate the metrics from all registries.

As a result, Prometheus metrics collection causes high memory allocations (on the order of 50 MB/s in a store-gateway with 7.5K tenants). The allocated memory is not retained, but it still puts pressure on the GC.
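The allocation pattern can be illustrated with a simplified, stdlib-only model. This is a sketch under stated assumptions, not the actual Cortex or Thanos code: `tenantRegistry`, `sample`, `gather`, and `aggregate` are hypothetical names standing in for a per-tenant Prometheus registry and the aggregation step. The point it shows is that every collection pass builds a fresh, short-lived snapshot per tenant, so the garbage produced per scrape grows with the number of tenants even though none of it is retained.

```go
package main

import "fmt"

// sample is a simplified stand-in for one gathered metric sample.
type sample struct {
	name  string
	value float64
}

// tenantRegistry is a hypothetical stand-in for a dedicated
// per-tenant Prometheus registry (not the real client_golang type).
type tenantRegistry struct {
	counters map[string]float64
}

// gather returns a fresh snapshot of the tenant's metrics.
// A new slice is allocated on every call and becomes garbage
// as soon as the caller has finished aggregating it.
func (r *tenantRegistry) gather() []sample {
	out := make([]sample, 0, len(r.counters))
	for name, v := range r.counters {
		out = append(out, sample{name: name, value: v})
	}
	return out
}

// aggregate sums each metric across all tenant registries.
// It triggers one snapshot allocation per tenant per call.
func aggregate(regs map[string]*tenantRegistry) map[string]float64 {
	totals := make(map[string]float64)
	for _, reg := range regs {
		for _, s := range reg.gather() {
			totals[s.name] += s.value
		}
	}
	return totals
}

func main() {
	// 7500 tenants, matching the scale mentioned above; the metric
	// names and values are made up for illustration.
	regs := make(map[string]*tenantRegistry)
	for i := 0; i < 7500; i++ {
		regs[fmt.Sprintf("tenant-%d", i)] = &tenantRegistry{
			counters: map[string]float64{"blocks_loaded": 2, "series_fetched": 10},
		}
	}
	totals := aggregate(regs)
	fmt.Println(totals["blocks_loaded"], totals["series_fetched"])
}
```

In this model, reducing the number of registries a single process has to gather from directly reduces the per-scrape garbage.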

[Image: Screenshot 2021-01-07 at 11 35 27]

In a cluster with low QPS, 95% of store-gateway memory allocations are caused by metrics collection.

pracucci • Jan 15 '21 17:01

Enabling shuffle-sharding on the store-gateway significantly improves this.

pracucci • Apr 29 '21 13:04