
Prometheus cardinality explosion

Open iszulcdeepsense opened this issue 2 years ago • 4 comments

Let's make sure we're protected against metric cardinality explosion. Prometheus label cardinality is the number of unique label-value combinations in a given metric. If a metric has too many label dimensions, the number of time series (one per combination) can grow drastically, causing problems such as degraded performance and exceeded storage limits.

Prometheus TSDB storage is optimized for a relatively low number of time series, not for high cardinality.
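
To make the growth concrete, here is a minimal Python sketch of the arithmetic. The label names and per-label value counts are hypothetical and not taken from our deployment. The product is only an upper bound, since not every combination necessarily occurs in practice.

```python
from math import prod


def max_series(label_values: dict[str, int]) -> int:
    """Upper bound on time series for one metric.

    Each key is a label name, each value is the number of distinct
    values that label can take. A metric with no labels has one series.
    """
    return prod(label_values.values())


def observed_series(samples: list[dict[str, str]]) -> int:
    """Number of distinct label-value combinations actually seen."""
    return len({tuple(sorted(s.items())) for s in samples})


# Hypothetical label set for a single HTTP metric.
labels = {"method": 5, "status": 8, "endpoint": 50}
print(max_series(labels))  # 2000

# Adding one unbounded label (e.g. a user ID) multiplies the bound.
labels["user_id"] = 10_000
print(max_series(labels))  # 20000000
```

The second call illustrates why unbounded labels such as IDs are the usual suspects: a single extra label multiplies the bound by its number of distinct values.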

iszulcdeepsense avatar Oct 16 '23 09:10 iszulcdeepsense

I previously ran into the Prometheus volume running out of storage, even though a much lower retention size was configured. That may have been the result of a cardinality explosion.

iszulcdeepsense avatar Dec 04 '23 14:12 iszulcdeepsense

This has a section titled "Find High Cardinality Metrics" which looks to me like a good place to start, i.e. figure out where we actually are.

In fact, you might even want a meta-dashboard and an alarm to keep an eye on cardinality, using some of the PromQL queries on that page.
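
As a rough sketch of what such a check could look like: the PromQL query below is a commonly used one for counting series per metric name and may differ from the queries on that page. The Python helpers parse a response in the shape returned by the Prometheus HTTP API instant-query endpoint. The function names, the sample payload, the metric names in it, and the threshold are all made up for illustration.

```python
import json

# Commonly used query: the 10 metric names with the most series.
TOP_METRICS_QUERY = 'topk(10, count by (__name__)({__name__=~".+"}))'


def series_per_metric(payload: dict) -> list[tuple[str, int]]:
    """Turn an instant-query response into (metric name, series count) rows,
    sorted from highest to lowest count."""
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    rows = []
    for item in payload["data"]["result"]:
        name = item["metric"].get("__name__", "<unnamed>")
        # The sample value arrives as [timestamp, "value-as-string"].
        count = int(float(item["value"][1]))
        rows.append((name, count))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def over_threshold(rows: list[tuple[str, int]], limit: int) -> list[str]:
    """Names of metrics whose series count exceeds `limit`."""
    return [name for name, count in rows if count > limit]


# Made-up payload in the response format, not real measurements.
SAMPLE = json.loads("""
{"status": "success",
 "data": {"resultType": "vector",
          "result": [
            {"metric": {"__name__": "example_metric_b"}, "value": [1700000000, "300"]},
            {"metric": {"__name__": "example_metric_a"}, "value": [1700000000, "1200"]}
          ]}}
""")

rows = series_per_metric(SAMPLE)
print(rows)
print(over_threshold(rows, limit=1000))
```

In a real setup the same query could back a dashboard panel, with the threshold expressed as an alerting rule rather than in Python.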

JosefAssadERST avatar Dec 05 '23 05:12 JosefAssadERST

Without specific examples of where it goes wrong, it is difficult to design specific policies. However, in my locally set-up kind cluster, most of the time series are related to Postgres, so that might be a place to start working on some nebulous "improvement": [image]

anders314159 avatar Mar 09 '24 08:03 anders314159

We might consider aggregating or outright removing some of the metrics/labels that are scraped from Postgres. On the other hand, compression might take care of most of the Postgres metrics, which is why examples would be nice.
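
To illustrate what aggregating away a label would buy us, here is a small Python sketch. The label names `datname` and `relname` and the sample values are assumptions for illustration, not data from the cluster. Summing is only appropriate for metrics where a sum is meaningful, such as counters.

```python
from collections import defaultdict

# A series is identified by a tuple of (label, value) pairs.
LabelSet = tuple[tuple[str, str], ...]


def drop_label(series: dict[LabelSet, float], label: str) -> dict[LabelSet, float]:
    """Remove `label` and sum the series that become identical as a result."""
    merged: dict[LabelSet, float] = defaultdict(float)
    for labels, value in series.items():
        key = tuple((k, v) for k, v in labels if k != label)
        merged[key] += value
    return dict(merged)


# Hypothetical per-table metric: 3 databases x 4 tables = 12 series.
series = {
    (("datname", db), ("relname", table)): 1.0
    for db in ("db1", "db2", "db3")
    for table in ("t1", "t2", "t3", "t4")
}

aggregated = drop_label(series, "relname")
print(len(series), "->", len(aggregated))  # 12 -> 3
```

Whether this matters in practice depends on how many series the Postgres metrics actually contribute, which is the open question above.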

anders314159 avatar Mar 09 '24 08:03 anders314159