Prometheus cardinality explosion
Let's make sure we're protected against metric cardinality explosion. Prometheus label cardinality refers to the number of unique label-value combinations for a given metric. If metrics carry too many label dimensions, the number of distinct time series (combinations) can soar drastically, which in turn causes problems like degraded performance, exceeded storage limits, etc.
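As a rough illustration of what those combinations mean in practice, a few ad-hoc PromQL queries show how a single metric name fans out into series; `prometheus_http_requests_total` (a metric Prometheus exposes about itself, labeled by `handler` and `code`) is just used here as a stand-in example:

```promql
# Total number of series currently held in the TSDB head block
prometheus_tsdb_head_series

# How many series (label-value combinations) sit behind one metric name
count(prometheus_http_requests_total)

# How many distinct values a single label contributes to that count
count(count by (handler) (prometheus_http_requests_total))
```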
Prometheus TSDB storage is optimized for a relatively low number of time series, not for high cardinality.
Previously, I ran into an issue where the Prometheus volume ran out of storage, even though a much lower retention size was configured. That may have been a result of cardinality explosion.
That page has a section titled "Find High Cardinality Metrics", which looks like a good place to start, i.e. to figure out where we actually stand.
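I haven't copied the exact queries from that section, but the general idea is this kind of ad-hoc query (note that both touch every series, so they can be slow on a large instance):

```promql
# Ten metric names with the most series, i.e. the biggest cardinality offenders
topk(10, count by (__name__) ({__name__=~".+"}))

# The same, broken down by scrape job, to see which exporter is responsible
topk(10, count by (__name__, job) ({__name__=~".+"}))
```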
In fact, we might even want a meta-dashboard and an alert to keep an eye on cardinality, built from some of the PromQL queries on that page.
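For the alerting part, a minimal sketch could simply watch Prometheus's own `prometheus_tsdb_head_series` metric; the threshold below is an arbitrary placeholder, not something derived from our actual setup:

```yaml
groups:
  - name: cardinality
    rules:
      - alert: TimeSeriesCardinalityHigh
        # prometheus_tsdb_head_series = number of active series in the head block;
        # the 1e6 threshold is a placeholder and should be tuned to our instance
        expr: prometheus_tsdb_head_series > 1000000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus is tracking {{ $value }} active time series"
```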
Without concrete examples of where cardinality actually goes wrong for us, it is difficult to design targeted policies.
However, in my locally set up kind cluster, most of the time series are related to Postgres, so that might be a place to start with some (admittedly nebulous) "improvements":
We might consider aggregating, or outright dropping, some of the metrics/labels scraped from Postgres, as sketched below. On the other hand, TSDB compression might already absorb most of the Postgres metrics cheaply, which is why concrete examples would be nice.
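As a sketch of the "drop at scrape time" option, `metric_relabel_configs` on the Postgres scrape job can filter metrics before they are stored. The job name, target address, and metric name patterns below are assumptions about how our postgres_exporter might be set up, not taken from our actual config:

```yaml
scrape_configs:
  - job_name: postgres
    static_configs:
      - targets: ["postgres-exporter:9187"]   # assumed exporter address
    metric_relabel_configs:
      # Drop per-statement metrics, whose query-level labels tend to explode
      # (only relevant if pg_stat_statements collection is enabled at all)
      - source_labels: [__name__]
        regex: "pg_stat_statements_.*"
        action: drop
      # Alternatively, keep only an explicit allowlist of Postgres metrics
      # - source_labels: [__name__]
      #   regex: "pg_up|pg_stat_database_.*|pg_stat_activity_count"
      #   action: keep
```

Dropping whole label dimensions with `labeldrop` is also possible, but it risks collapsing distinct series into duplicates, so metric-level drops like the above are probably the safer first step.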