mimir
Subdivide some per-tenant metrics by another label
Is your feature request related to a problem? Please describe.
Suppose someone is using Mimir with a broad set of data sources, all feeding into one tenant. They would like a per-source breakdown of the number of active series, samples accepted, samples discarded, etc.
Describe the solution you'd like
Mimir already has a concept of "cluster", used by the "HA-tracker". This typically maps to a "datacenter", "region", something like that.
We could optionally expose per-tenant metrics at this cluster level, to give finer-grained information to administrators. This would be more expensive internally, as we would be tracking more information. Some metrics, e.g. ones gathered from TSDB internals, cannot be split out by cluster without a lot more work.
Describe alternatives you've considered
Nothing so far.
I renamed this issue because the basic idea generalises to splitting by any label - for instance people might have a team or a domain label that can tie back to where the data comes from. cluster also works but the feature doesn't need to be coupled to HA-tracker.
@bboreham - To what degree does the 'usage groups' feature solve this problem? It only gives you active series count subdivided by some user-specified set of label matchers (not discarded samples or dpm), but I'm wondering if that's enough.
Also, just for my understanding, do we know why the user hasn't or can't or doesn't want to simply choose to subdivide these different slices of their estate into different tenants?
does 'usage groups' feature solve this problem?
Maybe, but it gets more expensive to compute the more splits you have, to the point where I wouldn’t want to use it for more than 20.
(I believe we have 100-ish live, and people asking for thousands)
Would this proposed enhancement be fine with thousands? Or the other way around, could we make the usage groups feature be able to handle that?
This enhancement suggests providing more datapoints (active series, samples accepted, samples discarded, etc.), whereas the usage group tracker only considers active series. Wouldn't this new solution end up being more expensive?
We'll be hopping on the phone with one potential user of this feature in the next week or so. We are still hoping/wanting them to keep it in the tens, not hundreds.
@bboreham can you chime in on whether you think PR 2702 and the methodology taken there is inherently better at handling a high number of groups than the current methodology used to compute usage groups? Otherwise, we'll need to put the same caveats around the number of groups that the metrics in PR 2702 would be calculated for, so that we can control the cost of this feature (as @RutgerKe points out).
PR 2702 and the methodology taken there is inherently better at handling high number of groups than the current methodology used to compute usage groups
Yes, usage groups are implemented as a linear search ("does it match this one?") each time a series is added, whereas a label can be a map lookup.
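To illustrate the difference, here is a minimal sketch. It is not Mimir's actual code: the `matcher` type, `groupByLinearSearch`, and `countsByLabel` are hypothetical names, and matchers are simplified to plain label equality. The first function checks every configured group in turn, so its cost grows with the number of groups; the second uses the label's value directly as a map key.

```go
package main

import "fmt"

// matcher is a hypothetical, simplified usage-group matcher:
// it matches a series whose label `name` has exactly `value`.
type matcher struct {
	group, name, value string
}

// groupByLinearSearch tests each configured matcher in turn until one
// matches, so the cost per added series is O(number of groups).
func groupByLinearSearch(series map[string]string, matchers []matcher) string {
	for _, m := range matchers {
		if v, ok := series[m.name]; ok && v == m.value {
			return m.group
		}
	}
	return "" // no group matched
}

// countsByLabel keeps one counter per distinct value of a single label.
// Finding the counter is one map lookup, however many values exist.
type countsByLabel struct {
	label  string
	counts map[string]int
}

// add increments the counter for the series' value of the split label.
// A series without the label is counted under the empty string.
func (c *countsByLabel) add(series map[string]string) {
	c.counts[series[c.label]]++
}

func main() {
	series := map[string]string{"__name__": "up", "team": "payments"}

	matchers := []matcher{
		{group: "infra", name: "team", value: "infra"},
		{group: "payments", name: "team", value: "payments"},
	}
	fmt.Println(groupByLinearSearch(series, matchers))

	c := &countsByLabel{label: "team", counts: map[string]int{}}
	c.add(series)
	fmt.Println(c.counts["payments"])
}
```

With thousands of groups, the linear version does thousands of comparisons per new series, while the map version still does one hash lookup.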
so that we can control the cost of this feature
This is still a consideration; each different group creates more data, which costs us more to process. There is also a DoS consideration, e.g. if you sent 100 million distinct values for the label then that would probably break the reporting process.
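One possible way to bound that cost, sketched here purely as an assumption and not as something PR 2702 is known to do, is to cap the number of distinct values tracked and lump everything beyond the cap into a single overflow counter. The names `cappedCounter`, `newCappedCounter`, and `maxValues` are hypothetical.

```go
package main

import "fmt"

// cappedCounter tracks counts per label value, but only for the first
// maxValues distinct values it sees. Later unseen values are added to
// a single overflow counter, so memory stays bounded.
type cappedCounter struct {
	maxValues int
	counts    map[string]int
	overflow  int
}

func newCappedCounter(maxValues int) *cappedCounter {
	return &cappedCounter{maxValues: maxValues, counts: map[string]int{}}
}

// inc counts one observation for the given label value.
func (c *cappedCounter) inc(value string) {
	if _, seen := c.counts[value]; !seen && len(c.counts) >= c.maxValues {
		c.overflow++ // cap reached: do not create a new entry
		return
	}
	c.counts[value]++
}

func main() {
	c := newCappedCounter(2)
	for _, v := range []string{"eu-west", "us-east", "ap-south", "eu-west"} {
		c.inc(v)
	}
	fmt.Println(len(c.counts), c.counts["eu-west"], c.overflow)
}
```

The trade-off is that values arriving after the cap is reached lose their individual breakdown, so the limit would still need to be chosen per tenant.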