
Subdivide some per-tenant metrics by another label

Open · bboreham opened this issue 3 years ago · 3 comments

Is your feature request related to a problem? Please describe.

Suppose someone is using Mimir with a broad set of data sources, all feeding into one tenant. They would like some detail, broken down by source, on the number of active series, samples accepted, samples discarded, etc.

Describe the solution you'd like

Mimir already has a concept of "cluster", used by the HA-tracker. This typically maps to a datacenter, a region, or something similar.

We could optionally expose per-tenant metrics at this cluster level, to give finer-grained information to administrators. This would be more expensive internally, as we would be tracking more information. Some metrics, e.g. ones gathered from TSDB internals, cannot be split out by cluster without a lot more work.
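A minimal sketch of what per-tenant metrics "at the cluster level" could look like internally, written in Python for brevity (Mimir itself is written in Go). It assumes simple counters keyed by a (tenant, cluster) pair; the class, method, and cluster names are hypothetical, not Mimir's API. It only covers counters that are incremented per batch of samples, not the TSDB-internal metrics mentioned above.

```python
from collections import defaultdict


class PerClusterSampleCounters:
    """Hypothetical per-tenant sample counters, subdivided by HA-tracker cluster."""

    def __init__(self) -> None:
        # One counter per (tenant, cluster) pair instead of one per tenant;
        # this extra state is the added internal cost.
        self.accepted: dict[tuple[str, str], int] = defaultdict(int)
        self.discarded: dict[tuple[str, str], int] = defaultdict(int)

    def record(self, tenant: str, cluster: str, accepted: int, discarded: int) -> None:
        """Add one batch's accepted/discarded sample counts to the tenant's cluster split."""
        self.accepted[(tenant, cluster)] += accepted
        self.discarded[(tenant, cluster)] += discarded

    def tenant_total_accepted(self, tenant: str) -> int:
        """The existing per-tenant figure is still recoverable by summing the splits."""
        return sum(n for (t, _), n in self.accepted.items() if t == tenant)


counters = PerClusterSampleCounters()
counters.record("tenant-a", "eu-west", accepted=100, discarded=2)
counters.record("tenant-a", "us-east", accepted=50, discarded=0)
print(counters.accepted[("tenant-a", "eu-west")])  # 100
print(counters.tenant_total_accepted("tenant-a"))  # 150
```

The per-tenant total is unchanged; what grows is the number of counters, one per distinct cluster seen for each tenant.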

Describe alternatives you've considered

Nothing so far.

bboreham, Jul 14 '22

I renamed this issue because the basic idea generalises to splitting by any label: for instance, people might have a team or domain label that ties back to where the data comes from. A cluster label also works, but the feature doesn't need to be coupled to the HA-tracker.
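Generalising from "cluster" to "any label" mostly means making the label name configurable per tenant. The Python sketch below (Mimir itself is written in Go) assumes a hypothetical per-tenant mapping from tenant to label name and a hypothetical fallback group for series that lack that label; none of these names are real Mimir configuration options.

```python
from typing import Optional

# Hypothetical per-tenant setting naming the label to subdivide metrics by.
# This is not a real Mimir configuration option.
SPLIT_LABEL_BY_TENANT = {
    "tenant-a": "cluster",  # same idea as the HA-tracker cluster, but just a label
    "tenant-b": "team",
}

# Hypothetical group for series that do not carry the configured label.
MISSING_VALUE = "__missing__"


def group_for(tenant: str, series_labels: dict[str, str]) -> Optional[str]:
    """Return the sub-group a series counts towards, or None if the tenant has no split."""
    label_name = SPLIT_LABEL_BY_TENANT.get(tenant)
    if label_name is None:
        return None
    return series_labels.get(label_name, MISSING_VALUE)


print(group_for("tenant-b", {"__name__": "up", "team": "payments"}))  # payments
print(group_for("tenant-b", {"__name__": "up"}))                      # __missing__
print(group_for("tenant-c", {"__name__": "up", "team": "payments"}))  # None
```

Because the group is just the value of one label, nothing here depends on the HA-tracker.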

bboreham, Aug 08 '22

@bboreham, to what degree does the 'usage groups' feature solve this problem? It only gives you the active series count subdivided by a user-specified set of label matchers (not discarded samples or dpm, data points per minute), but I'm wondering whether that's enough.

09jvilla, Aug 09 '22

Also, just for my understanding: do we know why the user hasn't, can't, or doesn't want to simply subdivide these different slices of their estate into different tenants?

09jvilla, Aug 09 '22

> does the 'usage groups' feature solve this problem?

Maybe, but it gets more expensive to compute the more splits you have, to the point where I wouldn’t want to use it for more than 20.

(I believe we have around 100 usage groups live, and people are asking for thousands.)

bboreham, Aug 29 '22


Would this proposed enhancement be fine with thousands? Or, the other way around, could we make the usage groups feature handle that?

This enhancement proposes providing more data points (active series, samples accepted, samples discarded, etc.), while the usage groups tracker only considers active series. Wouldn't this new solution end up being more expensive?

RutgerKe, Aug 29 '22

We'll be hopping on the phone with one potential user of this feature in the next week or so. We are still hoping they will keep it in the tens of groups, not the hundreds.

@bboreham, can you chime in on whether the approach taken in PR 2702 is inherently better at handling a high number of groups than the current approach used to compute usage groups? Otherwise, we'll need to put the same caveats on the number of groups that the PR 2702 metrics are calculated for, so that we can control the cost of this feature (as @RutgerKe points out).

09jvilla, Sep 06 '22

> PR 2702 and the methodology taken there is inherently better at handling high number of groups than the current methodology used to compute usage groups

Yes. Usage groups are implemented as a linear search ("does it match this one?") over every group each time a series is added, whereas splitting by a label can be a single map lookup.
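To make the cost difference concrete, here is a rough Python sketch (not Mimir's code, which is written in Go) contrasting the two approaches. It assumes usage-group matchers can be modelled as arbitrary predicates over a series' labels, simplified here to label-equality checks; the function names and example labels are hypothetical.

```python
from collections import Counter
from typing import Callable

Labels = dict[str, str]
Matcher = Callable[[Labels], bool]


def count_usage_groups(groups: list[tuple[str, Matcher]], series: list[Labels]) -> Counter:
    """Usage-groups style: every series is tested against every group's matcher.

    Work per series grows linearly with the number of groups.
    """
    counts: Counter = Counter()
    for s in series:
        for name, matches in groups:
            if matches(s):
                counts[name] += 1
    return counts


def count_by_label(label_name: str, series: list[Labels]) -> Counter:
    """Label-split style: one hash-map update per series, however many values exist."""
    counts: Counter = Counter()
    for s in series:
        value = s.get(label_name)
        if value is not None:
            counts[value] += 1
    return counts


series = [
    {"__name__": "up", "team": "payments"},
    {"__name__": "up", "team": "search"},
    {"__name__": "http_requests_total", "team": "payments"},
]
# Matchers simplified to label-equality checks for illustration.
groups = [
    (team, lambda s, team=team: s.get("team") == team)
    for team in ("payments", "search", "billing")
]
print(count_usage_groups(groups, series))  # Counter({'payments': 2, 'search': 1})
print(count_by_label("team", series))      # Counter({'payments': 2, 'search': 1})
```

Both give the same answer here, but the first does work proportional to the number of groups for every series, while the second does a constant amount of work per series.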

> so that we can control the cost of this feature

This is still a consideration: each additional group creates more data, which costs us more to process. There is also a denial-of-service consideration: for example, if someone sent 100 million distinct label values, that would probably break the reporting process.
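One common way to bound that denial-of-service risk is to cap the number of distinct values tracked per tenant and fold any further values into a single overflow bucket. The Python sketch below assumes that approach; the cap, the class, and the `__overflow__` bucket name are hypothetical and not taken from PR 2702 or Mimir.

```python
from collections import Counter

OVERFLOW = "__overflow__"  # hypothetical bucket for values beyond the cap


class CappedGroupCounter:
    """Count per label value, tracking at most `max_groups` real values.

    Values first seen after the cap is reached are folded into one overflow
    bucket, so state stays bounded (max_groups + 1 entries) no matter how
    many distinct values a client sends.
    """

    def __init__(self, max_groups: int) -> None:
        self.max_groups = max_groups
        self.counts: Counter = Counter()

    def add(self, value: str, n: int = 1) -> None:
        # Known values, and new values while under the cap, get their own entry.
        if value in self.counts or len(self.counts) < self.max_groups:
            self.counts[value] += n
        else:
            self.counts[OVERFLOW] += n


capped = CappedGroupCounter(max_groups=2)
for value in ["a", "b", "a", "c", "d"]:
    capped.add(value)
print(dict(capped.counts))  # {'a': 2, 'b': 1, '__overflow__': 2}
```

The trade-off is that values past the cap lose their individual breakdown, which is arguably acceptable if the cap is set above what legitimate users need.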

bboreham, Dec 07 '22