cortex Label queries are not split by day

Describe the bug

We don't split label-names and label-values queries by day, so if someone opens a dashboard spanning a large time range, they'll hit the per-query time-range limit. For example, we set the max time range for a query to 32d, so people can do rate([32d]) at most, but they can still open long dashboards because the queries are split into 24h time ranges.
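To make the splitting concrete, here is a minimal sketch of cutting a request into day-aligned sub-ranges. It is not the Cortex implementation: the function name `split_by_day`, the millisecond timestamps, and the half-open `[start, end)` convention are all assumptions made for illustration.

```python
DAY_MS = 24 * 60 * 60 * 1000  # one day in milliseconds


def split_by_day(start_ms, end_ms, interval_ms=DAY_MS):
    """Split the half-open range [start_ms, end_ms) at interval boundaries.

    Hypothetical helper: each returned (start, end) pair stays within a
    single interval-aligned bucket, so no sub-range exceeds interval_ms.
    """
    if start_ms > end_ms:
        raise ValueError("start must not be after end")
    ranges = []
    cur = start_ms
    while cur < end_ms:
        # First interval boundary strictly after the current position.
        boundary = (cur // interval_ms + 1) * interval_ms
        nxt = min(boundary, end_ms)
        ranges.append((cur, nxt))
        cur = nxt
    return ranges
```

With this shape, a 90d dashboard request becomes roughly 90 sub-queries, each well under a 32d limit.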

We don't do the splitting with labels lookups, so opening large dashboards leads to this:

This only affects the blocks storage engine, and only when -querier.query-store-for-labels-enabled is set. Related to #3520

Dec 04 '20 06:12 gouthamve

How do you propose to solve this? Some thoughts:

We could skip that limit check for label names/values queries, but then such queries over large time ranges could kill the system
Is it really useful to see label names/values for such large time ranges? What if we "clamp" the max time range to a configurable limit so that, when the limit is hit, the query still succeeds but the actual queried time range is no larger than the limit?
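A rough sketch of the clamping idea, under stated assumptions: the name `clamp_time_range` is hypothetical, timestamps are in milliseconds, and keeping the most recent part of the range (rather than the oldest) is my assumption, not something decided here. The returned flag lets a caller tell whether the range was shortened.

```python
def clamp_time_range(start_ms, end_ms, max_range_ms):
    """Shrink [start_ms, end_ms] so it spans at most max_range_ms.

    Hypothetical helper: keeps the most recent portion of the range and
    returns (start, end, clamped), where clamped is True if it was cut.
    """
    if end_ms - start_ms <= max_range_ms:
        return start_ms, end_ms, False
    # Assumption: drop the oldest data and keep the end of the range.
    return end_ms - max_range_ms, end_ms, True
```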

Dec 04 '20 08:12 pracucci

We can parallelise these queries just like we do query_range?

Is it really useful seeing label names/values for such large time ranges?

I think so. One case is, for example, when the label value is only "A" for 6 months and then changes to "B" for the next 6 months. If people open a 1yr dashboard, they'd like to see both. I know this is an esoteric case, but we should ideally have limits on the number of labels returned rather than silently querying a smaller range.

Dec 04 '20 09:12 gouthamve

we should ideally have limits on the number of labels returned rather than silently querying a smaller range.

Right, agree on this. Clamping without notice is bad UX.

We can parallelise these queries just like we do query_range?

Yes, we could. The only difference I see compared to query_range is that, in the query-frontend, we would have to merge the results (removing duplicates) instead of concatenating them. I'm wondering how impactful this could be, CPU- and memory-wise, on large result sets.
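For a sense of the merge cost, here is a minimal sketch of a deduplicating merge. It assumes each sub-query returns its label values already sorted, which is an assumption of this example; `merge_label_values` is a hypothetical name, not a query-frontend function.

```python
import heapq


def merge_label_values(partial_results):
    """Merge sorted lists of label values into one sorted, deduplicated list.

    Hypothetical helper: heapq.merge streams the k sorted inputs, so the
    only extra memory besides the output is one cursor per input list.
    """
    merged = []
    for value in heapq.merge(*partial_results):
        # Inputs are sorted, so duplicates always arrive back to back.
        if not merged or merged[-1] != value:
            merged.append(value)
    return merged
```

With k partial results holding n values in total, this runs in O(n log k) time, and the output holds only the distinct values. The memory concern is therefore mostly about buffering the partial results themselves.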

Getting back to limits though, I think we should have some limits (e.g. on the number of returned values?). It looks a bit risky not having any limit at all on label names/values.
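One way such a limit could look, as a sketch only: the error type and function name are hypothetical, and failing loudly instead of truncating is an assumption chosen to avoid silently returning partial results.

```python
class TooManyLabelValuesError(Exception):
    """Raised when a label names/values response exceeds the limit."""


def enforce_label_limit(values, max_values):
    """Return values unchanged, or fail if there are more than max_values.

    Hypothetical helper: raises instead of truncating so the caller sees
    an explicit error rather than an incomplete list.
    """
    if len(values) > max_values:
        raise TooManyLabelValuesError(
            f"response has {len(values)} values, limit is {max_values}"
        )
    return values
```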

Dec 04 '20 09:12 pracucci