Label queries are not split by day

Open gouthamve opened this issue 4 years ago • 12 comments

Describe the bug

We don't split label-names and label-values queries by day, so if someone opens a dashboard covering a large time range, they'll hit the per-query time-range limit. For example, we set the max time range for a query to 32d so that people can run rate([32d]), yet they can still open dashboards spanning longer ranges because those queries are split into 24h time ranges.

We don't do this splitting for label lookups, so opening large dashboards leads to this: [image]

This only affects the blocks storage engine, and only when -querier.query-store-for-labels-enabled is set. Related to #3520
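For illustration, the day-based splitting described above could be applied to a label lookup's time range roughly as follows. This is a minimal sketch, not Cortex's actual implementation: the `interval` type, the `splitByDay` helper, and the use of non-negative Unix-millisecond timestamps are all assumptions made for the example.

```go
package main

import "fmt"

// interval is a hypothetical type holding an inclusive time range
// in Unix milliseconds.
type interval struct {
	startMs, endMs int64
}

const dayMs int64 = 24 * 60 * 60 * 1000

// splitByDay cuts [startMs, endMs] into sub-ranges that never cross a
// UTC day boundary. It assumes non-negative timestamps.
func splitByDay(startMs, endMs int64) []interval {
	var out []interval
	for cur := startMs; cur <= endMs; {
		// Last millisecond of the UTC day that contains cur.
		dayEnd := (cur/dayMs+1)*dayMs - 1
		if dayEnd > endMs {
			dayEnd = endMs
		}
		out = append(out, interval{cur, dayEnd})
		cur = dayEnd + 1
	}
	return out
}

func main() {
	// A range of 2.5 days starting at the epoch yields three sub-ranges.
	for _, iv := range splitByDay(0, 5*dayMs/2) {
		fmt.Println(iv.startMs, iv.endMs)
	}
}
```

Each sub-range could then be checked against the per-query time-range limit on its own, which is what lets long dashboards work for range queries today.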

gouthamve avatar Dec 04 '20 06:12 gouthamve

How do you propose to solve this? Some thoughts:

  • We could skip that limit check for label names/values queries, but such queries over large time ranges could then kill the system
  • Is it really useful to see label names/values for such large time ranges? What if we "clamp" the max time range to a configurable limit so that, when the limit is hit, the query still succeeds but the actual queried time range is not larger than the limit?
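To make the clamping idea concrete, here is a minimal sketch. The `clampRange` helper and its parameters are hypothetical, and keeping the most recent part of the range (moving the start forward) is an assumption; the proposal above does not say which end would be kept. The helper reports whether it clamped so that a caller could surface that to the user.

```go
package main

import "fmt"

// clampRange limits [startMs, endMs] to at most maxRangeMs by moving the
// start forward, so the most recent data is kept (an assumption of this
// sketch). A non-positive maxRangeMs disables clamping. The second return
// value reports whether the range was shortened.
func clampRange(startMs, endMs, maxRangeMs int64) (int64, bool) {
	if maxRangeMs <= 0 || endMs-startMs <= maxRangeMs {
		return startMs, false
	}
	return endMs - maxRangeMs, true
}

func main() {
	const dayMs int64 = 24 * 60 * 60 * 1000
	// A 365-day request clamped to a 32-day limit.
	start, clamped := clampRange(0, 365*dayMs, 32*dayMs)
	fmt.Println(start/dayMs, clamped)
}
```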

pracucci avatar Dec 04 '20 08:12 pracucci

We can parallelise these queries just like we do for query_range?

Is it really useful seeing label names/values for such large time ranges?

I think so. One case, for example, is a label whose value is only "A" for 6 months and then changes to "B" for the next 6 months. If people open a 1yr dashboard, they'd like to see both. I know this is an esoteric case, but we should ideally have limits on the number of labels returned rather than silently querying a smaller range.

gouthamve avatar Dec 04 '20 09:12 gouthamve

we should ideally have limits on the number of labels returned rather than silently querying a smaller range.

Right, agree on this. Clamping without notice is bad UX.

We can parallelise these queries just like we do for query_range?

Yes, we could. The only difference I see compared to query_range is that, in the query-frontend, we would have to merge the results (removing duplicates) instead of concatenating them. I'm wondering how impactful this could be, CPU and memory wise, on large result sets.

Getting back to limits tho, I think we should have some limits (e.g. on the number of returned values?). It looks a bit risky not having any limit at all on label names/values.
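As a rough sketch of the merge step and the limit discussed above: the `mergeLabelValues` function, its `maxValues` parameter, and the error message are hypothetical and not part of Cortex. It unions the per-split results through a set, fails with an error rather than truncating once the number of distinct values exceeds the limit, and returns the values sorted.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeLabelValues unions the results of several sub-queries, removing
// duplicates, and returns them sorted. If maxValues is positive and the
// number of distinct values exceeds it, an error is returned instead of a
// silently truncated result.
func mergeLabelValues(parts [][]string, maxValues int) ([]string, error) {
	seen := make(map[string]struct{})
	for _, part := range parts {
		for _, v := range part {
			seen[v] = struct{}{}
			if maxValues > 0 && len(seen) > maxValues {
				return nil, fmt.Errorf("more than %d distinct label values", maxValues)
			}
		}
	}
	merged := make([]string, 0, len(seen))
	for v := range seen {
		merged = append(merged, v)
	}
	sort.Strings(merged)
	return merged, nil
}

func main() {
	// Two day-splits whose results overlap on "B".
	parts := [][]string{{"A", "B"}, {"B", "C"}}
	merged, err := mergeLabelValues(parts, 10)
	fmt.Println(merged, err)
}
```

Memory use in this sketch grows with the number of distinct values rather than the total number of values returned by all splits, which bears on the CPU and memory concern raised above.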

pracucci avatar Dec 04 '20 09:12 pracucci

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 04 '21 22:03 stale[bot]

Still valid

pracucci avatar Mar 05 '21 08:03 pracucci

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 03 '21 09:06 stale[bot]

Still valid

pracucci avatar Jun 03 '21 10:06 pracucci

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 09 '21 01:09 stale[bot]

Reopening

bboreham avatar Nov 07 '21 17:11 bboreham

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Feb 08 '22 11:02 stale[bot]

Reopening

alanprot avatar Mar 03 '22 16:03 alanprot

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 12 '22 11:06 stale[bot]