
Merge identical queries in the scheduling queue

Open bboreham opened this issue 3 years ago • 2 comments

Is your feature request related to a problem? Please describe.

Currently, if we receive two or more identical queries, we do the same work for each of them. This might sound rare, but it gets more likely as more people in a company look at the same dashboard.

Describe the solution you'd like

If we detect two identical queries going into the scheduling queue, we could merge them and just do the work once.

It's possible that we can fetch most of the result from cache, but many requests are not cached, and we don't cache data newer than 10 minutes, so queries up to "now" will still involve work.

(Also applies to series requests, labels, label values, etc.)
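To make the proposal concrete, here is a minimal sketch of merging identical in-flight queries, using only the Go standard library. It is not Mimir code: `queryKey`, `call`, `merger` and `Do` are hypothetical names, and the choice of fields that make two queries "identical" is an assumption. The pattern is the same one offered by the `golang.org/x/sync/singleflight` package.

```go
package main

import "sync"

// queryKey is a hypothetical identity for a query. Two requests are merged
// only when every field matches.
type queryKey struct {
	Tenant  string
	Expr    string
	StartMs int64 // milliseconds since the Unix epoch
	EndMs   int64
	StepMs  int64
}

// call tracks one in-flight execution that identical requests can wait on.
type call struct {
	done    chan struct{} // closed once result and err are set
	waiters int           // requests merged into this execution
	result  string
	err     error
}

// merger ensures at most one execution per key at any time.
type merger struct {
	mu       sync.Mutex
	inflight map[queryKey]*call
}

func newMerger() *merger {
	return &merger{inflight: make(map[queryKey]*call)}
}

// Do executes run for key, unless an identical query is already in flight,
// in which case it waits for that execution and reuses its result.
// The returned bool reports whether the result was shared.
// Limitation of this sketch: if run panics, waiters block forever.
func (m *merger) Do(key queryKey, run func() (string, error)) (string, bool, error) {
	m.mu.Lock()
	if c, ok := m.inflight[key]; ok {
		c.waiters++
		m.mu.Unlock()
		<-c.done
		return c.result, true, c.err
	}
	c := &call{done: make(chan struct{})}
	m.inflight[key] = c
	m.mu.Unlock()

	c.result, c.err = run()

	// Remove the entry before waking waiters, so a request that arrives
	// afterwards starts a fresh execution instead of reusing this result.
	m.mu.Lock()
	delete(m.inflight, key)
	m.mu.Unlock()
	close(c.done)

	return c.result, false, c.err
}
```

Note that this only merges queries that overlap in time; a query that arrives after the first one has finished is executed again, which is where the results cache would have to help.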

Describe alternatives you've considered

Leave it as-is.

Additional context

We have something like this in store-gateway with the expandedPostingsPromise.

Credit @pracucci who mentioned this idea to me yesterday.

bboreham avatar Dec 21 '22 10:12 bboreham

I like the idea, just adding a few notes:

  • "if we receive two or more identical queries" -- do you mean identical start/end times too? I guess that would lower chances of finding identical queries.

  • The request sent to query-scheduler (FrontendToScheduler) has a frontendAddress and queryID. These are used by the querier to send the result back to the frontend. If we merge multiple requests, the querier will need to send results to multiple frontends (with a different queryID for each frontend).

  • The results cache is consulted before the request is passed to query-scheduler. Queriers don't use the results cache today (but of course that can be changed).
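The second note can be sketched as a queue entry that carries a list of recipients instead of a single one. This is an illustration only, not the query-scheduler's actual data structure: `recipient`, `mergedRequest`, `mergingQueue` and `deliver` are hypothetical names, the query is keyed by a plain string for brevity, the `uint64` type for the query ID is an assumption, and the sketch is not safe for concurrent use.

```go
package main

// recipient says where one original request's result must be delivered.
// The fields mirror the frontendAddress and queryID mentioned above.
type recipient struct {
	FrontendAddress string
	QueryID         uint64
}

// mergedRequest is one unit of work with every recipient waiting on it.
type mergedRequest struct {
	Query      string
	Recipients []recipient
}

// mergingQueue is a FIFO queue that merges requests for the same query.
type mergingQueue struct {
	order   []string // query keys in arrival order
	pending map[string]*mergedRequest
}

func newMergingQueue() *mergingQueue {
	return &mergingQueue{pending: make(map[string]*mergedRequest)}
}

// Enqueue adds a request and reports whether it was merged into a
// request that was already waiting in the queue.
func (q *mergingQueue) Enqueue(query string, r recipient) bool {
	if req, ok := q.pending[query]; ok {
		req.Recipients = append(req.Recipients, r)
		return true
	}
	q.pending[query] = &mergedRequest{Query: query, Recipients: []recipient{r}}
	q.order = append(q.order, query)
	return false
}

// Dequeue removes and returns the oldest merged request, if any.
func (q *mergingQueue) Dequeue() (*mergedRequest, bool) {
	if len(q.order) == 0 {
		return nil, false
	}
	query := q.order[0]
	q.order = q.order[1:]
	req := q.pending[query]
	delete(q.pending, query)
	return req, true
}

// deliver sends one result to every recipient, each with its own query ID,
// and returns the recipients for which send failed.
func deliver(req *mergedRequest, result string, send func(recipient, string) error) []recipient {
	var failed []recipient
	for _, r := range req.Recipients {
		if err := send(r, result); err != nil {
			failed = append(failed, r)
		}
	}
	return failed
}
```

A real implementation would also need to decide what happens when one of the merged requests is cancelled while the others are still waiting.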

pstibrany avatar Jan 02 '23 09:01 pstibrany

"if we receive two or more identical queries" -- do you mean identical start/end times too? I guess that would lower chances of finding identical queries.

Range queries are aligned by Grafana (to make query results cacheable too). I think this idea could still be effective to cover the case where many users keep auto-refreshing the same dashboard.
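As a rough illustration of why alignment helps, the sketch below assumes that alignment means rounding start and end down to a multiple of the step. `alignRange` is a hypothetical helper, not Grafana's or Mimir's code, and it assumes non-negative timestamps and a positive step.

```go
package main

// alignRange rounds start and end down to multiples of step.
// All values are in milliseconds.
func alignRange(startMs, endMs, stepMs int64) (int64, int64) {
	return startMs - startMs%stepMs, endMs - endMs%stepMs
}
```

With a 15s step, two users who refresh a "last 5 minutes" panel a few seconds apart, within the same step interval, produce the same aligned start and end, so their queries are identical and could be merged.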

pracucci avatar Jan 02 '23 13:01 pracucci

This duplicate issue has some details on caching and consistent routing of queries to schedulers: https://github.com/grafana/mimir/issues/6642

dimitarvdimitrov avatar Jun 13 '24 17:06 dimitarvdimitrov