Pushdown second stage functions when first stage metrics functions can be sharded
when we implemented second stage functions like topk/bottomk In https://github.com/grafana/tempo/pull/4646, we had to do it without pushdown of topk and bottomk functions and only evaluate these functions in the frontend because first stage functions like avg_over_time is can't be sharded.
Not all first stage functions can be sharded but some can be sharded. Ideally can detect if second stage functions and be pushed down or not and pushdown second stage computations when we can.
this will distribute the load away from the frontend for the second stage functions in the queries that can be pushed down.
see https://github.com/grafana/tempo/pull/4646#discussion_r2000567230 for more details.
We have this big note in the code about this as well:
grafana/tempo@eb34f29/pkg/traceql/engine_metrics.go#L1154-L1159
One possible way to implement this is by:
- sending a flag down to know first stage results are shardable or not, if yes, pushdown second stage functions.
- first stage has knowledge of the functions that can be sharded or not, and then set the flag for second stage when parsing the Query
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.
Hi! I’ve been looking into this enhancement and I’d like to work on it.
To make sure I’m aligned with the intended direction, here is my understanding of the problem and the approach I’m planning to take:
🔍 Understanding
The goal is to detect whether first-stage functions in a TraceQL expression are shardable, and based on that decide whether second-stage functions (e.g., topk/bottomk) can be safely pushed down to the queriers instead of being executed in the frontend.
A function (or subtree) is shardable if its result can be computed on each shard independently and later merged without changing correctness.
🛠️ Proposed Approach
-
Define a shardability table
A map of function → shardable(bool), e.g.
sum,count,min,max,topkshardable;
avg,avg_over_timenot shardable. -
Annotate the AST
Add aShardableflag to the TraceQL AST function expression node and implement a bottom-up traversal that determines shardability for every node based on:- function shardability, and
- shardability of all child expressions.
-
Expose shardability to the planner
After parsing, run the shardability checker and make the root expression’s flag available to the planning stage. -
Enable or disable pushdown
In the query planning phase, set a flag (e.g.,PushdownEnabled) when the first-stage subtree is shardable, and send this in the request to the queriers. -
Querier-side execution
WhenPushdownEnabledis true, allow the querier to execute second-stage functions locally before returning results to the frontend.
❗ Questions Before I Begin
Before I start implementing:
- Are there functions not documented in TraceQL that should be considered shardable/non-shardable?
- Would you prefer the shardability logic to live under
pkg/traceql/as a separate file (e.g.,shardability.go)? - Is the frontend → querier pushdown flag the preferred mechanism, or should this be inferred implicitly from the plan?
If the approach looks good, I’d be happy to open a PR.
Thanks!