skywalking [Feature] Implement pre-aggregation on data nodes

Open hanahmily opened this issue 6 months ago • 3 comments

Search before asking

[x] I had searched in the issues and found no similar feature requirement.

Description

Problem

Currently, raw data points from multiple replicas are transported to the liaison node, which then deduplicates and aggregates them. This creates a performance bottleneck: all raw data must cross the network before any processing occurs, which increases latency and network overhead.

Proposed Solution

Implement a pre-aggregation mechanism on data nodes: all replicas are selected for the query, and each performs an initial aggregation locally before sending its result to the liaison node. This reduces the amount of data transferred over the network and should improve overall query performance.

Implementation Requirements

All Replica Selection:

Ensure the same replica is consistently chosen as the default result
Handle replica availability and failover scenarios gracefully
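
To make the two requirements above concrete, here is a minimal sketch of one way to pick a replica deterministically with failover. Everything in it is an assumption for illustration: `pickReplica`, the node names, and the `healthy` map are hypothetical and not part of the existing codebase. The idea is only that ordering replicas by a stable key yields the same choice on every query, and that falling through to the next healthy entry covers failover.

```go
package main

import (
	"fmt"
	"sort"
)

// pickReplica is a hypothetical helper: it returns the lexicographically
// smallest replica that is marked healthy, so repeated calls with the same
// inputs always choose the same node. It reports false if none is healthy.
func pickReplica(replicas []string, healthy map[string]bool) (string, bool) {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]string(nil), replicas...)
	sort.Strings(sorted)
	for _, r := range sorted {
		if healthy[r] {
			return r, true
		}
	}
	return "", false
}

func main() {
	replicas := []string{"node-b", "node-a", "node-c"}

	// All replicas up: the choice does not depend on input order.
	allUp := map[string]bool{"node-a": true, "node-b": true, "node-c": true}
	fmt.Println(pickReplica(replicas, allUp))

	// node-a is down: the selection fails over to the next replica.
	aDown := map[string]bool{"node-b": true, "node-c": true}
	fmt.Println(pickReplica(replicas, aDown))
}
```

A real implementation would likely take health from the cluster's node registry rather than a map, but the stable-ordering idea stays the same.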

Pre-aggregation on Data Nodes:

Implement aggregation logic on the selected primary replica
Support common aggregation operations (sum, count, mean, min, max, etc.)
Ensure partial aggregation results can be properly combined at the liaison node
Maintain compatibility with existing deduplication mechanisms
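
As an illustration of what "partial aggregation results can be properly combined" might look like, the sketch below models a mergeable aggregation state. The `Partial` type and its methods are hypothetical names, not an existing API. It assumes each data node folds its raw points into a small state (sum, count, min, max) and that the liaison node merges those states; mean is derived from the merged sum and count rather than transported as a value.

```go
package main

import (
	"fmt"
	"math"
)

// Partial is a hypothetical mergeable aggregation state produced by one data node.
type Partial struct {
	Sum   float64
	Count int64
	Min   float64
	Max   float64
}

// NewPartial returns the empty state, which is the identity element for Merge.
func NewPartial() Partial {
	return Partial{Min: math.Inf(1), Max: math.Inf(-1)}
}

// Add folds one raw data point into the state (runs on the data node).
func (p Partial) Add(v float64) Partial {
	p.Sum += v
	p.Count++
	p.Min = math.Min(p.Min, v)
	p.Max = math.Max(p.Max, v)
	return p
}

// Merge combines two partial states (runs on the liaison node).
func (p Partial) Merge(o Partial) Partial {
	return Partial{
		Sum:   p.Sum + o.Sum,
		Count: p.Count + o.Count,
		Min:   math.Min(p.Min, o.Min),
		Max:   math.Max(p.Max, o.Max),
	}
}

// Mean derives the average from the merged sum and count.
// It reports false when no data points were seen.
func (p Partial) Mean() (float64, bool) {
	if p.Count == 0 {
		return 0, false
	}
	return p.Sum / float64(p.Count), true
}

// aggregate pre-aggregates a slice of raw values into one Partial.
func aggregate(values []float64) Partial {
	p := NewPartial()
	for _, v := range values {
		p = p.Add(v)
	}
	return p
}

func main() {
	nodeA := aggregate([]float64{1, 2, 3}) // pre-aggregated on one data node
	nodeB := aggregate([]float64{10, 20})  // pre-aggregated on another
	total := nodeA.Merge(nodeB)            // combined on the liaison node
	mean, _ := total.Mean()
	fmt.Println(total.Sum, total.Count, total.Min, total.Max, mean)
}
```

Because `Merge` is associative and commutative, the liaison node can combine results in whatever order they arrive. This sketch does not address deduplication across replicas, which would need to happen before or during the merge.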

Use case

No response

Related issues

No response

Are you willing to submit a pull request to implement this on your own?

[ ] Yes I am willing to submit a pull request on my own!

Code of Conduct

[x] I agree to follow this project's Code of Conduct

Jun 03 '25 12:06 hanahmily

Please assign to me

Jun 09 '25 14:06 sollhui

As a result of the design review:

Each data node will generate a preliminary aggregated result to send to the liaison node, which will handle deduplication and perform the final aggregation.
A new distributed query plan strategy will be introduced to support semi-aggregated results.
The mean/average function presents extra challenges and should be prioritized for implementation.
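
A small sketch of why mean needs extra care, under stated assumptions: the `ShardPartial` type, the per-shard deduplication rule, and the node names are hypothetical and only one possible reading of the design above. It assumes replicas of the same shard report the same (sum, count) pair, so the liaison node keeps one partial per shard and computes the mean from the merged totals. Averaging the per-node means directly gives a different answer, both because groups differ in size and because replicas would be counted twice.

```go
package main

import "fmt"

// ShardPartial is a hypothetical semi-aggregated result sent by a data node
// for one shard. Replicas of the same shard share a ShardID.
type ShardPartial struct {
	ShardID uint32
	Node    string
	Sum     float64
	Count   int64
}

// finalMean keeps the first partial seen for each shard (deduplication),
// merges sums and counts, and only then divides.
func finalMean(parts []ShardPartial) (float64, bool) {
	seen := make(map[uint32]bool)
	var sum float64
	var count int64
	for _, p := range parts {
		if seen[p.ShardID] {
			continue // a replica of a shard that was already counted
		}
		seen[p.ShardID] = true
		sum += p.Sum
		count += p.Count
	}
	if count == 0 {
		return 0, false
	}
	return sum / float64(count), true
}

// meanOfMeans is the naive approach, shown only for comparison.
func meanOfMeans(parts []ShardPartial) float64 {
	var total float64
	var n int
	for _, p := range parts {
		if p.Count == 0 {
			continue
		}
		total += p.Sum / float64(p.Count)
		n++
	}
	if n == 0 {
		return 0
	}
	return total / float64(n)
}

func main() {
	parts := []ShardPartial{
		{ShardID: 0, Node: "node-a", Sum: 6, Count: 3},  // values 1, 2, 3
		{ShardID: 0, Node: "node-b", Sum: 6, Count: 3},  // replica of shard 0
		{ShardID: 1, Node: "node-c", Sum: 10, Count: 1}, // value 10
	}
	mean, _ := finalMean(parts)
	fmt.Println("deduplicated mean:", mean)
	fmt.Println("naive mean of means:", meanOfMeans(parts))
}
```

The raw values here are 1, 2, 3, and 10, so the correct mean is 16 / 4 = 4; the naive version averages 2, 2, and 10 instead.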

Jun 26 '25 06:06 hanahmily

I think this would affect no-limit queries, right?

Jul 13 '25 09:07 wu-sheng
