starrocks
starrocks copied to clipboard
support global late materialization
Background
StarRocks provides relatively limited support for late materialization. Currently, only the Scan and HashJoinProbe operators implement some localized late materialization. In many scenarios, this can still result in excessive resource consumption, with Join and TopN being typical examples. Take the following query as an example:
select * from lineorder order by lo_partkey limit 1000000, 5;
In the current execution plan, all columns are read and participate in the top N calculation before the top N operator is executed, yet only 5 rows are ultimately returned.
In fact, we only need to read the lo_partkey column first. After completing the topN calculation, the other columns' data can be read, thereby significantly reducing CPU/IO overhead.
Goal
Implement a global late materialization strategy to delay reading columns unrelated to computation as much as possible, thus improving query performance.
Basic Design
select A.a, A.b, B.a, B.b from A join B on A.a = B.a;
Take this query as an example. After introducing global late materialization, the execution plan would be transformed as follows.
The columns A.b and B.b, which do not participate in the join computation, will be read after the Join operator executes. The Scan operators only need to return columns A.a, B.a, and the rowid for each row. After the Join operation executes, the A.b and B.b columns can then be read using their respective rowids.
TODO
To achieve this goal, we need to address three key issues:
- introducing a rowid to globally identify each row of data; #59975
- enabling the optimizer to generate execution plans for global late materialization; #59992
- enhancing the execution engine with Fetch and LookUp operators to retrieve corresponding data based on the rowid.