presto
presto copied to clipboard
Refactor query optimization using materialized view
Regarding the materialized view rewrite code logic, there are a few aspects can be improved as follows:
- As long as the optimization feature is on,
MaterializedViewCandidateExtractor
would always fetch referenced materialized views from metastore which is unnecessary. => Some validations on the query shape can be done beforehand. Say, if we know we cannot handle the query (e.g. has subqueries, contains unrecognized function calls), there is no need to talk to metastore. -
MaterializedViewQueryOptimizer
does both validation and rewriting at the same time. => An ideal solution would be separating the two steps. - In the validation step, materialized view information is extracted, then the query info is checked based on the materialized view info. => A better approach is to extract the query info so it can be reused to validate the the materialized views.
- The util class -
MaterializedViewOptimizationRewriteUtils
doesn’t add much value.
https://github.com/prestodb/presto/pull/17994
- Removes the util class
- Allows support for subquery optimization
https://github.com/prestodb/presto/pull/17912
- Doesn't fetch materialized views from metastore unless we know the query shape is valid (i.e. it checks for unsupported functions, since we support subqueries with the above)
- Reuses materialized view information
Unless we still want to further separate validation and rewriting, we can probably close this.
Do we have documentation on how materialized view query optimization is designed? At present, it seems that node matching and replacement is carried out in the Analyze stage. I have a question about how to optimize the join operation later or how to make the optimal choice when there are multiple materialized views of a table at this stage?
I am currently working on Calcite's MV query optimization, trying to borrow its ideas to support Join's materialized view query rewrite in Presto. It is based on the Optimization Queries Using Materialized Views: A Practical, Scalable Solution paper implementation, based on structure adaptation. At the moment our implementation I also feel a bit like a structural-based rewrite, do I understand it right? I hope to participate in the development of the materialized view query rewrite function of the community.CC @zacw7 @Sullivan-Patrick @zpao @joshk
@zhengxingmao Not sure exactly what you mean by structural-based rewrite, can you elaborate or link me to an article which defines this?
The rewrite code can be found inside MaterializedViewQueryOptimizer
where we literally rewrite the SQL, as well as StatementAnalyzer
. Join definitions are found inside QuerySpecification.from
.
I'm quite new to presto myself, but I've been working with the materialized view rewrite code for the last 10 weeks. Happy to jump into a voice call to answer anything you're unclear on.
@Sullivan-Patrick Here is the paper address https://courses.cs.washington.edu/courses/cse591d/01sp/opt_views.pdf
Structure based rewriting is to extract the features in the query and use a set of rules for matching rewriting. The optimizer represents the query as SPJG standard form (join select project groupby), extracts the five expressions of join, projects, filters, grouping and aggregations in the query, and matches and rewrites the expressions corresponding to the materialized view. This method was systematically proposed by Microsoft in SIGMOD paper "optimizing queries using Materialized Views: a practical, scalable solution" in 2001. This method can rewrite any query method including join, filter and project, and use a series of steps to match and obtain the compensation expression. You can also further rewrite the query containing aggregation, and add an aggregation node to return further summarized results when necessary.Structure based rewriting is easy to expand, such as rewriting outer join and sub query, which can complete almost all rewriting. But the search cost is high, especially when the query is complex and the rewriting attempts are many.