presto icon indicating copy to clipboard operation
presto copied to clipboard

Refactor query optimization using materialized view

Open zacw7 opened this issue 3 years ago • 5 comments

Regarding the materialized view rewrite code logic, there are a few aspects can be improved as follows:

  • As long as the optimization feature is on, MaterializedViewCandidateExtractor would always fetch referenced materialized views from metastore which is unnecessary. => Some validations on the query shape can be done beforehand. Say, if we know we cannot handle the query (e.g. has subqueries, contains unrecognized function calls), there is no need to talk to metastore.
  • MaterializedViewQueryOptimizer does both validation and rewriting at the same time. => An ideal solution would be separating the two steps.
  • In the validation step, materialized view information is extracted, then the query info is checked based on the materialized view info. => A better approach is to extract the query info so it can be reused to validate the the materialized views.
  • The util class - MaterializedViewOptimizationRewriteUtils doesn’t add much value.

zacw7 avatar Sep 17 '21 22:09 zacw7

https://github.com/prestodb/presto/pull/17994

  • Removes the util class
  • Allows support for subquery optimization

https://github.com/prestodb/presto/pull/17912

  • Doesn't fetch materialized views from metastore unless we know the query shape is valid (i.e. it checks for unsupported functions, since we support subqueries with the above)
  • Reuses materialized view information

Unless we still want to further separate validation and rewriting, we can probably close this.

Sullivan-Patrick avatar Jul 14 '22 15:07 Sullivan-Patrick

Do we have documentation on how materialized view query optimization is designed? At present, it seems that node matching and replacement is carried out in the Analyze stage. I have a question about how to optimize the join operation later or how to make the optimal choice when there are multiple materialized views of a table at this stage?

zhengxingmao avatar Aug 08 '22 06:08 zhengxingmao

I am currently working on Calcite's MV query optimization, trying to borrow its ideas to support Join's materialized view query rewrite in Presto. It is based on the Optimization Queries Using Materialized Views: A Practical, Scalable Solution paper implementation, based on structure adaptation. At the moment our implementation I also feel a bit like a structural-based rewrite, do I understand it right? I hope to participate in the development of the materialized view query rewrite function of the community.CC @zacw7 @Sullivan-Patrick @zpao @joshk

zhengxingmao avatar Aug 08 '22 06:08 zhengxingmao

@zhengxingmao Not sure exactly what you mean by structural-based rewrite, can you elaborate or link me to an article which defines this?

The rewrite code can be found inside MaterializedViewQueryOptimizer where we literally rewrite the SQL, as well as StatementAnalyzer. Join definitions are found inside QuerySpecification.from.

I'm quite new to presto myself, but I've been working with the materialized view rewrite code for the last 10 weeks. Happy to jump into a voice call to answer anything you're unclear on.

Sullivan-Patrick avatar Aug 09 '22 00:08 Sullivan-Patrick

@Sullivan-Patrick Here is the paper address https://courses.cs.washington.edu/courses/cse591d/01sp/opt_views.pdf

Structure based rewriting is to extract the features in the query and use a set of rules for matching rewriting. The optimizer represents the query as SPJG standard form (join select project groupby), extracts the five expressions of join, projects, filters, grouping and aggregations in the query, and matches and rewrites the expressions corresponding to the materialized view. This method was systematically proposed by Microsoft in SIGMOD paper "optimizing queries using Materialized Views: a practical, scalable solution" in 2001. This method can rewrite any query method including join, filter and project, and use a series of steps to match and obtain the compensation expression. You can also further rewrite the query containing aggregation, and add an aggregation node to return further summarized results when necessary.Structure based rewriting is easy to expand, such as rewriting outer join and sub query, which can complete almost all rewriting. But the search cost is high, especially when the query is complex and the rewriting attempts are many.

zhengxingmao avatar Aug 09 '22 00:08 zhengxingmao