feat(enhancement): Include the e-graph equality saturation framework in the KQIR optimizer.
This pull request introduces significant additions to the e-graph data structure and its associated components for the KQIR optimizer. The changes include the implementation of the e-graph itself, equivalence classes, nodes, and rewrite rules, as well as a new equality saturation pass for query optimization.
Fixes: https://github.com/apache/kvrocks/issues/2561
Key changes include:
E-Graph Implementation:
- Added EGraph, EClass, and ENode classes to represent the e-graph, equivalence classes, and nodes respectively. This includes methods for adding nodes, merging classes, and extracting the best query plan based on a cost model. (src/search/passes/egraph.h)
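For readers unfamiliar with the data structure, the "add nodes" and "merge classes" operations can be sketched as a union-find over e-class ids plus a memo table that deduplicates identical nodes. This is a minimal illustration only: MiniEGraph, Node, and ClassId are hypothetical names, not the classes in src/search/passes/egraph.h, and the sketch omits congruence closure (rebuilding) and extraction.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical names for illustration; not the API of egraph.h.
using ClassId = std::size_t;

struct Node {
  std::string op;                 // operator name, e.g. "Filter" or "Scan"
  std::vector<ClassId> children;  // operands are e-classes, not concrete nodes

  bool operator<(const Node &other) const {
    return std::tie(op, children) < std::tie(other.op, other.children);
  }
};

class MiniEGraph {
 public:
  // Returns the canonical representative of an e-class (with path halving).
  ClassId Find(ClassId id) {
    assert(id < parent_.size());
    while (parent_[id] != id) {
      parent_[id] = parent_[parent_[id]];
      id = parent_[id];
    }
    return id;
  }

  // Adds a node, reusing the existing e-class if an identical node is known.
  ClassId Add(Node node) {
    for (auto &child : node.children) child = Find(child);
    auto it = memo_.find(node);
    if (it != memo_.end()) return Find(it->second);
    ClassId id = parent_.size();
    parent_.push_back(id);
    memo_.emplace(std::move(node), id);
    return id;
  }

  // Records that two e-classes are equivalent.
  // Returns true if they were distinct before the call.
  bool Merge(ClassId a, ClassId b) {
    a = Find(a);
    b = Find(b);
    if (a == b) return false;
    parent_[b] = a;
    return true;
  }

 private:
  std::vector<ClassId> parent_;   // union-find forest over e-class ids
  std::map<Node, ClassId> memo_;  // node -> e-class that contains it
};
```

In this sketch, adding the same Filter node over the same child twice yields one e-class, and merging a Scan class with an IndexScan class makes Find return the same representative for both.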
Rewrite Rules:
- Introduced several rewrite rules (FilterPushDownRewrite, MergeFilterRewrite, SortPushDownRewrite, FilterMergeRewrite, CommonSubexpressionRewrite) to transform the e-graph and optimize query plans. (src/search/passes/egraph_saturation.h)
Equality Saturation Pass:
- Added the EGraphSaturation class, which applies the rewrite rules to the e-graph until saturation is achieved and extracts the best query plan using a cost model. (src/search/passes/egraph_saturation.h)
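The "apply rules until saturation" loop can be sketched as a fixpoint driver with an iteration cap. This is an illustrative sketch, not the code in this PR: a set of plan strings stands in for the e-graph, and PlanSet, Rule, Saturate, and PushFilterBelowSort are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an e-graph: a set of plan strings known to be equivalent.
using PlanSet = std::set<std::string>;
// A rule adds equivalent plans and reports whether it added anything new.
using Rule = std::function<bool(PlanSet &)>;

// Applies every rule repeatedly until a full round adds nothing (saturation)
// or max_iters rounds have run. Returns the number of rounds executed.
std::size_t Saturate(PlanSet &plans, const std::vector<Rule> &rules,
                     std::size_t max_iters) {
  assert(max_iters > 0);
  std::size_t rounds = 0;
  bool changed = true;
  while (changed && rounds < max_iters) {
    changed = false;
    for (const auto &rule : rules) {
      if (rule(plans)) changed = true;
    }
    ++rounds;
  }
  return rounds;
}

// Hypothetical rule: for each plan, rewrite the first "Filter(Sort(" into
// "Sort(Filter(" and add the result as an equivalent plan.
bool PushFilterBelowSort(PlanSet &plans) {
  const std::string from = "Filter(Sort(";
  const std::string to = "Sort(Filter(";
  std::vector<std::string> found;
  for (const auto &plan : plans) {
    auto pos = plan.find(from);
    if (pos == std::string::npos) continue;
    std::string rewritten = plan;
    rewritten.replace(pos, from.size(), to);
    found.push_back(rewritten);
  }
  // Insert after the scan so the set is not modified while iterating over it.
  bool changed = false;
  for (const auto &plan : found) changed |= plans.insert(plan).second;
  return changed;
}
```

Starting from the single plan Filter(Sort(Scan)), the first round adds Sort(Filter(Scan)) and the second round adds nothing, so the loop stops. The iteration cap matters in practice because some rule sets never saturate.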
These changes collectively enhance the query optimization capabilities of the KQIR optimizer by leveraging e-graph-based equality saturation techniques. @git-hulk @aleksraiden @PragmaTwice
@AryanVBW Thanks for your contribution!
Thank you, sir. It’s truly my pleasure to work with such humble people, and I always love to contribute. Please let me know if there are any improvements I can make or any changes needed.
@AryanVBW As far as I can see, the clang lint step in CI reports some warnings. Could you please run ./x.py format to fix them?
Ok
Hey @AryanVBW, this PR looks like a good starting point, but it’s missing some of the key parts needed to be a functional KQIR optimizer.
There’s no actual equality saturation algorithm, the rewrite rules don’t do anything yet, the cost model is just a placeholder, and it’s not integrated with Kvrocks' query engine (this can be done later ig). Plus, without tests or benchmarks, we can’t validate if it actually improves anything. I’d suggest making this a draft PR and continuing work on it, or creating a separate branch where we can properly develop it before merging.
To the best of my knowledge, this is going to be much more complex than standard SQL parsing. I’d recommend checking out some existing articles like KQIR: a query engine for Apache Kvrocks to get a better understanding of how query optimization works in Kvrocks. A good next step would be to experiment with combining multiple operations like SCAN, ZRANGE, and GET. The optimizer can then use e-graphs to explore different ways to merge or reorder these operations, and in the end find the most efficient execution path.
A really good resource: https://egraphs-good.github.io/ Very interesting introduction to egraphs: https://www.cole-k.com/2023/07/24/e-graphs-primer/
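To make the "most efficient execution path" step concrete, here is a minimal sketch of cost-based extraction: each e-class holds alternative operators, and the cheapest cost per class is computed by iterating to a fixpoint. Alt, EClasses, BestCosts, and the cost numbers are assumptions made for illustration, not part of Kvrocks or this PR, and the sketch assumes non-negative per-operator costs.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// One alternative (e-node) inside an e-class. Hypothetical layout.
struct Alt {
  std::string op;                     // operator name, e.g. "IndexScan"
  double self_cost;                   // assumed non-negative per-operator cost
  std::vector<std::size_t> children;  // indices of child e-classes
};

// classes[i] lists the equivalent alternatives of e-class i.
using EClasses = std::vector<std::vector<Alt>>;

// Returns, for each e-class, the cost of its cheapest alternative, where an
// alternative costs its own cost plus the best cost of each child class.
// Classes with no finite-cost alternative keep a cost of infinity.
std::vector<double> BestCosts(const EClasses &classes) {
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<double> best(classes.size(), inf);
  bool changed = true;
  while (changed) {  // repeat until no class gets cheaper
    changed = false;
    for (std::size_t c = 0; c < classes.size(); ++c) {
      for (const Alt &alt : classes[c]) {
        double total = alt.self_cost;
        for (std::size_t child : alt.children) {
          assert(child < classes.size());
          total += best[child];
        }
        if (total < best[c]) {
          best[c] = total;
          changed = true;
        }
      }
    }
  }
  return best;
}
```

For example, if class 0 contains FullScan (cost 100) and IndexScan (cost 10), and class 1 contains Filter (cost 1) over class 0, the best cost of class 1 is 11. A real extractor would also record which alternative achieved the minimum so the plan can be rebuilt.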
Hmmm, it seems this is just a skeleton rather than a complete implementation.
It cannot work as is, so I think it will be hard to get it merged. We also need some test cases for it.
Yes, sir. I initially started working on it but soon realized that it wasn’t a complete implementation. So, I continued working to complete it properly. Thank you so much, sir, for your review.
Thank you, sir. I really appreciate your detailed feedback and guidance. I truly enjoy working on this, and I understand that there’s still a lot to refine. I’ll start by creating a separate branch to continue developing a proper KQIR optimizer, ensuring that key components like equality saturation, rewrite rules, and cost models are implemented correctly.
I'll also spend some time reading the recommended materials to learn more about query optimization in Kvrocks. I'm looking forward to gradually improving this. Once again, I appreciate your help!