The Devil of MergeGroup

Open AveryQi115 opened this issue 10 months ago • 5 comments

Why we have MergeGroup in the first place?

For both testing the optd cascades core and shrinking the search space, we were trying to add heuristic rules in cascades. However, some simple rules like eliminateFilter which eliminate filter node when the filter predicates are always true lead to the situation that we have to add a MergeGroup in optd core.

// memo before eliminateFilter
Group(n): Filter(true) -> child(Group(n+1))
Group(n+1): expr
// memo after eliminateFilter
Group(n): Filter(true) -> child(Group(n+1)), placeholder(Group(n+1))
Group(n+1): expr
// The result returned by eliminateFilter means that Group(n) and Group(n+1) are actually logically equivalent. So we add MergeGroup in optd to merge these two groups.

The devil of MergeGroup

There's no MergeGroup definition in any cascades paper nor columbia paper and corresponding implementation. The adding of mergeGroup is actually pretty experimental. Besides, in real world optimizer, there are always rewrite phase/prepare phase before the cascades framework to make the expressions normalized. There's actually no need to do such heuristic rules in cascades.

The introduction of MergeGroup leads to two problems: loops in operator tree.

loops in operator tree

Take Select * from t1 where true as an example. The initial groups are:

// memo
G1: filter(true) -> child(G2)
G2: Scan (t1)

After eliminateFilter in cascades rule, it becomes:

// memo
G1: filter(true)->child(G2), Scan(t1)
G2: merged to G1, reduceGroupId=G1

It is ok when we execute the query as it only cares the winner. But when we wants to traverse the operator tree somehow, it goes into loops for expressions like filter(true)->child(G2) as they are expressions points to themselves as a child.

It makes tricky bugs for some logic other than execution, for example, join order enumeration, tree traversal, debug printing and so on.

A tricky case is that will it make firing tasks out of order?

In optd, the order of the tasks fired are carefully designed following the columbia paper.

We first fire OptimizeGroup task for the root/top/first group, it then traverse all the expressions in the task and fire OptimizeExpr task for all the logical exprs in the group, and OptimizeInput task for all the physical exprs in the group.

OptimizeExpr first call ExploreGroup tasks for the expr's children and then traverse the rules and fire ApplyRule task for the matched rules for expr.

Newly generated logical exprs are fired with new OptimizeExpr as they are not originally in the group when optimizeGroup or ExploreGroup is called.

OptimizeInput is responsible to calculate the costs for the expr and update winner for the group, it fires optimizeGroup for the children group when the children group haven't have a winner. But except for the root/top/first group, other groups have to be the children node of the physical expression in OptimizeInput can it be called with optimizeGroup.

In abstract, the cascades framework expect all the groups and their children are fully explored (apply all matching rules) from top group to the leaf group, and the costs and winners are updated then from the top group's physical expression to the children group.

If we merge two groups, these groups exist before merge (as rules can not create new group themselves, they can only return existing group). They are either being called with exploreGroup/OptimizeGroup or not. If they are called with exploreGroup before, we can not explore the group again.(exploreGroup skip groups following the groupId, though they are merged, the group Id remains the same). And optimizeGroup can only be called for root group or children group of physical expressions in parent group.

So if the merge destination group is explored before, the merge source group is not explored and both groups are not being called with optimizeGroup(it is either not the children group of physical expressions or pruned if we adds pruning), it will miss newly added expressions from merge source group. 🤯🤯🤯 It is a very trivial case and hard to be found.

Future task

MergeGroup will lead to debugging nightmare and tricky problems. We may need to remove it. Besides, if we don't add heuristic rules into cascades logic, we really don't need it.

Apr 05 '24 07:04 AveryQi115

optd optd copied to clipboard

The Devil of MergeGroup

Why we have MergeGroup in the first place?

The devil of MergeGroup

loops in operator tree

A tricky case is that will it make firing tasks out of order?

Future task

optd
optd copied to clipboard