calcite icon indicating copy to clipboard operation
calcite copied to clipboard

[CALCITE-3455] Redundant rule firing for both logical and physical nodes

Open xndai opened this issue 6 years ago • 10 comments
trafficstars

For ProjectMergeRule, EnumerableLimitRule and SortRemoveConstantKeysRule, once they are done with the transformation on logical nodes, there's no need to fire these rules on physical nodes again. So limit these rule matching to be with logical nodes only. This can noticeably reduce optimization latency.

xndai avatar Oct 28 '19 22:10 xndai

Quotes Julian's comment in the mailing list:

Is there a change we could make to the rule API that would make it easier to customize rules? I have been thinking for a while about adding an “operand builder” to replace the static methods (e.g. RelOptRule.operandJ) that we use to build the operands inside rules. Maybe as part of this change, we could allow rules to clone themselves with slightly different parameters.

I’m interested in hearing about a “big change” that would save us from a myriad of small changes in the coming years.

I'm also expecting if we can make more effort to do a bigger refactoring to make the RelOptRule customization more easier.

danny0405 avatar Oct 30 '19 01:10 danny0405

Quotes Julian's comment in the mailing list:

Is there a change we could make to the rule API that would make it easier to customize rules? I have been thinking for a while about adding an “operand builder” to replace the static methods (e.g. RelOptRule.operandJ) that we use to build the operands inside rules. Maybe as part of this change, we could allow rules to clone themselves with slightly different parameters.

I’m interested in hearing about a “big change” that would save us from a myriad of small changes in the coming years.

I'm also expecting if we can make more effort to do a bigger refactoring to make the RelOptRule customization more easier.

If I understand correctly, what Julian's proposing is to refactor the RelOptRule interface so it's easier to specify which class of RelNode to match. In general I agree with this idea, but it's orthogonal to what I am fixing here. If we had such interface in place, we would still need to pass in logical class rather than the base class to avoid duplicating rule firing, which is what I am doing here.

xndai avatar Oct 30 '19 05:10 xndai

@danny0405 had a good feedback - since ProjectMergeRule can also be used by HepPlanner, and in some cases user would use HepPlanner to merge physical project nodes at the end of optimization phase, so I update this change to only disable physical project merge under volcano planer.

xndai avatar Nov 12 '19 17:11 xndai

Have you seen https://issues.apache.org/jira/browse/CALCITE-2223 ? I think 3455 duplicates 2223, and, unfortunately, there's no trivial solution for it.

vlsi avatar Nov 13 '19 15:11 vlsi

Have you seen https://issues.apache.org/jira/browse/CALCITE-2223 ? I think 3455 duplicates 2223, and, unfortunately, there's no trivial solution for it.

I don't think this is a duplication. This patch tries to improve the optimization performance by avoid firing redundant rules. It's not particularly related to cycles between RelSubset.

xndai avatar Nov 14 '19 21:11 xndai

@vlsi @zabetak @rubenada @hsyuan I've updated this patch based on your feedback. Now I create separate LOGICAL_INSTANCE for rules that potentially would only match for logical nodes. Now Calcite users can customize the rule set and use the LOGICAL_INSTANCE if they want. Also updated some unit tests to use the LOGICAL_INSTANCE instead. Please take a look.

xndai avatar Dec 03 '19 21:12 xndai

I have a question related to CALCITE-3455 but not necessarily related to this PR: How about providing an approach for RelOptRule and allow to specify that rels matched should have the same convention (without specifying what that convention should be) ? IMHO we can also save a lot of redundant fires by this way and I think lots of existing rules can work well with this fashion.

jinxing64 avatar Dec 04 '19 02:12 jinxing64

I have a question related to CALCITE-3455 but not necessarily related to this PR: How about providing an approach for RelOptRule and allow to specify that rels matched should have the same convention (without specifying what that convention should be) ? IMHO we can also save a lot of redundant fires by this way and I think lots of existing rules can work well with this fashion.

I agree in general this is the right direction. Not only the rule should allow specific convention for matching, but also able to create new rels with given conventions. For example, today when we fire ProjectMerge, it will generate another LogicalProject even if two EmmuerableProjects are merged. This doesn't make sense. But in order to do that we should provide a way to register RelFactory into the framework for custom convention. This mechanism is also needed by CALCITE-2970.

Having said that, for short term / mid term I feel we can alleviate the problem by restricting certain rules to only match logic nodes, which is what this PR intends to do.

xndai avatar Dec 04 '19 05:12 xndai

There was a discussion in the dev list about rules matching different conventions but we didn't reach consensus at that time.

zabetak avatar Dec 04 '19 07:12 zabetak

Thanks @zabetak for the pointer. After reviewing the whole thread, I now understand why @vlsi believes this is dup of 2223.

I think "single convention" and "convention preserving" rules are the concepts that capture exactly what we need.

For "single convention", although user can specify operand class, some rules today don't have extension points, e.g. ProjectMergeRule. Anyone who would want to update those rule to only match a given convention would have to copy and rewrite the whole rule. Or at least sub-class the existing rules.

And then there's no way to support "convention preserving" as the actual convention needs to be defined when defining the rule.

Personally I agree with @vlsi that this is one outstanding missing piece of Calcite which affects planning performance quite a bit. Most of the Calcite users would just use the default rule set, and expect them to work out of the box. It would be too much to ask them to rewrite or even sub-class the rules.

My patch here provides a temporary solution to address the most pressing need - those rules I updated do get fired a lot in practice. But if we can reach a conclusion on 2223 (looks like we are very close), this patch won't be needed.

xndai avatar Dec 04 '19 23:12 xndai