calcite-kudu

`KuduFilterRel` calculates cost based on partition pruning

Open sdreynolds opened this issue 3 years ago • 2 comments

Why:

To improve the costing metrics of different plans, the `KuduFilterRel` should cost itself based on the number of partitions it scans out of the table's total number of partitions. This calculation has to be cheap and, ideally, done only once.

How:

The `KuduFilterRule` now pulls out the `KuduClient` and uses it to make two `ScanTokenBuilder`s for the table: one without predicates and one with the predicates. Comparing the sizes of the two results lets the `KuduFilterRel` calculate the percentage of partitions removed by pruning, and the cost of the plan is reduced according to that percentage.
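The arithmetic described above can be sketched roughly as follows. This is not the code from this PR: the class `PruningCostSketch`, its method names, and the choice to treat an unknown total as "no pruning" are all hypothetical, and the two counts are assumed to come from the sizes of the token lists built with and without predicates.

```java
// Hypothetical sketch of the pruning-based cost scaling; not the PR's actual code.
// Assumption: the two counts would come from something like
//   client.newScanTokenBuilder(table).build().size()
// called once without predicates and once with addPredicate(...) applied.
final class PruningCostSketch {
    private PruningCostSketch() {}

    /** Fraction of partitions still scanned after pruning, clamped to [0, 1]. */
    static double scannedFraction(int tokensWithPredicates, int tokensWithoutPredicates) {
        if (tokensWithoutPredicates <= 0) {
            // Total is unknown or empty: assume nothing was pruned.
            return 1.0;
        }
        double fraction = (double) tokensWithPredicates / tokensWithoutPredicates;
        return Math.max(0.0, Math.min(1.0, fraction));
    }

    /** Scales a base cost by the fraction of partitions that survive pruning. */
    static double scaledCost(double baseCost, int tokensWithPredicates, int tokensWithoutPredicates) {
        return baseCost * scannedFraction(tokensWithPredicates, tokensWithoutPredicates);
    }
}
```

For example, 25 tokens with predicates against 100 without gives a scanned fraction of 0.25, so a base cost of 200 becomes 50. An empty list of tokens with predicates drives the cost to zero in this sketch, so a real planner might want a small floor instead.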

Contributing to Twilio

All third-party contributors acknowledge that any contributions they provide will be made under the same open-source license that the open-source project is provided under.

  • [X] I acknowledge that all my contributions will be made under the project's license.

sdreynolds, Aug 25 '21 00:08

LGTM. With this change, does it pick the unioned query over the one that just uses the fact table?

twdsilva, Aug 25 '21 17:08

Really struggling with this one. For a few queries I am getting an empty list of scan tokens. This can happen when there are no partitions to prune, which isn't likely the case for my queries, and when `Predicate#getType()` returns `Predicate.PredicateType.None`, which I cannot confirm at the moment. Not sure how I got my code to regress since this PR. I will take one more stab at this and perhaps abandon this effort to put the UNION transformation directly in the logical query.

sdreynolds, Aug 26 '21 00:08