Use `KuduTableStatistics` to determine row counts

Open: sdreynolds opened this issue 3 years ago • 1 comment

Summary: This change bundles several related changes:

  1. Remove KuduLimitRel. With #18, KuduSortRel now gets the fetch and offset. It is no longer derived from EnumerableLimit, and EnumerableLimit is just as efficient as KuduLimit.
  2. KuduSortRel and KuduProjectRel no longer produce unstable row estimates. Prior to this change, they produced Double.MIN_VALUE, which resulted in exceptions being thrown during the planning process.
  3. TableType now has a method for its estimated row count.
  4. CalciteKuduTable now attempts to get row counts directly from the Kudu cluster and, if that fails, uses the estimates from TableType.

This results in a row count estimation that no longer depends on TableType and can be applied more generally.
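The fallback described in item 4 can be sketched roughly as follows. This is a minimal illustration, not the code in this change: `RowCountFallback`, the `TableType` constants and their estimates, and the `Supplier<Long>` standing in for the live Kudu statistics lookup are all hypothetical names and values chosen for the example.

```java
import java.util.function.Supplier;

public class RowCountFallback {

    /** Hypothetical stand-in for TableType; the estimates are made-up values. */
    public enum TableType {
        FACT(1_000_000.0),
        DIMENSION(1_000.0);

        private final double estimatedRowCount;

        TableType(double estimatedRowCount) {
            this.estimatedRowCount = estimatedRowCount;
        }

        public double estimatedRowCount() {
            return estimatedRowCount;
        }
    }

    /**
     * Prefer the live row count; fall back to the static TableType estimate
     * when the lookup throws or returns an unusable value.
     * The supplier is an assumption standing in for a call to the cluster.
     */
    public static double rowCount(Supplier<Long> liveStatistics, TableType type) {
        try {
            Long live = liveStatistics.get();
            if (live != null && live >= 0) {
                return live.doubleValue();
            }
        } catch (RuntimeException e) {
            // Statistics unavailable: fall through to the static estimate.
        }
        return type.estimatedRowCount();
    }
}
```

Passing the lookup as a supplier keeps the fallback logic testable without a running cluster.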

Contributing to Twilio

All third-party contributors acknowledge that any contributions they provide will be made under the same open-source license that the open-source project is provided under.

  • [X] I acknowledge that all my contributions will be made under the project's license.

sdreynolds, Mar 31 '21 00:03

I still think I want to take a pass at updating all the computeSelfCost implementations to better represent what they are doing.

  1. Projections would lower the number of rows proportional to the number of columns selected
  2. Sort would call super, then set the CPU cost to 0
  3. Filter would a.) set row count proportional to the number of partitions being scanned and b.) reduce the cost some constant amount per additional filter
  4. Nested Join should adopt the costing provided by EnumerableBatchNestedJoin and stop fixing on ActorDimension table.
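As a rough sketch of items 1 through 3, the adjustments could look like the following. Everything here is an assumption made for illustration: the `Cost` class is a simplified stand-in rather than Calcite's cost type, and `PER_FILTER_DISCOUNT` and the multiplicative reading of "some constant amount per additional filter" are guesses, not values from this project.

```java
public class CostSketch {

    /** Simplified, hypothetical cost triple (rows, cpu, io). */
    public static final class Cost {
        public final double rows;
        public final double cpu;
        public final double io;

        public Cost(double rows, double cpu, double io) {
            this.rows = rows;
            this.cpu = cpu;
            this.io = io;
        }
    }

    /** Assumed discount applied once per filter beyond the first. */
    public static final double PER_FILTER_DISCOUNT = 0.9;

    /** Item 1: scale rows by the fraction of columns selected. */
    public static Cost project(Cost input, int selectedColumns, int totalColumns) {
        double ratio = (double) selectedColumns / totalColumns;
        return new Cost(input.rows * ratio, input.cpu, input.io);
    }

    /** Item 2: keep the parent's cost but zero out the CPU component. */
    public static Cost sort(Cost superCost) {
        return new Cost(superCost.rows, 0.0, superCost.io);
    }

    /** Item 3: scale rows by partitions scanned, then discount per extra filter. */
    public static Cost filter(Cost input, int scannedPartitions, int totalPartitions, int filterCount) {
        double rows = input.rows * scannedPartitions / totalPartitions;
        double discount = Math.pow(PER_FILTER_DISCOUNT, Math.max(0, filterCount - 1));
        return new Cost(rows * discount, input.cpu * discount, input.io * discount);
    }
}
```

In an actual computeSelfCost override these adjustments would be applied to the planner's cost object instead of this stand-in.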

sdreynolds, Mar 31 '21 16:03