datafusion
Wrong projection after optimize_projections rule
Describe the bug
After upgrading datafusion from an old version to the latest, I found that the optimize_projections
rule produces a wrong projection. E.g.:
create table t(x bigint, y bigint) as values (1,1), (2,2);
select x from t where y > 0;
Only x is included in the projection; y is missing from the plan.
To Reproduce
First apply the following change, whose purpose is to push down all filters:
@@ -884,6 +884,7 @@ impl OptimizerRule for PushDownFilter {
let results = scan
.source
.supports_filters_pushdown(filter_predicates.as_slice())?;
+ let results = vec![TableProviderFilterPushDown::Exact; results.len()];
let zip = filter_predicates.iter().zip(results);
https://github.com/apache/arrow-datafusion/blob/f4fc2639f1d9d1f4dbc73d39990a83f6bf7a725f/datafusion/optimizer/src/push_down_filter.rs#L887
Then run datafusion-cli:
create table t(x bigint, y bigint) as values (1,1), (2,2);
explain verbose select x from t where y > 0;
Then we get:
| logical_plan after optimize_projections | Projection: t.x |
| | TableScan: t projection=[x], unsupported_filters=[t.y > Int64(0)] |
Expected behavior
The right projection should be [x, y].
Additional context
No response
We will take a look and address it next week. Thanks
Possibly related https://github.com/apache/arrow-datafusion/issues/9111
I think this is working as designed as explained in https://github.com/apache/arrow-datafusion/pull/9131#pullrequestreview-1865020767 , though perhaps we could improve the design.
When I run the queries as described in the issue body, I get the following plan:
logical_plan
TableScan: t projection=[x], unsupported_filters=[t.y > Int64(0)]
physical_plan
MemoryExec: partitions=1, partition_sizes=[1]
Considering the review in #9131, I don't think these plans are wrong or sub-optimal. @jiacai2050 Can you try your queries on the main branch (not the latest release) if possible? I couldn't reproduce the logical plan in the issue.
@mustafasrepo Sorry for my delayed response, I will re-check this using the latest main branch this week.
Closed since this is expected, thanks everyone involved.
For other developers who run into this, you can check how I worked around it here:
- https://github.com/apache/incubator-horaedb/blob/8d53620b06eeac16ca4197e9e7f484ce44a6fc6c/src/table_engine/src/provider.rs#L237-L255
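For readers who don't want to follow the link, here is a rough standalone sketch of the general idea, not a copy of the linked code. It assumes the provider reports filters as Exact and therefore has to evaluate them itself: it reads the projected columns plus any column used only by a filter, applies the filter, and then trims each row back to the requested projection. The names `columns_to_read` and `scan`, and the plain `Vec<i64>` rows, are hypothetical and only for illustration; a real `TableProvider` would work with Arrow schemas and `Expr` filters.

```rust
/// Hypothetical helper: column indices to read from storage.
/// The requested projection comes first, followed by filter-only columns.
fn columns_to_read(projection: &[usize], filter_columns: &[usize]) -> Vec<usize> {
    let mut read = projection.to_vec();
    for &col in filter_columns {
        if !read.contains(&col) {
            read.push(col);
        }
    }
    read
}

/// Hypothetical scan: keeps rows where column `filter_col` > `threshold`
/// and returns only the columns listed in `projection`.
fn scan(
    rows: &[Vec<i64>],
    projection: &[usize],
    filter_col: usize,
    threshold: i64,
) -> Vec<Vec<i64>> {
    let read = columns_to_read(projection, &[filter_col]);
    // `filter_col` is always present in `read`, so this lookup cannot fail.
    let filter_pos = read.iter().position(|&c| c == filter_col).unwrap();
    rows.iter()
        // Read the widened set of columns for each row.
        .map(|row| read.iter().map(|&c| row[c]).collect::<Vec<i64>>())
        // Evaluate the pushed-down filter inside the scan.
        .filter(|wide| wide[filter_pos] > threshold)
        // Drop filter-only columns so the output matches the projection.
        .map(|wide| wide[..projection.len()].to_vec())
        .collect()
}
```

With the table from this issue as `vec![vec![1, 1], vec![2, 2]]`, calling `scan(&rows, &[0], 1, 0)` reads both x and y internally but returns only the x column, which is consistent with the plan showing `projection=[x]`.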