Wrong projection after optimize_projections rule

Open jiacai2050 opened this issue 1 year ago • 5 comments

Describe the bug

After upgrading from an old version of datafusion to the latest one, I found that the optimize_projections rule produces a wrong projection. E.g.:

create table t(x bigint, y bigint) as values (1,1), (2,2);

select x from t where y > 0;

Only x is included in the projection; y is missing from the plan.

To Reproduce

First apply the following change, whose purpose is to push down all filters:

@@ -884,6 +884,7 @@ impl OptimizerRule for PushDownFilter {
                 let results = scan
                     .source
                     .supports_filters_pushdown(filter_predicates.as_slice())?;
+                let results  = vec![TableProviderFilterPushDown::Exact; results.len()];
                 let zip = filter_predicates.iter().zip(results);

https://github.com/apache/arrow-datafusion/blob/f4fc2639f1d9d1f4dbc73d39990a83f6bf7a725f/datafusion/optimizer/src/push_down_filter.rs#L887

Then run datafusion-cli:

create table t(x bigint, y bigint) as values (1,1), (2,2);
explain verbose select x from t where y > 0;

Then we get:

| logical_plan after optimize_projections                    | Projection: t.x                                                                                                          |                                                                    
|                                                            |   TableScan: t projection=[x], unsupported_filters=[t.y > Int64(0)]                                                      |  

Expected behavior

The right projection should be [x, y].

Additional context

No response

jiacai2050 avatar Feb 02 '24 09:02 jiacai2050

We will take a look and address next week. Thanks

ozankabak avatar Feb 02 '24 09:02 ozankabak

Possibly related https://github.com/apache/arrow-datafusion/issues/9111

alamb avatar Feb 02 '24 20:02 alamb

I think this is working as designed, as explained in https://github.com/apache/arrow-datafusion/pull/9131#pullrequestreview-1865020767, though perhaps we could improve the design.

alamb avatar Feb 06 '24 12:02 alamb

When I ran the queries as described in the issue body, I got the following plan:

logical_plan 
TableScan: t projection=[x], unsupported_filters=[t.y > Int64(0)]
physical_plan 
MemoryExec: partitions=1, partition_sizes=[1]

Considering the review on #9131, I don't think these plans are wrong or sub-optimal. @jiacai2050 Can you try your queries on the main branch (not the latest release) if possible? I couldn't reproduce the logical plan from the issue.

mustafasrepo avatar Feb 06 '24 13:02 mustafasrepo

@mustafasrepo Sorry for my delayed response. I will re-check this using the latest main branch this week.

jiacai2050 avatar Feb 20 '24 03:02 jiacai2050

Closing since this is expected. Thanks to everyone involved.

For other developers who run into this issue, you can check how I worked around it here:

  • https://github.com/apache/incubator-horaedb/blob/8d53620b06eeac16ca4197e9e7f484ce44a6fc6c/src/table_engine/src/provider.rs#L237-L255

jiacai2050 avatar Feb 27 '24 13:02 jiacai2050
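
As I understand the "working as designed" explanation above, reporting `TableProviderFilterPushDown::Exact` tells the optimizer that the provider applies the filter itself, so the scan is no longer asked to output `y`. A provider that claims `Exact` then has to read the filter columns on its own, even when they are absent from the projection. The sketch below illustrates that idea in plain Rust with no DataFusion dependency. `GtFilter`, `columns_to_read`, and `scan` are hypothetical names made up for this illustration, rows are modeled as `Vec<i64>`, and this is not the code behind the HoraeDB link above.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a pushed-down predicate of the form
/// `column > threshold` (e.g. `t.y > 0` from the issue).
struct GtFilter {
    column: usize,
    threshold: i64,
}

/// Columns the provider must read internally: the projected columns first,
/// in order, followed by any filter columns that were not projected.
fn columns_to_read(projection: &[usize], filters: &[GtFilter]) -> Vec<usize> {
    let mut cols: Vec<usize> = projection.to_vec();
    let mut seen: BTreeSet<usize> = projection.iter().copied().collect();
    for f in filters {
        // `insert` returns true only when the column was not seen before.
        if seen.insert(f.column) {
            cols.push(f.column);
        }
    }
    cols
}

/// Toy scan: widen the read set, apply the filters, then narrow the
/// result back to exactly the columns the plan asked for.
fn scan(rows: &[Vec<i64>], projection: &[usize], filters: &[GtFilter]) -> Vec<Vec<i64>> {
    let read = columns_to_read(projection, filters);
    // Position of a table column inside the widened row.
    let pos = |col: usize| read.iter().position(|&c| c == col).expect("column was read");

    rows.iter()
        // 1. Read the widened column set (projection + filter columns).
        .map(|row| read.iter().map(|&c| row[c]).collect::<Vec<i64>>())
        // 2. Apply every pushed-down filter inside the "provider".
        .filter(|wide| filters.iter().all(|f| wide[pos(f.column)] > f.threshold))
        // 3. Projected columns come first, so narrowing is a truncation.
        .map(|wide| wide[..projection.len()].to_vec())
        .collect()
}
```

With the table from the issue (columns `x` = 0 and `y` = 1), `projection = [0]` and the filter `y > 0`, `columns_to_read` yields `[0, 1]` while `scan` still returns only `x` values. That matches a plan whose scan lists `projection=[x]` even though `y` is needed to evaluate the filter.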