datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Can't push down projection after do type coercion

Open liukun4515 opened this issue 2 years ago • 1 comments

Describe the bug

in this https://github.com/apache/arrow-datafusion/pull/3396 pr, I remove the type coercion in the physical phase.

We need do the type coercion in the logical phase as much as early.

But after do the type coercion, there is some bugs for this.

For example: test case tpch_explain_q10

the ProjectionPushDown will not work for this test case.

I add some log to debug this, find the cause reason is the type and column name for this.

            // aggregate:
            // * remove any aggregate expression that is not required
            // * construct the new set of required columns

            // Find distinct group by exprs in the case where we have a grouping set
            let all_group_expr: Vec<Expr> = grouping_set_to_exprlist(group_expr)?;

            exprlist_to_columns(&all_group_expr, &mut new_required_columns)?;

            warn!("the input plan schema  {:?}", plan.schema().fields());
            // Gather all columns needed for expressions in this Aggregate
            let mut new_aggr_expr = Vec::new();
            aggr_expr.iter().try_for_each(|expr| {
                let name = &expr.name()?;
                let column = Column::from_name(name);
                warn!("debug iter agg expr: {} \n {} \n {} \n", expr, name, column);
                if required_columns.contains(&column) {
                    new_aggr_expr.push(expr.clone());
                    new_required_columns.insert(column);

                    // add to the new set of required columns
                    expr_to_columns(expr, &mut new_required_columns)
                } else {
                    Ok(())
                }
            })?;
            warn!(
                "debug optimizer exprs: {:?} due to unexpected colums: {:?}, the new agg_expr: {:?}",
                aggr_expr, required_columns, new_aggr_expr
            );

run the test

RUST_LOG=trace cargo test --package datafusion --test sql_integration sql::explain_analyze::tpch_explain_q10 -- --exact

Get the warn long

[2022-09-22T02:27:25Z WARN  datafusion_optimizer::projection_push_down] the input plan schema  [DFField { qualifier: Some("customer"), field: Field { name: "c_custkey", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false
, metadata: None } }, DFField { qualifier: Some("customer"), field: Field { name: "c_name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("customer"), field: Field { na
me: "c_acctbal", data_type: Decimal128(15, 2), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("customer"), field: Field { name: "c_phone", data_type: Utf8, nullable: false, dict_id: 0, d
ict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("nation"), field: Field { name: "n_name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("customer")
, field: Field { name: "c_address", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("customer"), field: Field { name: "c_comment", data_type: Utf8, nullable: false, dict_
id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: None, field: Field { name: "SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)", data_type: Decimal128(38, 4), nullable: true, dict_id: 0, dict_is_ordered
: false, metadata: None } }]
[2022-09-22T02:27:25Z WARN  datafusion_optimizer::projection_push_down] debug iter agg expr: SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 2)) AS D
ecimal128(38, 4)))
     SUM(lineitem.l_extendedprice * Decimal128(Some(100),23,2) - lineitem.l_discount)
     #SUM(lineitem.l_extendedprice * Decimal128(Some(100),23,2) - lineitem.l_discount)

[2022-09-22T02:27:25Z WARN  datafusion_optimizer::projection_push_down] debug optimizer exprs: [SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 2)) A
S Decimal128(38, 4)))] due to unexpected colums: {Column { relation: Some("nation"), name: "n_name" }, Column { relation: Some("customer"), name: "c_name" }, Column { relation: None, name: "SUM(lineitem.l_extendedprice * Int64(1) - lineit
em.l_discount)" }, Column { relation: None, name: "revenue" }, Column { relation: Some("customer"), name: "c_custkey" }, Column { relation: Some("customer"), name: "c_address" }, Column { relation: Some("customer"), name: "c_comment" }, C
olumn { relation: Some("customer"), name: "c_acctbal" }, Column { relation: Some("customer"), name: "c_phone" }}, the new agg_expr: []

From the log, we can see

In the schema, a field is Field { name: "SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)"

Bu the expr is SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 2)) AS D ecimal128(38, 4))) and the column is #SUM(lineitem.l_extendedprice * Decimal128(Some(100),23,2) - lineitem.l_discount)

The difference is Int64(1) and Decimal128(Some(100),23,2, the decimal128 is got from the type coercion.

To Reproduce Steps to reproduce the behavior:

Expected behavior A clear and concise description of what you expected to happen.

Additional context Add any other context about the problem here.

liukun4515 avatar Sep 22 '22 02:09 liukun4515

It's related to https://github.com/apache/arrow-datafusion/issues/3555 and fixed by a special path in the type coercion

I think this fix is not grace, we can do it better.

liukun4515 avatar Sep 22 '22 02:09 liukun4515

Hi @liukun4515. Could you please check if this bug has been fixed.

HaoYang670 avatar Nov 09 '22 07:11 HaoYang670

Hi @liukun4515. Could you please check if this bug has been fixed.

we can close it, the issue has been resolved. Thanks @HaoYang670

liukun4515 avatar Nov 09 '22 13:11 liukun4515