datafusion
datafusion copied to clipboard
Can't push down projection after do type coercion
Describe the bug
in this https://github.com/apache/arrow-datafusion/pull/3396 pr, I remove the type coercion in the physical phase.
We need do the type coercion in the logical phase as much as early.
But after do the type coercion, there is some bugs for this.
For example: test case tpch_explain_q10
the ProjectionPushDown
will not work for this test case.
I add some log to debug this, find the cause reason is the type and column name for this.
// aggregate:
// * remove any aggregate expression that is not required
// * construct the new set of required columns
// Find distinct group by exprs in the case where we have a grouping set
let all_group_expr: Vec<Expr> = grouping_set_to_exprlist(group_expr)?;
exprlist_to_columns(&all_group_expr, &mut new_required_columns)?;
warn!("the input plan schema {:?}", plan.schema().fields());
// Gather all columns needed for expressions in this Aggregate
let mut new_aggr_expr = Vec::new();
aggr_expr.iter().try_for_each(|expr| {
let name = &expr.name()?;
let column = Column::from_name(name);
warn!("debug iter agg expr: {} \n {} \n {} \n", expr, name, column);
if required_columns.contains(&column) {
new_aggr_expr.push(expr.clone());
new_required_columns.insert(column);
// add to the new set of required columns
expr_to_columns(expr, &mut new_required_columns)
} else {
Ok(())
}
})?;
warn!(
"debug optimizer exprs: {:?} due to unexpected colums: {:?}, the new agg_expr: {:?}",
aggr_expr, required_columns, new_aggr_expr
);
run the test
RUST_LOG=trace cargo test --package datafusion --test sql_integration sql::explain_analyze::tpch_explain_q10 -- --exact
Get the warn long
[2022-09-22T02:27:25Z WARN datafusion_optimizer::projection_push_down] the input plan schema [DFField { qualifier: Some("customer"), field: Field { name: "c_custkey", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false
, metadata: None } }, DFField { qualifier: Some("customer"), field: Field { name: "c_name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("customer"), field: Field { na
me: "c_acctbal", data_type: Decimal128(15, 2), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("customer"), field: Field { name: "c_phone", data_type: Utf8, nullable: false, dict_id: 0, d
ict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("nation"), field: Field { name: "n_name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("customer")
, field: Field { name: "c_address", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("customer"), field: Field { name: "c_comment", data_type: Utf8, nullable: false, dict_
id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: None, field: Field { name: "SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)", data_type: Decimal128(38, 4), nullable: true, dict_id: 0, dict_is_ordered
: false, metadata: None } }]
[2022-09-22T02:27:25Z WARN datafusion_optimizer::projection_push_down] debug iter agg expr: SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 2)) AS D
ecimal128(38, 4)))
SUM(lineitem.l_extendedprice * Decimal128(Some(100),23,2) - lineitem.l_discount)
#SUM(lineitem.l_extendedprice * Decimal128(Some(100),23,2) - lineitem.l_discount)
[2022-09-22T02:27:25Z WARN datafusion_optimizer::projection_push_down] debug optimizer exprs: [SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 2)) A
S Decimal128(38, 4)))] due to unexpected colums: {Column { relation: Some("nation"), name: "n_name" }, Column { relation: Some("customer"), name: "c_name" }, Column { relation: None, name: "SUM(lineitem.l_extendedprice * Int64(1) - lineit
em.l_discount)" }, Column { relation: None, name: "revenue" }, Column { relation: Some("customer"), name: "c_custkey" }, Column { relation: Some("customer"), name: "c_address" }, Column { relation: Some("customer"), name: "c_comment" }, C
olumn { relation: Some("customer"), name: "c_acctbal" }, Column { relation: Some("customer"), name: "c_phone" }}, the new agg_expr: []
From the log, we can see
In the schema, a field is Field { name: "SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)"
Bu the expr is SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 2)) AS D ecimal128(38, 4)))
and the column is #SUM(lineitem.l_extendedprice * Decimal128(Some(100),23,2) - lineitem.l_discount)
The difference is Int64(1)
and Decimal128(Some(100),23,2
, the decimal128 is got from the type coercion.
To Reproduce Steps to reproduce the behavior:
Expected behavior A clear and concise description of what you expected to happen.
Additional context Add any other context about the problem here.
It's related to https://github.com/apache/arrow-datafusion/issues/3555 and fixed by a special path in the type coercion
I think this fix is not grace, we can do it better.
Hi @liukun4515. Could you please check if this bug has been fixed.
Hi @liukun4515. Could you please check if this bug has been fixed.
we can close it, the issue has been resolved. Thanks @HaoYang670