Convert list operator to function in `sql_expr_to_logical_expr`
Like what we have done for [] -> MakeArray.
It is possible to have the similar conversion for other List Operator in the early stage (Sql to LogicExpr).
- List || List -> array_concat #8636
- List || non-List -> array_append #8636
- non-List || List -> array_prepend #8636
TODO
- left <@ right -> array_has_all(right, left)
- left @> right -> array_has_all(left, right)
- list && list -> array_intersect #8496
Other: clean up list/array coercion in datafusion/expr/src/type_coercion/binary.rs
Originally posted by @viirya in https://github.com/apache/arrow-datafusion/pull/8496#discussion_r1423031611
This makes sense to me
I think there is a little issue with the current design.
We rewrite the | | operator to function after the logical plan is built. Before OperatorToFunction is applied, we need to build logical plan, which calls projection_schema. It calls exprlist_to_fields to build schema. Inside to_field, we calculate the return type of string concat, which does type coercion. The problem is that the return type calculated here is useless, because we rewrite it to function afterward.
The type coercion for string concat is done inside string_coercion, btw, it is incorrect. https://github.com/apache/arrow-datafusion/blob/dd4263f843e093c807d63edf73a571b1ba2669b5/datafusion/expr/src/type_coercion/binary.rs#L704C9-L708
select [1,2,3] || 4.1 return List(Int64), but it should be List(F64) instead. However, since it is rewritten to Function, so the output is still correct at the end.
If we need to avoid the calculation (mainly type coercion) here, we may need an if else branch to skip the schema building for the operators that will be rewritten afterward. For example, return empty schema inside projection_schema, if the expr is going to be rewritten later. (Currently only List operator)
pub struct BinaryExpr {
/// Left-hand side of the expression
pub left: Box<Expr>,
/// The comparison operator
pub op: Operator,
/// Right-hand side of the expression
pub right: Box<Expr>,
/// will be rewritten (maybe better name)
pub will_be_rewritten: bool
}
We can then do different things like return empty schema while building logical plan based on the will_be_rewritten.
p.s. since Expr is Enum not trait, so we can only have will_be_rewritten for each possible Expr.
@alamb @viirya how do you think?
Upd:
I have tested that return empty schema for project_schema and remove the constraint below works.
if expr.len() != schema.fields().len() {
return plan_err!("Projection has mismatch between number of expressions ({}) and number of fields in schema ({})", expr.len(), schema.fields().len());
}