datafusion
datafusion copied to clipboard
Fix inconsistencies in type coercion logic
Describe the bug
The TypeCoercion rule incorrectly uses coerce_types
(which should probably be renamed to get_binary_op_type
because it determines the output type of a binary expression) to determine the types to case the lhs and rhs expressions.
I think we need a new and similar method for determining the common type to case inputs to, for binary expressions.
Maybe we need to introduce signatures for binary ops so this can be consolidated more, similar to how we handle UDF signatrures.
To Reproduce Steps to reproduce the behavior:
Expected behavior A clear and concise description of what you expected to happen.
Additional context Add any other context about the problem here.
Do we need make align the type coercion with arrow? The cast system should not be same with arrow, because the datafusion is the SQL system and has some special semantics than the arrow lib @andygrove
I agree that there will be differences. See the conversation at https://github.com/apache/arrow-datafusion/pull/3407#discussion_r967137788 for one example where we have to work around a difference and this example is possibly due to a bug in DataFusions logic
I updated the title and description now that I understand this more.
Maybe we could return a struct like
enum NeededCoercion {
/// cast inputs to these datatypes
coerce_inputs: Vec<Datatype>,
/// the resulting output will be this datatype
output_type: DataType
}
Or something 🤔
Hi, could someone help to explain what the coerce_types
does. Does it coerce the lhs and rhs types, or calculate the return type
of the operator?
Hi, could someone help to explain what the coerce_types does. Does it coerce the lhs and rhs types, or calculate the return type of the operator?
I think this is a reasonable description of coercion:
https://github.com/apache/arrow-datafusion/blob/9c24a79118c0ae0aee480bf039385a06d240b499/datafusion/expr/src/type_coercion.rs#L18-L32