datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Fix inconsistencies in type coercion logic

Open andygrove opened this issue 2 years ago • 3 comments

Describe the bug The TypeCoercion rule incorrectly uses coerce_types (which should probably be renamed to get_binary_op_type because it determines the output type of a binary expression) to determine the types to case the lhs and rhs expressions.

I think we need a new and similar method for determining the common type to case inputs to, for binary expressions.

Maybe we need to introduce signatures for binary ops so this can be consolidated more, similar to how we handle UDF signatrures.

To Reproduce Steps to reproduce the behavior:

Expected behavior A clear and concise description of what you expected to happen.

Additional context Add any other context about the problem here.

andygrove avatar Sep 09 '22 14:09 andygrove

Do we need make align the type coercion with arrow? The cast system should not be same with arrow, because the datafusion is the SQL system and has some special semantics than the arrow lib @andygrove

liukun4515 avatar Sep 09 '22 14:09 liukun4515

I agree that there will be differences. See the conversation at https://github.com/apache/arrow-datafusion/pull/3407#discussion_r967137788 for one example where we have to work around a difference and this example is possibly due to a bug in DataFusions logic

andygrove avatar Sep 09 '22 15:09 andygrove

I updated the title and description now that I understand this more.

andygrove avatar Sep 09 '22 16:09 andygrove

Maybe we could return a struct like

enum NeededCoercion {
  /// cast inputs to these datatypes
  coerce_inputs: Vec<Datatype>,
  /// the resulting output will be this datatype
  output_type: DataType
}

Or something 🤔

alamb avatar Oct 03 '22 15:10 alamb

Hi, could someone help to explain what the coerce_types does. Does it coerce the lhs and rhs types, or calculate the return type of the operator?

HaoYang670 avatar Nov 10 '22 08:11 HaoYang670

Hi, could someone help to explain what the coerce_types does. Does it coerce the lhs and rhs types, or calculate the return type of the operator?

I think this is a reasonable description of coercion:

https://github.com/apache/arrow-datafusion/blob/9c24a79118c0ae0aee480bf039385a06d240b499/datafusion/expr/src/type_coercion.rs#L18-L32

alamb avatar Nov 10 '22 12:11 alamb