datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Improve consistency of expression names

Open andygrove opened this issue 3 years ago • 1 comments

Is your feature request related to a problem or challenge? Please describe what you are trying to do. We have many different ways to create names from expressions with duplicated and sometimes inconsistent code.

Logical Expression:

  • Display trait
  • Debug trait
  • Expr.name() method (wrapper for create_name function)
  • ExprIdentifierVisitor::desc_expr

Physical Expression:

  • Display trait
  • Debug trait
  • create_physical_name function

One example of confusion is that queries sometimes result in field names containing Divide and sometimes /. For example:

  • decimal_simple.c1 / CAST(Float64(0.00001) AS Decimal128(5, 5)) uses /
  • CAST(decimal_simple.c1 AS Decimal128(30, 19)) Divide CAST(decimal_simple.c5 AS Decimal128(30, 19)) uses Divide

Describe the solution you'd like Make names more consistent and avoid duplicate code

Describe alternatives you've considered None

Additional context None

andygrove avatar Sep 01 '22 13:09 andygrove

Here are some observations after reviewing the logical plan code.

  • Display and Debug produce similar output. Display delegates to Debug for most cases.
  • create_name accepts a schema argument, which is never used. If we remove that, then the logic looks to be about the same as Display and Debug

It seems to me that create_name should be used to determine the name of the expression as it would be represented in a schema, whereas Display would be used to create a representation of the expression to show in the logical plan. In many cases, these will be the same.

andygrove avatar Sep 01 '22 22:09 andygrove