Improve consistency of expression names

Open andygrove opened this issue 3 years ago • 1 comments

Is your feature request related to a problem or challenge? Please describe what you are trying to do. We have many different ways to create names from expressions with duplicated and sometimes inconsistent code.

Logical Expression:

Display trait
Debug trait
Expr.name() method (wrapper for create_name function)
ExprIdentifierVisitor::desc_expr

Physical Expression:

Display trait
Debug trait
create_physical_name function

One example of confusion is that queries sometimes result in field names containing Divide and sometimes /. For example:

decimal_simple.c1 / CAST(Float64(0.00001) AS Decimal128(5, 5)) uses /
CAST(decimal_simple.c1 AS Decimal128(30, 19)) Divide CAST(decimal_simple.c5 AS Decimal128(30, 19)) uses Divide

Describe the solution you'd like Make names more consistent and avoid duplicate code

Describe alternatives you've considered None

Additional context None

Sep 01 '22 13:09 andygrove

Here are some observations after reviewing the logical plan code.

Display and Debug produce similar output. Display delegates to Debug for most cases.
create_name accepts a schema argument, which is never used. If we remove that, then the logic looks to be about the same as Display and Debug

It seems to me that create_name should be used to determine the name of the expression as it would be represented in a schema, whereas Display would be used to create a representation of the expression to show in the logical plan. In many cases, these will be the same.

Sep 01 '22 22:09 andygrove