engine: Aliases bug
Error
While working on to_date and try_to_date UDFs (PR: #1376; Issues: #624 & #1156). I got this error multiple times (similar ones), both in SLT tests and insta tests:
Error: Error during planning: Projections require unique expression names but the expression \"to_date(Utf8(\"2024-05-10\"), Utf8(\"AUTO\"))\" at position 0 and \"to_date(Utf8(\"2024-05-10\"), Utf8(\"AUTO\"))\" at position 1 have the same name. Consider aliasing (\"AS\") one of them.
Bug
This shouldn't happen, since we have a Visitor rewriter that fixes aliasing.
Possible solution
Probably DF treats both UDFs without the alias, meaning our Visitor rewriter which assigns a unique alias on each treats them also the same. This could be the source of the bug.
On pre (and post ) visit of the Select query in VistorMut for AST both funtions have different names for query:
SELECT TO_DATE('2024-05-10'), DATE('2024-05-10');
However, later DF has an error that both functions now alias under the same name so SELECT TO_DATE('2024-05-10'), DATE('2024-05-10'); -> SELECT TO_DATE('2024-05-10'), TO_DATE('2024-05-10');, since DATE(...) is an alias of the UDF TO_DATE(...).
Possible solution: Logical Planner / Rewriter?
Possible solution: Logical Planner / Rewriter?
Error
We don't produce a valid plan, instead we get an error. Logical plan Rewriter is useless here. We don't get to the point of rewriting a plan. Examples:
- If you call the same function twice (without using a function alias) and using the same arguments, our AST Rewriter will kick in, and correct it. Ex:
SELECT TO_DATE('2024-05-10'), TO_DATE('2024-05-10');. - If you call the same function twice (with or without using a function alias) but with different arguments between the two, there is noting to fix and it will work. Ex:
SELECT TO_DATE('2024-05-10'), DATE('2024-07-10');. - If you call the same function twice (with using a function alias) and using the same arguments, we won't be able to produce a valid plan and we will error. Ex:
SELECT TO_DATE('2024-05-10'), DATE('2024-05-10');.
The last example is the error, happening since function name resolution happens at AST -> LogicalPlan conversion. So in AST we have just two FunctionObjects with different names. Because we have no notion (context) that those names resolve to the same function, our AST Rewriter doesn't kick in. And only after we are resolving the aliases and names to real functions we find out that we have the same name (alias) for them. This is done by default DF planner hence it errors.
Hacky way solution
Provide the context (notion) of function name resolution at the AST Rewriter level. I'm pretty sure that's an incorrect way of building a DB. But it should be technically possible.
Correct solution
Logical Planner is the correct solution, but it's just not worth the effort, since this is an edge case. Calling the same function twice using a function alias with the same arguments should be rare. But this needs to be verified. It's possible to get the names, but we are still talking about AST -> LogicalPlan. Rebuilding this from scratch is definitely too much (making the SELECT by hand). And a half-baked solution is to resolve the names, and then provide the new AST SELECT with resolved and correctly aliased function names to DF default planner to create the correct plan. But again the resolution of function names falls on us. This however is much more correct than resolving function names (aliases) in the AST Rewriter.
Balanced solution (not really)
Don't use aliases in functions. Provide the function name as an input at creation of the function. So we will have the same function body, but all of them will have different names. This is a patch solution, that will solve this specific edge case. But it may introduce other problems, in the not so near, future.
tl;dr
Error: AST doesn't have the context of function name (aliases) resolution. So our AST Rewriter gets to_date & date and doesn't alias, since they are not the same. Logical plan Rewriter doesn't work since we error on plan creation by DF default planner (we don't get a plan to rewrite).
Summary: Probably there is no need to put effort into this for now, since this is an edge case (up for debate). All the solutions provided here are either: bad design, big effort or could be faulty in the future.
cc @osipovartem @rampage644 @Vedin @ravlio