datafusion
datafusion copied to clipboard
Add test to verify issue #9161
Which issue does this PR close?
Closes #9161.
Rationale for this change
In issue #9161, the expression cast(a as int) in the select list generated a unqualified column named 't.a', which was then incorrectly referenced by the group by clause, leading to a schema error.
After merging PR #9228, the group by clause will now preferentially reference the column of the same name from the input, rather than the one in the select list, thereby resolving the issue.
I suspect there will still be similar issues, but we may need to find a new case to verify whether they truly exist.
What changes are included in this PR?
Add a test.
Are these changes tested?
Yes
Are there any user-facing changes?
No
Thanks @jonahgao I still don't get how this PR closes the issue 🤔
Previously, to parse column references in the 'GROUP BY' expression, we would sequentially search for the column names in the output schema of the select list and then in the input plan.
However, the cast expression generated an output column named "t.a", which has the same name as the column reference in the group by, thus resulting in an incorrect match, which should not happen.
After PR #9228, we first searched from the input plan and then obtained the correct match, which is the column a of table t.
I think there are still bugs concerning the naming of cast expressions and the planning of 'GROUP BY' clauses. But in this case, they cannot be reproduced. I need to find a new one.
Thank you @jonahgao -- sorry for the delay in reviewing
I think this PR is nice that it shows a problem doesn't exist and ensures it won't start causing problems in the future
Thank you for the review! ❤️