[BUG] YEAR and other operations cannot be used in GROUP
What happened: When trying to execute following SQL:
SELECT
YEAR(record_date) AS year
FROM
FinancialRecord
GROUP BY year
an error is thrown:
RuntimeError: DataFusionError(SchemaError(FieldNotFound { field: Column { relation: Some(Bare { table: "financialrecord" }), name: "record_date" }, valid_fields: [Column { relation: None, name: "year(financialrecord.record_date)" }] }))
When removing GROUP BY clause, the code runs fine.
What you expected to happen: The valid SQL statement should be executed without error.
Minimal Complete Verifiable Example:
# Put your MCVE code here
query = '''
SELECT
YEAR(record_date) AS year
FROM
FinancialRecord
GROUP BY year
'''
ctx = Context()
result = ctx.sql(query)
Anything else we need to know?:
Environment:
- dask-sql version: 2023.6.0
- Python version: 3.10
- Operating System: Linux
- Install method (conda, pip, source): pip
Thanks for raising the issue @mfojtak! On initial inspection it looks like there's an issue with dask-sql when referencing an aliased/aggregate column also as the group column in an expression.
For some of the issues I was seeing the source seems to be around https://github.com/dask-contrib/dask-sql/blob/main/dask_sql/physical/rel/logical/aggregate.py#L295 where we try to reference a column/alias that isn't a part of the input dataframe yet.