dask-sql icon indicating copy to clipboard operation
dask-sql copied to clipboard

[BUG] YEAR and other operations cannot be used in GROUP

Open mfojtak opened this issue 2 years ago • 1 comments

What happened: When trying to execute following SQL:

SELECT 
    YEAR(record_date) AS year
FROM 
    FinancialRecord
GROUP BY year

an error is thrown:

RuntimeError: DataFusionError(SchemaError(FieldNotFound { field: Column { relation: Some(Bare { table: "financialrecord" }), name: "record_date" }, valid_fields: [Column { relation: None, name: "year(financialrecord.record_date)" }] }))

When removing GROUP BY clause, the code runs fine.

What you expected to happen: The valid SQL statement should be executed without error.

Minimal Complete Verifiable Example:

# Put your MCVE code here
query = '''
SELECT 
    YEAR(record_date) AS year
FROM 
    FinancialRecord
GROUP BY year
'''
ctx = Context()
result = ctx.sql(query)

Anything else we need to know?:

Environment:

  • dask-sql version: 2023.6.0
  • Python version: 3.10
  • Operating System: Linux
  • Install method (conda, pip, source): pip

mfojtak avatar Jun 20 '23 13:06 mfojtak

Thanks for raising the issue @mfojtak! On initial inspection it looks like there's an issue with dask-sql when referencing an aliased/aggregate column also as the group column in an expression.

For some of the issues I was seeing the source seems to be around https://github.com/dask-contrib/dask-sql/blob/main/dask_sql/physical/rel/logical/aggregate.py#L295 where we try to reference a column/alias that isn't a part of the input dataframe yet.

ayushdg avatar Jun 21 '23 16:06 ayushdg