dask-sql
[BUG][Datafusion] Queries with multiple distinct aggregations return incorrect results
What happened:
Queries that include multiple distinct aggregations on the same column (e.g. SUM(a) and AVG(a)) return incorrect results.
Minimal Complete Verifiable Example:
import pandas as pd
from dask_sql import Context
c = Context()
df = pd.DataFrame({"a":[1,2,4]})
c.create_table("df", df)
c.sql("SELECT SUM(a), AVG(a) FROM df").compute()
returns:
   SUM(df.a)  AVG(df.a)
0        2.0        2.0
and:
import pandas as pd
from dask_sql import Context
c = Context()
df = pd.DataFrame({"a":[1,2,4]})
c.create_table("df", df)
c.sql("SELECT AVG(a), SUM(a) FROM df").compute()
returns:
   AVG(df.a)  SUM(df.a)
0          7          7
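For comparison, the values these queries should produce can be checked against an independent engine. The sketch below runs the same two statements in an in-memory SQLite database. SQLite is used here only as a reference and is not part of dask-sql or the DataFusion build; the table is recreated by hand with the same three rows.

```python
import sqlite3

# Independent reference engine (in-memory SQLite); not part of dask-sql
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (a INTEGER)")
conn.executemany("INSERT INTO df (a) VALUES (?)", [(1,), (2,), (4,)])

# The same two queries as in the report, differing only in column order
sum_first = conn.execute("SELECT SUM(a), AVG(a) FROM df").fetchone()
avg_first = conn.execute("SELECT AVG(a), SUM(a) FROM df").fetchone()
conn.close()

# SUM(a) is 7 and AVG(a) is 7/3 (about 2.33), regardless of column order
print(sum_first)
print(avg_first)
```

Neither ordering changes the values in the reference engine, whereas in the output above both columns show the same number.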
Environment:
- dask-sql version: 8/1 datafusion build
- Python version: 3.9
- Operating System: Ubuntu 18.04.4
- Install method (conda, pip, source): source