[BUG][Datafusion] Queries with multiple distinct aggregations return incorrect results

Open ChrisJar opened this issue 2 years ago • 0 comments

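What happened: Queries that include multiple distinct aggregations on the same column (e.g. SUM(a) and AVG(a)) return incorrect results. In both examples below, the two output columns carry the same value, whereas for the data [1, 2, 4] the expected results are SUM(a) = 7 and AVG(a) ≈ 2.33.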

Minimal Complete Verifiable Example:

import pandas as pd
from dask_sql import Context

c = Context()

df = pd.DataFrame({"a":[1,2,4]})
c.create_table("df", df)

c.sql("SELECT SUM(a), AVG(a) FROM df").compute()

returns:

   SUM(df.a)  AVG(df.a)
0        2.0        2.0

and:

import pandas as pd
from dask_sql import Context

c = Context()

df = pd.DataFrame({"a":[1,2,4]})
c.create_table("df", df)

c.sql("SELECT AVG(a), SUM(a) FROM df").compute()

returns:

   AVG(df.a)  SUM(df.a)
0          7          7
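
For comparison, the expected values can be computed with pandas directly, bypassing dask-sql. This is only a minimal reference sketch, not part of the original report; the names `expected_sum` and `expected_avg` are illustrative.

```python
import pandas as pd

# Same data as in the examples above.
df = pd.DataFrame({"a": [1, 2, 4]})

# Reference values computed by pandas, independent of the SQL layer.
expected_sum = df["a"].sum()   # 1 + 2 + 4 = 7
expected_avg = df["a"].mean()  # 7 / 3, approximately 2.33

print(f"SUM(a) = {expected_sum}, AVG(a) = {expected_avg:.2f}")
```

Both queries above should return these two values, in whichever column order the SELECT list specifies.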

Environment:

  • dask-sql version: 8/1 datafusion build
  • Python version: 3.9
  • Operating System: Ubuntu 18.04.4
  • Install method (conda, pip, source): source

ChrisJar • Aug 02 '22 02:08