dask-sql issues

[DF] Add handling for overloaded UDFs

This PR changes the implementation of `DaskFunctions` to support overloaded UDF definitions:

- the `return_type` attribute has been replaced with `return_types`, a `HashMap`, mapping the potential input types of a...
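The move from a single `return_type` to a `return_types` map can be illustrated with a minimal Python sketch. This is not the dask-sql implementation (the `DaskFunctions` struct and its `HashMap` live on the Rust side). The class `OverloadedUDF`, its methods, and the type-name strings are hypothetical and only show the idea of keying the return type on the input types.

```python
from typing import Dict, Tuple


class OverloadedUDF:
    """Hypothetical stand-in for a UDF registered with several signatures."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Maps a tuple of input type names to a return type name,
        # analogous to the `return_types` HashMap described in the PR.
        self.return_types: Dict[Tuple[str, ...], str] = {}

    def register(self, input_types: Tuple[str, ...], return_type: str) -> None:
        # A new signature adds an overload instead of overwriting the old one.
        self.return_types[input_types] = return_type

    def resolve(self, input_types: Tuple[str, ...]) -> str:
        # Look up the return type for the argument types of a concrete call.
        try:
            return self.return_types[input_types]
        except KeyError:
            raise TypeError(f"no overload of {self.name} accepts {input_types}") from None


f = OverloadedUDF("my_udf")
f.register(("int64",), "int64")
f.register(("float64", "float64"), "float64")
print(f.resolve(("int64",)))              # int64
print(f.resolve(("float64", "float64")))  # float64
```

Keying on the full tuple of argument types keeps lookup exact; any implicit casting between overloads would need an extra resolution step that this sketch deliberately leaves out.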

charlesbluca

Stop overwriting aggregations on same column


Closes #655

ChrisJar

datafusion

Add STDDEV, STDDEV_SAMP, and STDDEV_POP


Closes #608

Blocked by: https://github.com/rapidsai/cudf/issues/11515

Note: currently, performing multiple aggregations at once seems to result in incorrect values. Ex: `SELECT STDDEV(a) AS s1, STDDEV_POP(a) AS s2 FROM df` returns the...
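The three aggregations differ only in the divisor: STDDEV_SAMP divides the sum of squared deviations by n - 1, STDDEV_POP divides by n. In many SQL dialects (PostgreSQL, for example) plain STDDEV is an alias for the sample version; that dask-sql follows the same convention is an assumption here. The sketch below uses Python's standard `statistics` module to produce reference values for both variants on the same column, which is the kind of check that would expose the multi-aggregation problem described above.

```python
import math
import statistics

values = [1.0, 2.0, 3.0, 4.0]

# Sample standard deviation: sqrt(sum of squared deviations / (n - 1))
stddev_samp = statistics.stdev(values)
# Population standard deviation: sqrt(sum of squared deviations / n)
stddev_pop = statistics.pstdev(values)

# Cross-check both against the explicit formulas
mean = sum(values) / len(values)
ss = sum((v - mean) ** 2 for v in values)
assert math.isclose(stddev_samp, math.sqrt(ss / (len(values) - 1)))
assert math.isclose(stddev_pop, math.sqrt(ss / len(values)))

print(round(stddev_samp, 4), round(stddev_pop, 4))  # 1.291 1.118
```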

ChrisJar

datafusion

[ENH] Add support for STDDEV_POP on GPU

#629 implemented STDDEV_POP on CPU, but it currently fails on GPU due to: https://github.com/rapidsai/cudf/issues/11515#issuecomment-1212305118

ChrisJar

enhancement

needs triage

Correlated subqueries


Previously, we would get a `ValueError: Not all divisions are known, can't align partitions. Please use set_index to set the index.` for something like: ``` from dask_sql import Context import...
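A correlated subquery references a column of the outer query, so conceptually it is re-evaluated for every outer row. A common way to execute one on a partitioned engine is to decorrelate it: compute the inner aggregate once per correlation key, then join that result back to the outer rows. The sketch below shows this with plain Python lists for a hypothetical query, `SELECT * FROM t WHERE val > (SELECT AVG(val) FROM t AS t2 WHERE t2.id = t.id)`. It is not how dask-sql or DataFusion actually plan the query, which this issue does not show.

```python
from collections import defaultdict

# Hypothetical table t with columns (id, val)
t = [(0, 1), (1, 1), (1, 3), (2, 5)]


def correlated_naive(rows):
    # Re-run the inner query for every outer row: O(n^2)
    out = []
    for oid, oval in rows:
        inner = [v for i, v in rows if i == oid]
        if oval > sum(inner) / len(inner):
            out.append((oid, oval))
    return out


def decorrelated(rows):
    # Step 1: evaluate the inner aggregate once per correlation key
    sums, counts = defaultdict(float), defaultdict(int)
    for i, v in rows:
        sums[i] += v
        counts[i] += 1
    avg_by_id = {i: sums[i] / counts[i] for i in sums}
    # Step 2: join the aggregate back to the outer rows on the key, then filter
    return [(i, v) for i, v in rows if v > avg_by_id[i]]


print(decorrelated(t))  # [(1, 3)]
```

Both functions return the same rows; the decorrelated form turns the subquery into a group-by plus a join, which a dataframe engine can run without aligning partitions row by row.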

sarahyurick

[DF] select * limit 5 seems to do a full scan

I'm struggling to find a programmatic reproducer for this, but on the datafusion-sql-planner branch:

```
c.sql("SELECT * FROM large_table limit 5")
```

results in reading the entire dataset before filtering...
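For comparison, the behavior the report expects is that a `LIMIT` stops reading once enough rows have been produced. The sketch below models partitions as lazily loaded chunks behind a generator and applies the limit with `itertools.islice`, so only the partitions needed for the first `n` rows are ever read. `read_partition`, `scan`, `limit`, and the partition layout are hypothetical stand-ins, not dask-sql or DataFusion APIs.

```python
from itertools import islice

reads = []  # records which partitions were actually loaded


def read_partition(i, size=3):
    # Hypothetical loader: pretend each partition holds `size` rows on disk
    reads.append(i)
    return [(i, j) for j in range(size)]


def scan(num_partitions):
    # Lazily yield rows partition by partition; nothing is loaded up front
    for i in range(num_partitions):
        yield from read_partition(i)


def limit(rows, n):
    # Stop pulling from the scan as soon as n rows have been produced
    return list(islice(rows, n))


first5 = limit(scan(num_partitions=100), 5)
print(first5)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
print(reads)   # [0, 1]
```

Only two of the 100 partitions are touched. A plan that materializes the whole scan before applying the limit would instead load all 100, which matches the symptom in the report.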

randerzander

bug

needs triage

[DF] Some grouped aggregations fail


Repro:

```
import pandas as pd
from dask_sql import Context

c = Context()
df = pd.DataFrame({"id": [0, 1, 1, 2], "val": [1, 1, 2, 1]})
c.create_table("df", df)
c.sql("""
SELECT val,...
```

randerzander

bug

needs triage