dask-sql

Distributed SQL Engine in Python using Dask

Results 253 dask-sql issues

This PR changes the implementation of `DaskFunctions` to support overloaded UDF definitions:
- the `return_type` attribute has been replaced with `return_types`, a `HashMap`, mapping the potential input types of a...
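To illustrate the idea of keying return types by input types, here is a minimal Python sketch. It is not the PR's implementation (which the snippet describes as using a `HashMap`); the names `return_types` and `resolve_return_type`, and the type names used as keys, are hypothetical and chosen only for this example.

```python
from typing import Dict, Tuple

# Hypothetical registry for one overloaded UDF: each key is a tuple of
# input type names, each value is the return type for that overload.
return_types: Dict[Tuple[str, ...], str] = {
    ("int64",): "int64",
    ("float64",): "float64",
    ("int64", "float64"): "float64",
}


def resolve_return_type(input_types: Tuple[str, ...]) -> str:
    """Look up the return type registered for the given input types."""
    try:
        return return_types[input_types]
    except KeyError:
        # No overload matches this combination of input types.
        raise TypeError(
            f"no overload registered for input types {input_types}"
        ) from None
```

With a single `return_type` attribute, only one signature could be described per function name; a mapping lets the planner pick the return type after it knows the argument types.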

Closes #608
Blocked by: https://github.com/rapidsai/cudf/issues/11515
Note: currently, performing multiple aggregations at once seems to result in incorrect values. Ex: `SELECT STDDEV(a) AS s1, STDDEV_POP(a) AS s2 FROM df` returns the...

datafusion
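For reference when checking the two aggregations in the example query, the sketch below computes both quantities with Python's standard library. It assumes `STDDEV` denotes the sample standard deviation (divisor n - 1), as it does in many SQL engines, and `STDDEV_POP` the population standard deviation (divisor n); the input values are made up for illustration.

```python
import statistics

# Illustrative data only; not taken from the issue.
a = [1.0, 2.0, 3.0, 4.0]

# Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1)).
s1 = statistics.stdev(a)

# Population standard deviation: sqrt(sum((x - mean)^2) / n).
s2 = statistics.pstdev(a)

# For more than one distinct value the two differ, with s1 > s2.
print(s1, s2)
```

If a query that selects both aggregations returns the same number in both columns for data like this, that points to the kind of incorrect result the note describes.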

#629 implemented STDDEV_POP on CPU, but it currently fails on GPU due to: https://github.com/rapidsai/cudf/issues/11515#issuecomment-1212305118

enhancement
needs triage

Previously, we would get a `ValueError: Not all divisions are known, can't align partitions. Please use set_index to set the index.` for something like: ``` from dask_sql import Context import...

I'm struggling to find a programmatic reproducer for this, but on the datafusion-sql-planner branch:
```
c.sql("SELECT * FROM large_table limit 5")
```
results in reading the entire dataset before filtering...

bug
needs triage

Repro: ``` import pandas as pd from dask_sql import Context c = Context() df = pd.DataFrame({"id": [0, 1, 1, 2], "val": [1, 1, 2, 1]}) c.create_table("df", df) c.sql(""" SELECT val,...

bug
needs triage

Add optimizer rules to translate subqueries to joins (when possible)
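As a small illustration of what such a rewrite means, the sketch below runs an `IN` subquery and a hand-written equivalent join against an in-memory SQLite database and compares the results. SQLite stands in for the engine purely for demonstration; the tables `orders` and `vip` are hypothetical, and this says nothing about how dask-sql's optimizer rules are implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    CREATE TABLE vip (customer_id INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 10), (4, 30);
    INSERT INTO vip VALUES (10), (30);
    """
)

# Original form: filter with an uncorrelated IN subquery.
subquery_rows = conn.execute(
    "SELECT id FROM orders "
    "WHERE customer_id IN (SELECT customer_id FROM vip) "
    "ORDER BY id"
).fetchall()

# Rewritten form: join against the de-duplicated subquery result.
# DISTINCT keeps the join from multiplying rows if vip had duplicates.
join_rows = conn.execute(
    "SELECT o.id FROM orders AS o "
    "JOIN (SELECT DISTINCT customer_id FROM vip) AS v "
    "ON o.customer_id = v.customer_id "
    "ORDER BY o.id"
).fetchall()

print(subquery_rows == join_rows)
```

The "when possible" qualifier matters: the rewrite is only valid when it preserves row counts and NULL semantics, which is why the join side is de-duplicated here.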