dask-sql
Distributed SQL Engine in Python using Dask
Related to #411 I'm trying to convert a timedelta to an integer: ``` import pandas as pd from dask_sql import Context c = Context() df = pd.DataFrame({'dt0': ['2022-03-01 12:00:00'], 'dt1':...
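As a point of reference for the result this excerpt is after, here is a minimal sketch in plain Python, outside dask-sql, of turning a time difference into an integer number of seconds. The first timestamp is taken from the excerpt; the second is a made-up value, since the excerpt is cut off before `dt1` is given. Nothing here reflects how dask-sql itself handles the cast.

```python
from datetime import datetime

# 'dt0' comes from the issue excerpt; 'dt1' is a hypothetical value,
# because the excerpt is truncated before it appears.
dt0 = datetime.fromisoformat("2022-03-01 12:00:00")
dt1 = datetime.fromisoformat("2022-03-01 13:30:00")

# Subtracting two datetimes yields a datetime.timedelta.
delta = dt1 - dt0

# total_seconds() returns a float; int() truncates it to whole seconds.
seconds = int(delta.total_seconds())
print(seconds)
```

Whether the integer should represent seconds, milliseconds, or nanoseconds is a design choice the excerpt does not settle; seconds are used here only for illustration.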
**What is your question?** One advertised feature of dask-sql is its ability to compute queries on relatively bare Dask clusters - to back this claim up, we run a majority...
I'd like to compute [median using percentiles](https://stackoverflow.com/questions/26863139/how-to-calculate-median-in-hive) of a column (this is also [supported in Spark SQL](https://spark.apache.org/docs/latest/api/sql/index.html#percentile)): ``` >>> import pandas as pd >>> from dask_sql import Context >>> c...
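To illustrate the idea behind the request, the following is a rough sketch in plain Python of how a median can be obtained as the 50th percentile. The `percentile` helper and the sample data are hypothetical and use linear interpolation between neighbouring values; this is not dask-sql's, Hive's, or Spark's implementation, and those engines may use other interpolation or approximation rules.

```python
import statistics


def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 1) using linear interpolation.

    Hypothetical helper for illustration only.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    # Fractional index into the sorted values.
    position = p * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    # Interpolate between the two neighbouring values.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


data = [7, 1, 5, 3]  # made-up sample column
median_via_percentile = percentile(data, 0.5)

# The 50th percentile agrees with the standard-library median.
print(median_via_percentile, statistics.median(data))
```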
**Is your feature request related to a problem? Please describe.** Based on discussions in #218 dask-sql no longer persists by default when creating tables from dataframes using the `context.create_table` syntax....
## Report incorrect documentation **Location of incorrect documentation** The API documentation [page](https://dask-sql.readthedocs.io/en/latest/pages/api.html) returns a blank page on the docs site. This might be similar to #215.
Sometimes it's not clear when running a number of SQL scripts in a session whether all scripts "clean up" after themselves (dropping temp tables, unpersisting tables, de-registering UDFs, etc). It...
**Is your feature request related to a problem? Please describe.** In https://github.com/dask-contrib/dask-sql/pull/394 we are adding a GPU section for ML model training. All other sections (see below) of the [machine-learning](https://dask-sql.readthedocs.io/en/latest/pages/machine_learning.html#example) docs,...
I'd like to do some data manipulation and persist results to storage. In Hive and Spark, you can use the `CREATE EXTERNAL TABLE ...` syntax which allows specifying, for example,...
After #338 was merged, we should be able to eliminate unnecessary reads from storage. A glance at the API docs page and the SQL syntax page shows no mention of...
**Is your feature request related to a problem? Please describe.** Now that we have GPU support for ML in `dask-sql`, we should add tests for the GPU backend of TPOT.