
EPIC: Contribute Dask-SQL codebase to Apache Arrow DataFusion Python

Open jdye64 opened this issue 2 years ago • 0 comments

Is your feature request related to a problem? Please describe. Dask-SQL currently maintains its own custom set of Rust PyO3 bindings for Apache Arrow DataFusion. Since we started that effort, interest has grown in the Arrow DataFusion community in offering its own set of Python bindings. It seems sensible to me to contribute the bindings that we have, gain development support from that community, and free up our developer time for features and enhancements.

This EPIC is set up to track the effort of moving code to Arrow DataFusion Python and then refactoring our codebase to use it.

While the PRs will mostly be simple in nature, there are likely to be several. We chose several small PRs over a single large one so that reviews are quicker and easier, and so that any regressions can be identified in a more isolated manner.

I will attempt to keep this list up to date with the PRs relevant to this effort and their status.

Arrow DataFusion Python - Worklog

  • [x] https://github.com/apache/arrow-datafusion-python/pull/204
  • [x] https://github.com/apache/arrow-datafusion-python/pull/208
  • [x] https://github.com/apache/arrow-datafusion-python/pull/214
  • [x] https://github.com/apache/arrow-datafusion-python/pull/216
  • [x] https://github.com/apache/arrow-datafusion-python/pull/218
  • [x] https://github.com/apache/arrow-datafusion-python/pull/220
  • [x] https://github.com/apache/arrow-datafusion-python/pull/223
  • [x] https://github.com/apache/arrow-datafusion-python/pull/229
  • [x] https://github.com/apache/arrow-datafusion-python/pull/227
  • [x] https://github.com/apache/arrow-datafusion-python/pull/232
  • [x] https://github.com/apache/arrow-datafusion-python/pull/233
  • [x] https://github.com/apache/arrow-datafusion-python/pull/266
  • [x] https://github.com/apache/arrow-datafusion-python/pull/269
  • [x] https://github.com/apache/arrow-datafusion-python/pull/271
  • [x] https://github.com/apache/arrow-datafusion-python/pull/273
  • [x] https://github.com/apache/arrow-datafusion-python/pull/277
  • [ ] Add DROP TABLE bindings
  • [ ] Add REPARTITION bindings
  • [ ] Improve the build command so that the Python bindings can be built "out of band", meaning projects like Dask-SQL can build the Python bindings and link to them via their own Cargo build process

Dask-SQL - Worklog

  • [ ] https://github.com/dask-contrib/dask-sql/issues/1084
  • [ ] Get conda build working with new dependencies
  • [ ] Passing test_analyze.py
  • [ ] Passing test_cmd.py
  • [ ] Passing test_compatibility.py
  • [ ] Passing test_complex.py
  • [ ] Passing test_create.py
  • [ ] Passing test_distributeby.py
  • [ ] Passing test_explain.py
  • [ ] Passing test_filter.py
  • [ ] Passing test_fugue.py
  • [ ] Passing test_function.py
  • [ ] Passing test_groupby.py
  • [ ] Passing test_hive.py
  • [ ] Passing test_intake.py
  • [ ] Passing test_jdbc.py
  • [ ] Passing test_join.py
  • [ ] Passing test_model.py
  • [ ] Passing test_over.py
  • [ ] Passing test_postgres.py
  • [ ] Passing test_rex.py
  • [ ] Passing test_sample.py
  • [ ] Passing test_schema.py
  • [ ] Passing test_select.py
  • [ ] Passing test_server.py
  • [ ] Passing test_show.py
  • [ ] Passing test_sort.py
  • [ ] Passing test_sqlite.py
  • [ ] Passing test_union.py

jdye64, Mar 13 '23 22:03