dask-sql icon indicating copy to clipboard operation
dask-sql copied to clipboard

[BUG] `dask-sql` fails to import with `dask-expr` enabled

Open hendrikmakait opened this issue 1 year ago • 5 comments

What happened:

dask-sql fails to import when trying to use it with dask-expr because dask-expr does not implement dd.Aggregation, which is subclassed in multiple places. We plan to use dask-expr as the default for the dask.dataframe API soon (see discussion issue here: https://github.com/dask/dask/issues/10934), so this will cause problems in the future.

Minimal Complete Verifiable Example:

$ export DASK_DATAFRAME__QUERY_PLANNING=True
$ python
>>> import dask_sql
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/__init__.py", line 8, in <module>
    from .cmd import cmd_loop
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/cmd.py", line 26, in <module>
    from dask_sql.context import Context
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/context.py", line 43, in <module>
    from dask_sql.physical.rel import RelConverter, custom, logical
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/physical/rel/logical/__init__.py", line 1, in <module>
    from .aggregate import DaskAggregatePlugin
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/physical/rel/logical/aggregate.py", line 24, in <module>
    class ReduceAggregation(dd.Aggregation):
TypeError: function() argument 'code' must be code, not str

Environment:

  • dask-sql version: main (https://github.com/dask-contrib/dask-sql/commit/93bb1e5bfc611a5987ed26af23cc80c5b0a23324)

hendrikmakait avatar Feb 23 '24 07:02 hendrikmakait

Update: The problem mentioned above has been resolved via https://github.com/dask/dask/pull/10947 and https://github.com/dask-contrib/dask-expr/pull/893.

There is now a new error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/__init__.py", line 8, in <module>
    from .cmd import cmd_loop
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/cmd.py", line 26, in <module>
    from dask_sql.context import Context
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/context.py", line 43, in <module>
    from dask_sql.physical.rel import RelConverter, custom, logical
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/physical/rel/logical/__init__.py", line 5, in <module>
    from .filter import DaskFilterPlugin
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/physical/rel/logical/filter.py", line 11, in <module>
    from dask_sql.physical.utils.filter import attempt_predicate_pushdown
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/tpch/lib/python3.11/site-packages/dask_sql/physical/utils/filter.py", line 309, in <module>
    M.fillna: dd._Frame.fillna,
              ^^^^^^^^^^^^^^^^
AttributeError: 'function' object has no attribute 'fillna'

hendrikmakait avatar Feb 23 '24 12:02 hendrikmakait

@charlesbluca , @rjzamora: Fixing the usage of dd._Frame shouldn't be much work. Are you aware of any other issues (e.g., other internals dask-sql relies upon) that will get in the way of dask-sql supporting dask-expr? I'm trying to scope the required effort here.

hendrikmakait avatar Feb 23 '24 15:02 hendrikmakait

Thanks for scoping this out @hendrikmakait !

My understanding is that dask-sql uses a lot of custom HLG-optimization code to implement predicate pushdown. If dask-sql can use simpler dask-expr logic for this, then most of that code can probably go away (need to take a careful look soon to say for sure). I know @phofl found that the TPCh benchmarks were a bit faster without predicate pushdown, but it would be nice if users/dask-sql could still "opt in".

rjzamora avatar Feb 23 '24 15:02 rjzamora

@charlesbluca @ayushdg - How high of a priority is it to migrate dask-sql's predicate-pushdown logic to dask-expr? It shouldn't be too much work, but it would also be good to know if simply disabling predicate pushdown is a reasonable temporary workaround?

rjzamora avatar Feb 23 '24 20:02 rjzamora

FYI, we have now flipped the switch and dask-expr has become the default dataframe engine (https://github.com/dask/dask/pull/10967).

hendrikmakait avatar Mar 04 '24 17:03 hendrikmakait

Closed with #1319

charlesbluca avatar Apr 15 '24 19:04 charlesbluca