
Support for Dask Dataframes

Open • chukarsten opened this issue 4 years ago • 3 comments

The Dask documentation suggests using Dask DataFrames. Is this even possible with Woodwork? Is it worth it?

chukarsten, Mar 10 '21 04:03

This should start in Woodwork; let's confirm with @gsheni and co. where this stands.

After that, we need to talk to @thehomebrewnerd, @frances-h and @rwedge about what they had to do to add Dask DataFrame support to Featuretools. The APIs are different.

dsherry, Mar 11 '21 18:03

Yes, Woodwork supports Dask DataFrames in the current DataTables approach and in the upcoming Accessor implementation. The guide below shows how to use Dask DataFrames with Woodwork and also covers the limitations of doing so:

  • https://woodwork.alteryx.com/en/latest/guides/using_woodwork_with_dask_and_koalas.html

gsheni, Mar 11 '21 18:03

The support situation in Featuretools is similar to Woodwork. Dask DataFrames are supported with the same API, but there are limitations. Most of the unsupported functionality falls into one of two categories:

  • not supported because Dask does not fully implement the pandas API
  • not supported because supporting would require bringing the data into memory via an expensive Dask .compute() operation.
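For the second category, one option for EvalML code that only understands pandas is to detect Dask input and materialize it with `.compute()` at the last possible moment, which is exactly the expensive in-memory step described above. This is a minimal sketch under that assumption; `is_dask_dataframe` and `materialize` are hypothetical helpers, not existing EvalML or Woodwork functions. The check duck-types on the class's module so Dask never has to be imported:

```python
def is_dask_dataframe(obj):
    """Hypothetical check: True if obj looks like a Dask DataFrame.

    Inspects the class's module instead of importing Dask, so it works even
    when Dask is not installed. "dask_expr" covers releases where Dask
    DataFrames were provided by the separate dask-expr package.
    """
    cls = type(obj)
    root = (cls.__module__ or "").split(".")[0]
    return root in ("dask", "dask_expr") and cls.__name__ == "DataFrame"


def materialize(df):
    """Hypothetical helper: return an in-memory frame for pandas-only code.

    For Dask input this calls .compute(), which executes the whole task graph
    and loads every partition into memory; other input is returned unchanged.
    """
    if is_dask_dataframe(df):
        return df.compute()
    return df
```

With Dask installed, `materialize(dd.from_pandas(df, npartitions=2))` should hand back a pandas DataFrame, while a pandas DataFrame passes through untouched. Keeping the conversion in one place makes the cost easy to spot and easy to remove once a component gains native Dask support.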

I think the biggest issue with Featuretools might be that not all primitives are supported. Here is a guide that outlines the situation: https://featuretools.alteryx.com/en/stable/guides/using_dask_entitysets.html
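If EvalML followed the same route, only some components would accept Dask input, so each would need to declare what it supports and the library would need to fail early with a readable message. A minimal sketch of that idea, assuming opt-in support (pandas-only by default); `Backend`, `Component`, `usable_with` and `check_supported` are hypothetical names, not existing EvalML or Featuretools APIs:

```python
from dataclasses import dataclass, field
from enum import Enum


class Backend(Enum):
    """Hypothetical enum of DataFrame libraries a component can accept."""
    PANDAS = "pandas"
    DASK = "dask"


@dataclass(frozen=True)
class Component:
    """Hypothetical component; supports defaults to pandas only (opt-in Dask)."""
    name: str
    supports: frozenset = field(default_factory=lambda: frozenset({Backend.PANDAS}))


def usable_with(components, backend):
    """Return the names of components that declare support for `backend`."""
    return [c.name for c in components if backend in c.supports]


def check_supported(components, backend):
    """Raise one descriptive error up front instead of failing deep inside a fit."""
    missing = [c.name for c in components if backend not in c.supports]
    if missing:
        raise ValueError(
            f"Components not supported with {backend.value} input: {', '.join(missing)}"
        )
```

Defaulting to pandas-only means a component is never silently run on Dask data before someone has checked that its pandas calls exist in Dask.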

thehomebrewnerd, Mar 11 '21 18:03