evalml
Support for Dask Dataframes
The Dask documentation suggests using Dask DataFrames. Is this even possible using Woodwork? Is it worth it?
This should start in woodwork, let's confirm with @gsheni and co where this stands.
After that... we need to talk to @thehomebrewnerd @frances-h @rwedge about what they had to do to add dask DF support to featuretools. The APIs are different.
Yes, Woodwork supports Dask DataFrames in the current DataTables approach, and in the upcoming Accessor implementation. The guide linked below shows how to use Dask DataFrames with Woodwork and also covers the limitations of doing so:
- https://woodwork.alteryx.com/en/latest/guides/using_woodwork_with_dask_and_koalas.html
The support situation in Featuretools is similar to Woodwork. Dask DataFrames are supported with the same API, but there are limitations. Most of the unsupported functionality falls into one of two categories:
- not supported because Dask does not fully implement the pandas API
- not supported because supporting would require bringing the data into memory via an expensive Dask `.compute()` operation.
I think the biggest issue with Featuretools might be that not all primitives are supported. Here is a guide that outlines the situation: https://featuretools.alteryx.com/en/stable/guides/using_dask_entitysets.html