Create an Example of Using TPOT with a Dataset that DOESN'T Fit in Memory

Open • windowshopr opened this issue 2 years ago • 1 comment

All of the examples I've seen either:

  1. Show TPOT using Dask for training on a dataset that fits in memory (shown here)
  2. Show how to use Dask-ML with Incremental to train on a dataset that doesn't fit in memory (shown here)

...but not how to use TPOT AND a larger-than-memory dataset together.

My attempt at this looked like this:

from dask_ml.wrappers import Incremental
from tpot import TPOTRegressor

tpot = TPOTRegressor(generations=100, population_size=25, use_dask=True)

# Incremental calls partial_fit on the wrapped estimator (see the traceback below)
inc = Incremental(tpot, scoring='neg_mean_absolute_error')

inc.fit(X_train, y_train)

print(inc.score(X_test.values, y_test.values))

...but this of course throws the error:

Traceback (most recent call last):
  File "Z:\Python_Projects\test5.py", line 94, in <module>
    inc.fit(X_train, y_train)
  File "C:\Users\chalu\AppData\Roaming\Python\Python310\site-packages\dask_ml\wrappers.py", line 579, in fit
    self._fit_for_estimator(estimator, X, y, **fit_kwargs)
  File "C:\Users\chalu\AppData\Roaming\Python\Python310\site-packages\dask_ml\wrappers.py", line 561, in _fit_for_estimator
    result = estimator.partial_fit(X=X, y=y, **fit_kwargs)
AttributeError: 'TPOTRegressor' object has no attribute 'partial_fit'

...because TPOT objects don't have an incremental fit function: Incremental calls partial_fit on the wrapped estimator, and TPOTRegressor doesn't implement it.
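To illustrate the requirement behind that error, here is a minimal stdlib-only sketch of block-wise training. It is not dask-ml's actual implementation: `RunningMeanRegressor`, `FitOnlyRegressor`, and `fit_in_blocks` are hypothetical names invented for this example. It only shows the general pattern, where each block is passed to `partial_fit`, so an estimator that only offers `fit` fails in the same way as above.

```python
# Hypothetical, stdlib-only sketch of block-wise training.
# None of these names come from dask-ml or TPOT.


class RunningMeanRegressor:
    """Toy estimator that supports incremental updates via partial_fit."""

    def __init__(self):
        self.n_seen = 0
        self.mean_ = 0.0

    def partial_fit(self, X, y):
        # Update the running mean of y using only the current block.
        for target in y:
            self.n_seen += 1
            self.mean_ += (target - self.mean_) / self.n_seen
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class FitOnlyRegressor:
    """Toy estimator with fit() but no partial_fit()."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self


def fit_in_blocks(estimator, blocks):
    """Feed (X, y) blocks to estimator.partial_fit one at a time."""
    if not hasattr(estimator, "partial_fit"):
        raise AttributeError(
            f"{type(estimator).__name__!r} object has no attribute 'partial_fit'"
        )
    for X_block, y_block in blocks:
        estimator.partial_fit(X_block, y_block)
    return estimator


# Two small blocks stand in for chunks of a larger-than-memory dataset.
blocks = [([[0], [1]], [1.0, 3.0]), ([[2], [3]], [5.0, 7.0])]

model = fit_in_blocks(RunningMeanRegressor(), blocks)
print(model.mean_)

try:
    fit_in_blocks(FitOnlyRegressor(), blocks)
except AttributeError as exc:
    print(exc)
```

A quick `hasattr(estimator, "partial_fit")` check on any estimator before wrapping it tells you whether this style of block-wise training can work at all.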

I've opened an issue here about training a TPOT regressor on a "larger than memory" dataset using Dask. I don't know whether TPOT supports larger-than-memory datasets, but this would be an awesome feature to have some day soon.

Thanks!

windowshopr, Aug 01 '22 03:08

Thanks for opening this! I agree that it is always good to have examples that really show the power of Dask :)

jsignell, Aug 02 '22 16:08