dask-examples
Create an Example of Using TPOT with a Dataset that Doesn't Fit in Memory
All of the examples I've seen either:
- show TPOT using Dask for training on a dataset that fits in memory (shown here), or
- show how to use Dask-ml with `Incremental` to train on a dataset that doesn't fit in memory (shown here)

...but none show how to use TPOT AND a larger-than-memory dataset.
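For context, `Incremental` works by repeatedly calling the wrapped estimator's `partial_fit` on one chunk of data at a time, so only a single chunk ever needs to be in memory. A minimal sketch of that pattern using scikit-learn's `SGDRegressor` on synthetic data (illustrative only; `Incremental` drives the same calls over the blocks of a Dask array):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
true_coef = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
model = SGDRegressor(random_state=0)

# Each loop iteration stands in for one out-of-core chunk; with dask-ml,
# Incremental performs this loop over the chunks of a Dask array for you.
for _ in range(20):
    X_chunk = rng.normal(size=(200, 5))
    y_chunk = X_chunk @ true_coef + rng.normal(scale=0.1, size=200)
    model.partial_fit(X_chunk, y_chunk)

print(model.coef_.round(1))
```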
My attempt at this looked like:

```python
from tpot import TPOTRegressor
from dask_ml.wrappers import Incremental

tpot = TPOTRegressor(generations=100, population_size=25, use_dask=True)
inc = Incremental(tpot, scoring='neg_mean_absolute_error')
inc.fit(X_train, y_train)
print(inc.score(X_test.values, y_test.values))
```
...but this of course throws the error:

```
Traceback (most recent call last):
  File "Z:\Python_Projects\test5.py", line 94, in <module>
    inc.fit(X_train, y_train)
  File "C:\Users\chalu\AppData\Roaming\Python\Python310\site-packages\dask_ml\wrappers.py", line 579, in fit
    self._fit_for_estimator(estimator, X, y, **fit_kwargs)
  File "C:\Users\chalu\AppData\Roaming\Python\Python310\site-packages\dask_ml\wrappers.py", line 561, in _fit_for_estimator
    result = estimator.partial_fit(X=X, y=y, **fit_kwargs)
AttributeError: 'TPOTRegressor' object has no attribute 'partial_fit'
```
...because the TPOT estimators don't implement `partial_fit`, so they can't be trained incrementally.
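The failure can be reproduced in miniature. Below is a simplified, hypothetical sketch (illustrative names only, not dask-ml internals) of the loop `Incremental` runs over the wrapped estimator, which is why any estimator lacking `partial_fit` fails this way:

```python
def incremental_fit(estimator, chunks):
    # Incremental's core idea: hand each chunk to partial_fit so the whole
    # dataset never has to be in memory at once. An estimator without
    # partial_fit raises AttributeError, as TPOTRegressor does above.
    for X, y in chunks:
        estimator.partial_fit(X, y)
    return estimator

class RunningMean:
    """Toy estimator that supports incremental training."""
    def __init__(self):
        self.total = 0.0
        self.count = 0
    def partial_fit(self, X, y):
        self.total += sum(y)
        self.count += len(y)

class BatchOnly:
    """Stands in for TPOTRegressor: only fits on a full dataset."""
    def fit(self, X, y):
        return self
```

`RunningMean` trains fine chunk by chunk, while passing a `BatchOnly` instance to `incremental_fit` raises the same `AttributeError` as in the traceback.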
I've opened an issue here about training a TPOT regressor on a larger-than-memory dataset using Dask, since I don't know whether TPOT supports larger-than-memory datasets at all. Either way, this would be an awesome feature to have some day soon.
Thanks!
Thanks for opening this! I agree that it is always good to have examples that really show the power of Dask :)