Use Dask to support caching and parallelism
Perhaps GAMA could switch from evaluating multiple pipelines in parallel (one core each) to evaluating pipelines sequentially, with each evaluation using multiple cores. This could improve performance through, e.g., lower memory use.
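As a minimal sketch of what the per-pipeline multi-core mode could look like (this is not GAMA's current implementation): scikit-learn's `cross_val_score` can already fit the folds of a single pipeline in parallel via `n_jobs`, and with `dask.distributed` installed the same call could be wrapped in `joblib.parallel_backend("dask")` to spread the fold fits over a cluster.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0))

# One pipeline at a time, but its 5 CV fold fits run in parallel
# across all available cores (n_jobs=-1). Only this one pipeline's
# models are in memory, instead of one model per concurrent pipeline.
scores = cross_val_score(pipeline, X, y, cv=5, n_jobs=-1)
print(scores.mean())
```

The trade-off is that parallelism is then capped at the number of folds per pipeline rather than the number of candidate pipelines.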
In addition, we currently refit the encoding/imputation steps for every pipeline. While these steps are quick, refitting them each time is still wasteful. (This could also be avoided by performing the preprocessing once per fold in 5-fold CV and passing the transformed data to search, although that does not generalize to multi-fidelity techniques without adaptations.)
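A sketch of the "preprocess once per fold, reuse across pipelines" idea (hypothetical helper names, not GAMA code): fit the cheap preprocessing one time for each CV fold, cache the transformed arrays, and have every candidate estimator evaluate on the cached folds.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Fit the preprocessing (here just imputation) once per fold and cache
# the transformed train/test arrays, instead of refitting it inside
# every candidate pipeline.
folds = []
for train_idx, test_idx in cv.split(X, y):
    imputer = SimpleImputer().fit(X[train_idx])
    folds.append((imputer.transform(X[train_idx]), y[train_idx],
                  imputer.transform(X[test_idx]), y[test_idx]))

def evaluate(estimator):
    # Every candidate estimator reuses the cached transformed folds.
    scores = [estimator.fit(X_tr, y_tr).score(X_te, y_te)
              for X_tr, y_tr, X_te, y_te in folds]
    return float(np.mean(scores))

for est in (DecisionTreeClassifier(random_state=0), KNeighborsClassifier()):
    print(f"{type(est).__name__}: {evaluate(est):.3f}")
```

As noted above, this only works when every pipeline trains on the same full folds; multi-fidelity techniques that subsample the training data would need per-budget caches or other adaptations.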
Closing this because I don't think Dask is the right tool here, and the issue is a bit too broad. I will open a new issue to look into pipeline caching.