dask-searchcv
Online fit - WIP
This is an attempt at issue #32.
~~The following WIP:~~
- ~~removes TokenIterator and main_token which are dependent on parameters and their ordering~~
- ~~constructs tokens for the fit names based on parameters uniquely without depending on a mapping; the dask graph is queried directly for previously encountered tasks~~
~~The current approach in part evolved out of becoming familiar with the assumptions of the existing codebase so I ended up being strict about keys and defensive in graph updates (see update_dsk). Passing around and managing a global seen mapping with dsk may achieve the same effect with minimal code change.~~
I have committed a simpler solution which avoids a major refactoring.
Todo:
- [x] cleaner example using a class derived from `DaskBaseSearchCV` instead of an unwieldy function
- [x] tests around the uniqueness of keys (where `dsk` is updated directly instead of using `seen`)
- [ ] ... general cleanup (ParamTokenIterator, example, ...)
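For reference, a rough sketch of what the key-uniqueness item is getting at. This is not the code in this PR: `param_token`, `add_fit_task` and `fake_fit` are hypothetical names, and `hashlib` stands in for dask's `tokenize` so the snippet has no dependencies. The idea is that the key is derived from the parameters alone, and the graph dict itself is checked for an existing task rather than a separate `seen` mapping.

```python
import hashlib
import json


def param_token(params):
    """Deterministic token for a parameter dict, independent of key order."""
    # Assumption: params are JSON-serialisable or have a stable repr.
    payload = json.dumps(params, sort_keys=True, default=repr)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def add_fit_task(dsk, est_name, fit_func, params):
    """Add a fit task unless an identical one is already in the graph.

    Returns the task key and whether a new task was added.
    """
    key = (est_name + "-fit", param_token(params))
    if key in dsk:
        # Query the graph directly; no separate `seen` mapping is kept.
        return key, False
    dsk[key] = (fit_func, params)
    return key, True


def fake_fit(params):
    # Hypothetical stand-in for fitting an estimator with `params`.
    return sorted(params.items())


dsk = {}
k1, added1 = add_fit_task(dsk, "svc", fake_fit, {"C": 1.0, "kernel": "rbf"})
# Same parameters in a different order map to the same key, so nothing is added.
k2, added2 = add_fit_task(dsk, "svc", fake_fit, {"kernel": "rbf", "C": 1.0})
# Different parameters produce a new key and a new task.
k3, added3 = add_fit_task(dsk, "svc", fake_fit, {"C": 10.0, "kernel": "rbf"})
```

A test around uniqueness would then assert that repeated parameter sets never grow the graph, whatever order the parameters arrive in.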
Apologies for letting this sit so long - I'll try to give it a good review later today or sometime this weekend. Thanks for taking on this issue :).
No worries :) I just found a bug, so I'm working on that and cleaning up the example.
Apologies for letting this linger @thomasgreg. We're moving further development of dask-searchcv into https://github.com/dask/dask-ml
https://github.com/dask/dask-ml/pull/221 is implementing Hyperband. If you're interested in picking this up again, we could maybe reuse some components / structure from there. LMK if you want help with rebasing this on top of dask-ml.
I'm not sure how much you'll be able to reuse from https://github.com/dask/dask-ml/pull/221 – most of the framework there is built around _partial_fit_and_score, not the adaptive framework spelled out in https://github.com/scikit-learn/scikit-learn/pull/9599