dask-searchcv

Online fit - WIP

Open thomasgreg opened this issue 8 years ago • 4 comments

This is an attempt at issue #32.

~~The following WIP:~~

  • ~~removes TokenIterator and main_token which are dependent on parameters and their ordering~~
  • ~~constructs tokens for the fit names based on parameters uniquely without depending on a mapping; the dask graph is queried directly for previously encountered tasks~~

~~The current approach in part evolved out of becoming familiar with the assumptions of the existing codebase so I ended up being strict about keys and defensive in graph updates (see update_dsk). Passing around and managing a global seen mapping with dsk may achieve the same effect with minimal code change.~~

Have committed a simpler solution which avoids a major refactoring.

Todo:

  • [x] cleaner example using a class derived from DaskBaseSearchCV instead of an unwieldy function
  • [x] tests around the uniqueness of keys (where dsk is updated directly instead of using seen)
  • [ ] ... general cleanup (ParamTokenIterator, example, ...)
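For readers following the key-uniqueness item, here is a minimal sketch of the general idea: derive a task key from the parameters themselves and check the graph before adding a task. It is not the code from this PR. `param_token`, `add_fit_task` and `fake_fit` are hypothetical names, a plain dict stands in for `dsk`, and the hashing scheme is an assumption chosen only to keep the example dependency-free.

```python
import hashlib
import json


def param_token(params):
    """Hypothetical helper: token that depends only on parameter values, not their order."""
    canonical = json.dumps(params, sort_keys=True, default=repr)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:12]


def add_fit_task(dsk, est_name, params, fit_func):
    """Add a fit task unless the graph already holds its key; return the key."""
    key = (est_name + "-fit", param_token(params))
    if key not in dsk:  # query the graph directly, no separate `seen` mapping
        dsk[key] = (fit_func, params)
    return key


def fake_fit(params):
    # Placeholder for a real fit step (assumption for illustration only).
    return sorted(params.items())


dsk = {}  # plain dict standing in for the dask graph
k1 = add_fit_task(dsk, "svc", {"C": 1.0, "kernel": "rbf"}, fake_fit)
k2 = add_fit_task(dsk, "svc", {"kernel": "rbf", "C": 1.0}, fake_fit)  # same params, different order
k3 = add_fit_task(dsk, "svc", {"C": 10.0, "kernel": "rbf"}, fake_fit)
```

In this sketch `k1` and `k2` resolve to the same key, so the graph holds one task for that parameter set and a second one for `k3`.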

thomasgreg avatar May 16 '17 05:05 thomasgreg

Apologies for letting this sit so long - I'll try to give it a good review later today or sometime this weekend. Thanks for taking on this issue :).

jcrist avatar May 19 '17 15:05 jcrist

No worries :) .. just found a bug so working on that and cleaning the example

thomasgreg avatar May 19 '17 16:05 thomasgreg

Apologies for letting this linger @thomasgreg. We're moving further development of dask-searchcv into https://github.com/dask/dask-ml

https://github.com/dask/dask-ml/pull/221 is implementing Hyperband. If you're interested in picking this up again, we could maybe reuse some components / structure from there. LMK if you want help with rebasing this on top of dask-ml.

TomAugspurger avatar Jun 29 '18 13:06 TomAugspurger

I'm not sure how much you'll be able to reuse from https://github.com/dask/dask-ml/pull/221 – most of the framework there is with _partial_fit_and_score, not with the adaptive framework spelled out in https://github.com/scikit-learn/scikit-learn/pull/9599

stsievert avatar Jun 29 '18 14:06 stsievert