dask-ml
Integration with dask-xgboost.
Hi, I want to work on the integration between dask-ml and dask-xgboost. Specifically, I want to implement GridSearchCV and RandomizedSearchCV for estimators like dask-xgboost, which are already distributed on their own. I looked into the existing grid search CV implementation in dask-ml, which seems to be written by manipulating the dask task graph directly instead of using the documented public API. I would like to ask for some guidance before I go ahead and implement yet another grid search CV. Some questions are:
- What's the best way to handle algorithms that are already aware of dask and can run on distributed systems?
- Why is the GridSearchCV written in this way instead of using the public API? Should I be using it in a downstream project? If so, is there a good place to learn more about it?
- Any chance of upstreaming the integration into dask-ml if I were able to implement it? It will be useful not only for XGBoost, but also for other similar projects like LightGBM.
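To make the first question concrete, here is a minimal sketch of one possible approach. It assumes candidates are evaluated one after another, so that each distributed fit gets the whole cluster to itself. It is not how dask-ml's GridSearchCV works. Every name in it (`expand_grid`, `kfold_indices`, `sequential_grid_search`, `ToyThresholdClassifier`) is hypothetical, and the toy classifier only stands in for a dask-aware estimator so that the example runs with the standard library alone.

```python
from itertools import product
from statistics import mean


def expand_grid(param_grid):
    """Yield one dict per parameter combination (keys in sorted order)."""
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))


def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def sequential_grid_search(make_estimator, param_grid, X, y, n_splits=3):
    """Fit candidates one at a time and return (best_params, best_score, results).

    Assumption: each estimator parallelises its own fit, so the search
    loop itself stays sequential instead of competing for workers.
    """
    results = []
    for params in expand_grid(param_grid):
        scores = []
        for train, test in kfold_indices(len(X), n_splits):
            est = make_estimator(**params)
            est.fit([X[i] for i in train], [y[i] for i in train])
            scores.append(est.score([X[i] for i in test], [y[i] for i in test]))
        results.append((params, mean(scores)))
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results


class ToyThresholdClassifier:
    """Hypothetical stand-in for a distributed estimator (fit is a no-op)."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y):
        return self

    def score(self, X, y):
        # Accuracy of predicting class 1 when the feature exceeds the threshold.
        preds = [1 if x > self.threshold else 0 for x in X]
        return sum(p == t for p, t in zip(preds, y)) / len(y)


X = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
y = [0, 1, 0, 1, 0, 1]
best_params, best_score, results = sequential_grid_search(
    ToyThresholdClassifier, {"threshold": [0.0, 0.5, 1.0]}, X, y, n_splits=3
)
print(best_params, best_score)
```

The trade-off in this sketch is that nothing is shared or overlapped between candidates, which is presumably part of what the task-graph-based implementation is trying to optimise.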
Related: https://github.com/dask/dask-ml/issues/833 https://github.com/dmlc/xgboost/issues/5676 https://github.com/dask/dask-ml/issues/758
cc @TomAugspurger (in case you have thoughts here 🙂)
@trivialfis - Quick clarification. Do you want to use Dask for both HP tuning and estimator training?
Yes.
No thoughts. Happy to hear what comes out of this.
On Feb 23, 2022 at 7:39:31 AM, Jiaming Yuan @.***> wrote:
Yes.