dask-ml Dask-xgboost example for dask-examples

It would be nice to see an example using the Dask/XGBoost handoff for parallel training and predicting. This is a common question and so would likely have high value.

It would also be useful for this to be smoothly runnable on dask-examples. Presumably we'll have to use a few processes within a LocalCluster and be careful not to blow out RAM on the small containers (XGBoost can be a bit greedy).
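As a rough sketch of what "a few processes within a LocalCluster" could look like, assuming `dask.distributed` is installed: the helper name `make_small_cluster` and the specific worker count and memory limit below are illustrative assumptions, not values taken from dask-examples.

```python
from dask.distributed import LocalCluster


def make_small_cluster(n_workers=2, memory_limit="1GB", processes=True):
    """Start a small local cluster (hypothetical helper for illustration).

    One thread per worker and an explicit per-worker memory limit are
    meant to keep total RAM use predictable on a small container.
    """
    return LocalCluster(
        n_workers=n_workers,
        threads_per_worker=1,
        processes=processes,
        memory_limit=memory_limit,  # applied per worker
    )
```

In a notebook this would typically be followed by `client = Client(make_small_cluster())`, with `Client` imported from `dask.distributed`. The right limits depend on the container size, so the numbers here are only a starting point.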

Jun 27 '18 11:06 mrocklin

It looks like there is an example in the documentation here: http://dask-ml.readthedocs.io/en/latest/examples/xgboost.html

It's nice in many respects (real data, easily interpretable problem, ...)

However, a couple of things about it are concerning:

It is hard to scale down so that users can try things out easily.
The ROC curve at the end is not very exciting. I wonder if there is better pre-processing that could be done if we choose to continue with this dataset.

Alternatively there might be some artificial dataset that we can create more easily instead.

Jun 29 '18 18:06 mrocklin

It looks like there is an example in the documentation here: http://dask-ml.readthedocs.io/en/latest/examples/xgboost.html

I certainly think this is a good example to keep, and maybe implement a new example in dask-examples. This is good for a static example – it shows an interesting problem that's harder to scale.

I think if we implement a new example for dask-examples, we should use a synthetic dataset. For me the biggest annoyance is the time it takes to process the dataset (at least a minute, often two minutes).
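For illustration, a synthetic classification dataset can be generated lazily with plain `dask.array`, which avoids any download or preprocessing time. This is only a minimal sketch: the function name `make_synthetic`, the sizes, and the labelling rule are assumptions made for the example, and dask-ml's `dask_ml.datasets.make_classification` may be a better fit in practice.

```python
import dask.array as da


def make_synthetic(n_samples=10_000, n_features=20, chunk_rows=1_000, seed=0):
    """Build a chunked random feature matrix and a simple binary label.

    Hypothetical helper: the label depends only on the first two
    features, so a classifier has a real signal to learn.
    """
    rs = da.random.RandomState(seed)
    X = rs.normal(size=(n_samples, n_features), chunks=(chunk_rows, n_features))
    y = (X[:, 0] + X[:, 1] > 0).astype("int64")
    return X, y


X, y = make_synthetic()
print(X.shape, X.numblocks)  # nothing is computed until .compute() is called
```

Because the arrays are lazy, the number of rows and the chunk size can be shrunk or grown to fit whatever machine the example runs on.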

Jun 29 '18 20:06 stsievert

I've opened a PR at https://github.com/dask/dask-examples/pull/14 that mirrors the dask-ml documentation example, but is quicker to run because it uses synthetic data.

Jun 29 '18 21:06 stsievert

This is closed by https://github.com/dask/dask-examples/pull/14, correct?

Jul 06 '18 16:07 stsievert

Hello everyone, I'm Yash. I have experience in machine learning and web development, and I'm new to open source. I have never contributed before, so could anyone give me advice on how to start my first contribution?

Jun 15 '22 04:06 yash-dewasthale
