Matthew Jones

21 comments by Matthew Jones

I don't think memory spilling is the issue. The packing is generally not tight. Do we have any new criteria set up for stealing work with Dask + CUDA? From what...

Point being, I think the right thing to do would be to clarify general criteria for Dask + CUDA work stealing. As best I can see, we're just using what's...

> Sometimes. Sometimes it's absolutely critical to performance.

I have not encountered a situation involving Dask + GPUs where work stealing improved performance. I've made benchmark runs in Dask-cuML, Dask-XGBoost....
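For context on the kind of benchmarking described above, here is a minimal sketch of turning work stealing off before starting a cluster, using the standard `distributed.scheduler.work-stealing` config key. The `LocalCluster` arguments are illustrative; for a GPU setup one would typically use `dask_cuda.LocalCUDACluster` instead.

```python
import dask
from dask.distributed import Client, LocalCluster

# Disable work stealing; this must be set before the scheduler is created.
dask.config.set({"distributed.scheduler.work-stealing": False})

# Illustrative local cluster; a GPU benchmark would use dask_cuda.LocalCUDACluster.
cluster = LocalCluster(n_workers=2, threads_per_worker=1)
client = Client(cluster)

print(dask.config.get("distributed.scheduler.work-stealing"))  # False
```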

> Right, so if we change our estimates of bandwidth then things should improve. You may be interested in [dask/distributed#2658](https://github.com/dask/distributed/pull/2658) which tries to learn bandwidth over time.

Very interesting! I...
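Until that PR lands, the scheduler's bandwidth estimate can already be overridden by hand through configuration. A rough sketch, assuming the `distributed.scheduler.bandwidth` key; the exact default and accepted value formats vary between `distributed` versions, so the number below is illustrative:

```python
import dask

# Raise the scheduler's assumed network bandwidth from the conservative default
# (~100 MB/s) toward a faster interconnect, which changes the estimated cost of
# moving data and therefore the work-stealing decisions.
dask.config.set({"distributed.scheduler.bandwidth": 10e9})  # ~10 GB/s, illustrative
```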

> Having a single task that takes up half of memory sounds problematic to me. It might be worth reconsidering this approach.

It's not a single task. A worker was...

> Yes, it's somewhat odd (from a current Dask perspective) to bring in a bunch of data at once and claim a large amount of RAM. I think we're liable to...

@mrocklin it is clear from your immediate, detailed response that you are enthusiastic. Thank you!

**Regarding RF integration**

> This seems like an obvious win regardless.

This project depends...

> Dask dataframe handles this with the `dask.dataframe.methods.concat` function, which will call the appropriate function based on the inputs provided.

Are you suggesting that Dask can replace the logical [`concat` here](https://github.com/dask/dask-xgboost/blob/4661c8a1d3a7f6b63ff994b944b6a6231e7c9f31/dask_xgboost/core.py#L51)?...
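For reference, a small sketch of the dispatching concat quoted above: the same call handles pandas inputs, and with cudf objects it would dispatch to cudf instead. Depending on the Dask version, this helper lives in `dask.dataframe.methods` (as referenced here) or has since moved, so treat the import path as version-dependent.

```python
import pandas as pd
from dask.dataframe import methods

# Two pandas partitions; with cudf DataFrames the same call would pick the cudf backend.
parts = [
    pd.DataFrame({"x": [1, 2]}),
    pd.DataFrame({"x": [3, 4]}),
]

df = methods.concat(parts)  # pandas in, pandas out
print(type(df), len(df))
```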

> You're going to have to suffer the import time at some point. Might as well put it off until we're sure we're going to need it?

Where performance is of...

> I agree that this screws with benchmarking and such, but I'm not sure how feasible it is. We can't predict and pre-import every module that the user might want to...
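To make the trade-off being discussed concrete, here is a toy illustration of a deferred import (names are hypothetical): importing the enclosing module stays cheap because the heavy dependency is only loaded on first use, but that first use then absorbs the import cost, which is exactly what can skew a benchmark.

```python
import pandas as pd

def to_gpu_frame(df: pd.DataFrame):
    # Deferred import: "import cudf" is only paid the first time this function
    # runs, so merely importing this module stays fast. The flip side is that
    # the first call inside a timed region includes the import overhead.
    import cudf  # illustrative; requires a GPU and cudf installed
    return cudf.DataFrame.from_pandas(df)
```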