Matthew Jones
I don't think memory spilling is the issue. The packing is generally not tight. Do we have any established criteria for work stealing with Dask + CUDA? From what...
My point is that the right thing to do would be to clarify general criteria for Dask + CUDA work stealing. As far as I can see, we're just using what's...
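For benchmarking in the meantime, work stealing can be switched off entirely through Dask's configuration. A minimal config fragment, assuming the documented `distributed.scheduler.work-stealing` key (check your installed version's defaults):

```yaml
# ~/.config/dask/distributed.yaml — disable scheduler work stealing
distributed:
  scheduler:
    work-stealing: false
```

The same key can be set at runtime with `dask.config.set({"distributed.scheduler.work-stealing": False})` before the scheduler starts.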
> Sometimes. Sometimes it's absolutely critical to performance. I have not encountered a situation involving Dask + GPUs where work stealing improved performance. I've run benchmarks in Dask-cuML and Dask-XGBoost....
> Right, so if we change our estimates of bandwidth then things should improve. You may be interested in [dask/distributed#2658](https://github.com/dask/distributed/pull/2658) which tries to learn bandwidth over time. Very interesting! I...
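The idea of learning bandwidth over time can be sketched as an exponentially weighted moving average over observed transfers. This is an illustrative toy, loosely inspired by the approach in dask/distributed#2658; the class name, smoothing factor, and starting estimate are assumptions, not the actual implementation:

```python
class BandwidthEstimator:
    """Running estimate of transfer bandwidth, updated per observed transfer.

    A minimal sketch: each completed transfer contributes a bandwidth
    sample (bytes / seconds), folded into an exponentially weighted
    moving average so the estimate adapts to the cluster over time.
    """

    def __init__(self, initial=100e6, alpha=0.3):
        self.bandwidth = initial  # bytes/second, starting guess
        self.alpha = alpha        # weight given to each new observation

    def observe(self, nbytes, duration):
        # Fold one measured transfer into the running estimate.
        sample = nbytes / duration
        self.bandwidth = (1 - self.alpha) * self.bandwidth + self.alpha * sample
        return self.bandwidth
```

With a better-calibrated estimate, the scheduler's cost model for moving data (and hence its stealing decisions) should improve accordingly.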
> Having a single task that takes up half of memory sounds problematic to me. It might be worth reconsidering this approach. It's not a single task. A worker was...
> Yes, it's somewhat odd (from a current Dask perspective) to bring in a bunch of data at once and claim a large amount of RAM. I think we're liable to...
@mrocklin your immediate, detailed response makes your enthusiasm clear. Thank you! **Regarding RF integration** > This seems like an obvious win regardless. This project depends...
> Dask dataframe handles this with the `dask.dataframe.methods.concat` function, which will call the appropriate function based on the inputs provided. Are you suggesting that Dask can replace the logical [`concat` here](https://github.com/dask/dask-xgboost/blob/4661c8a1d3a7f6b63ff994b944b6a6231e7c9f31/dask_xgboost/core.py#L51)?...
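The "call the appropriate function based on the inputs" behavior is type-based dispatch. A minimal, self-contained sketch of the pattern (the registry and decorator here are hypothetical, not Dask's actual machinery, which also handles mixed inputs and library registration):

```python
# Hypothetical dispatch registry mapping an input type to a concat
# implementation, mimicking how dask.dataframe.methods.concat picks
# the right backend (e.g. pandas vs. cudf) from the parts it is given.
_concat_registry = {}


def register_concat(typ):
    """Register a concat implementation for parts of the given type."""
    def decorator(func):
        _concat_registry[typ] = func
        return func
    return decorator


def concat(parts):
    # Dispatch on the type of the first part; a real implementation
    # would validate that all parts share a compatible type.
    func = _concat_registry[type(parts[0])]
    return func(parts)


@register_concat(list)
def _concat_lists(parts):
    # Trivial backend: flatten a sequence of lists into one list.
    return [x for part in parts for x in part]
```

A GPU library would register its own implementation for its dataframe type, and the same `concat` call would then do the right thing for either backend.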
> You're going to have to suffer the import time at some point. Might as well put it off until we're sure we're going to need it? Where performance is of...
> I agree that this screws with benchmarking and such, but I'm not sure how feasible it is. We can't predict and pre-import every module that the user might want to...
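Deferring an import until first use can be sketched with a small proxy object. This is one illustrative approach under stated assumptions (the `LazyModule` name is made up); the standard library's `importlib.util.LazyLoader` offers a more complete mechanism:

```python
import importlib


class LazyModule:
    """Proxy that postpones importing a module until first attribute access.

    Nothing is imported at construction time, so startup stays cheap;
    the import cost is paid on the first real use instead.
    """

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        if self._module is None:
            # First access: perform the real import now.
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# The module is not imported yet; it loads on first attribute access.
json = LazyModule("json")
```

This keeps benchmarks honest about where time goes: the import shows up attributed to the first operation that actually needs the module, rather than inflating process startup.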