Peter Andreas Entschev
I don't know if this is something we are ready to do yet for the general use case; [JIT-Unspill for CuPy is still very slow](https://github.com/rapidsai/dask-cuda/issues/840#issuecomment-1143288209). cc @madsbk
> JIT-Unspill should support CuPy arrays by always un-spilling them before task execution: #856. But last I tested it was slow, slower than default spilling https://github.com/rapidsai/dask-cuda/issues/840#issuecomment-1026013073 . I should...
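For context, enabling JIT-Unspill when reproducing this is just a constructor flag; a minimal sketch, assuming a local single-node setup (the `device_memory_limit` value is illustrative):

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Minimal sketch: start a local GPU cluster with JIT-Unspill enabled so
# device objects are wrapped in proxies and un-spilled only when a task
# actually needs them.
cluster = LocalCUDACluster(
    jit_unspill=True,            # dask-cuda's JIT-Unspill option
    device_memory_limit="24GB",  # spill threshold per GPU (illustrative value)
)
client = Client(cluster)
```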
RAPIDS 21.12 and 22.02 perform better than 21.06; the regression first appeared in 22.04, see results below.

RAPIDS 21.06 cuDF benchmark - 10 iterations

```
$ python dask_cuda/benchmarks/local_cudf_merge.py -d 1,2...
```
The reason for this behavior is compression. Dask 2022.3.0 (RAPIDS 22.04) depends on lz4, whereas Dask 2022.1.0 (RAPIDS 22.02) doesn't. Distributed defaults to `distributed.comm.compression=auto`, which ends up picking...
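For anyone hitting this, a minimal sketch of working around the `auto` default by pinning the compression setting explicitly (this is the standard Distributed config key; the values shown are just the obvious choices):

```python
import dask

# Disable wire compression entirely, avoiding the lz4 path that "auto"
# selects once the lz4 package is installed.
dask.config.set({"distributed.comm.compression": None})

# Alternatively, pin a specific compressor instead of relying on "auto":
# dask.config.set({"distributed.comm.compression": "lz4"})
```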
That is a good idea @madsbk. Is this something we plan on adding to [Distributed](https://github.com/dask/distributed/blob/main/distributed/protocol/compression.py)? It would be good to do that and do some testing/profiling.
While I understand how changing such defaults makes sense for TPCx-BB, I'm not so sure it makes sense for everyone; I don't think it will be difficult to find people...
I was just thinking, after writing the comment above, that maybe we should have some sort of "default recipes" for different use cases. E.g., the TPCx-BB case could use a "performance...
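Purely to illustrate the idea (nothing below exists in dask-cuda today, the names are hypothetical), a recipe could be little more than a named bundle of config overrides a user opts into explicitly:

```python
import dask

# Hypothetical "performance" recipe: a named bundle of Dask config overrides
# applied on request, rather than changing defaults for everyone.
PERFORMANCE_RECIPE = {
    "distributed.comm.compression": None,  # skip compression on fast links
}


def apply_recipe(recipe):
    """Apply a recipe's config overrides in the current process (hypothetical helper)."""
    return dask.config.set(recipe)


apply_recipe(PERFORMANCE_RECIPE)
```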
I forgot to mention another related (potentially duplicate) issue: https://github.com/rapidsai/dask-cuda/issues/334 .
@jacobtomlinson I know you've been doing a lot of deployment-related work. I believe this is already covered, at least partially. Could you check if there's something that's still worth covering...
> > @pentschev @madsbk thoughts on if adding preload_nanny / preload_nanny_argv keywords to CUDAWorker seems like a sensible addition?
>
> Yes, sounds like the right approach.

Agreed, this looks...
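To make the proposal concrete, a sketch of what the user-facing side could look like; `preload_nanny`/`preload_nanny_argv` on `CUDAWorker` are the keywords being proposed above (not an existing API), assumed to be forwarded to the underlying `Nanny`:

```python
from dask_cuda import CUDAWorker

# Proposed usage only: these keywords do not exist on CUDAWorker yet; they are
# assumed to forward to the underlying Nanny so a preload script runs in the
# nanny process before the worker starts.
worker = CUDAWorker(
    "tcp://scheduler-address:8786",             # illustrative scheduler address
    preload_nanny=["my_nanny_setup"],           # hypothetical module to preload
    preload_nanny_argv=["--option", "value"],   # hypothetical preload arguments
)
```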