Peter Andreas Entschev
I don't know if this is something we are ready to do yet for the general use case; [JIT-Unspill for CuPy is still very slow](https://github.com/rapidsai/dask-cuda/issues/840#issuecomment-1143288209). cc @madsbk
> JIT-Unspill should support CuPy arrays by always un-spilling them before task execution: #856. But last I tested it was slow, slower than default spilling https://github.com/rapidsai/dask-cuda/issues/840#issuecomment-1026013073 . I should...
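For context, enabling JIT-Unspill when reproducing this is just a constructor flag; a minimal sketch, assuming a local single-node setup (the `device_memory_limit` value is illustrative):

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Minimal sketch: start a local GPU cluster with JIT-Unspill enabled so
# device objects are wrapped in proxies and un-spilled only when a task
# actually needs them.
cluster = LocalCUDACluster(
    jit_unspill=True,            # dask-cuda's JIT-Unspill option
    device_memory_limit="24GB",  # spill threshold per GPU (illustrative value)
)
client = Client(cluster)
```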
RAPIDS 21.12 and 22.02 perform better than 21.06; the regression first appeared in 22.04, see results below.

RAPIDS 21.06 cuDF benchmark - 10 iterations

```
$ python dask_cuda/benchmarks/local_cudf_merge.py -d 1,2...
```
The reason for this behavior is compression. Dask 2022.3.0 (RAPIDS 22.04) depends on lz4, whereas Dask 2022.1.0 (RAPIDS 22.02) doesn't. Distributed defaults to `distributed.comm.compression=auto`, which ends up picking...
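For anyone hitting this, a minimal sketch of working around the `auto` default by pinning the compression setting explicitly (this is the standard Distributed config key; the values shown are just the obvious choices):

```python
import dask

# Disable wire compression entirely, avoiding the lz4 path that "auto"
# selects once the lz4 package is installed.
dask.config.set({"distributed.comm.compression": None})

# Alternatively, pin a specific compressor instead of relying on "auto":
# dask.config.set({"distributed.comm.compression": "lz4"})
```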
That is a good idea @madsbk. Is this something we plan on adding to [Distributed](https://github.com/dask/distributed/blob/main/distributed/protocol/compression.py)? It would be good to do that and do some testing/profiling.
While I understand how changing such defaults makes sense for TPCx-BB, I'm not so sure it makes sense for everyone; I don't think it will be difficult to find people...
I was just thinking, after writing the comment above, that maybe we should have some sort of "default recipes" for different use cases. E.g., the TPCx-BB case could use a "performance...
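Purely to illustrate the idea (nothing below exists in dask-cuda today, the names are hypothetical), a recipe could be little more than a named bundle of config overrides a user opts into explicitly:

```python
import dask

# Hypothetical "performance" recipe: a named bundle of Dask config overrides
# applied on request, rather than changing defaults for everyone.
PERFORMANCE_RECIPE = {
    "distributed.comm.compression": None,  # skip compression on fast links
}


def apply_recipe(recipe):
    """Apply a recipe's config overrides in the current process (hypothetical helper)."""
    return dask.config.set(recipe)


apply_recipe(PERFORMANCE_RECIPE)
```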
I forgot to mention another related (potentially duplicate) issue: https://github.com/rapidsai/dask-cuda/issues/334 .
@jacobtomlinson I know you've been doing a lot of deployment-related work. I believe this is already covered, at least partially. Could you check if there's something that's still worth covering...
> > @pentschev @madsbk thoughts on if adding preload_nanny / preload_nanny_argv keywords to CUDAWorker seems like a sensible addition?
>
> Yes, sounds like the right approach.

Agreed, this looks...
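To make the proposal concrete, a sketch of what the user-facing side could look like; `preload_nanny`/`preload_nanny_argv` on `CUDAWorker` are the keywords being proposed above (not an existing API), assumed to be forwarded to the underlying `Nanny`:

```python
from dask_cuda import CUDAWorker

# Proposed usage only: these keywords do not exist on CUDAWorker yet; they are
# assumed to forward to the underlying Nanny so a preload script runs in the
# nanny process before the worker starts.
worker = CUDAWorker(
    "tcp://scheduler-address:8786",             # illustrative scheduler address
    preload_nanny=["my_nanny_setup"],           # hypothetical module to preload
    preload_nanny_argv=["--option", "value"],   # hypothetical preload arguments
)
```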