dask-cuda
Temporarily disable compression for communication protocols
For GPU data, compression makes things worse rather than better because it provokes device-to-host transfers when they are unnecessary.
This is a short-term fix for #935, in lieu of hooking up GPU-based compression algorithms.
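For reference, a minimal sketch of the kind of one-line override this PR appears to introduce (the placement in `dask_cuda/__init__.py` is inferred from the coverage report below; the config key is Distributed's standard compression setting, and the exact mechanism used in the diff is an assumption):

```python
# Sketch: turn off Distributed's wire compression when Dask-CUDA is imported.
# `None` tells Distributed not to compress frames at all.
import dask

dask.config.set({"distributed.comm.compression": None})
```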
@charlesbluca @beckernick @VibhuJawa @ayushdg @randerzander just so you're aware of this, in case this shows up somehow in any workflows where you may test TCP performance. I believe this should make things faster for Dask-CUDA workflows and not have any negative impact.
Curious how was it determined compression was happening with GPU data? Asking as this is explicitly disabled in Distributed and has been this way for a while. Would be good to get a better understanding about what is happening here
> Curious how was it determined compression was happening with GPU data? Asking as this is explicitly disabled in Distributed and has been this way for a while. Would be good to get a better understanding about what is happening here
I can't really explain that. Is there any chance we're hitting another condition because TCP will force a D2H/H2D copy? Not sure if you've seen it, but this is the comment I left when I found that out: https://github.com/rapidsai/dask-cuda/issues/935#issuecomment-1161639986
TCP currently requires host to device copying regardless of whether there is compression or not. So disabling compression wouldn't fix that. If we have some way to send device objects over TCP, let's discuss how we can enable this in Distributed.
> TCP currently requires host to device copying regardless of whether there is compression or not. So disabling compression wouldn't fix that.
Yes, I know, and this is not just currently but will always be the case, as it needs to go over Ethernet and there's no GPUDirect RDMA in that case. But anyway, that is not what I was trying to say, apologies if it was unclear. I'm only imagining this is happening because it hits some other path instead of https://github.com/dask/distributed/blob/3551d1574c9cd72d60197cc84dd75702ebcfec54/distributed/protocol/cuda.py#L28 that you mentioned earlier. The thing is that `lz4` gets installed by default in Dask now, and that's what caused the behavior change, so I can only imagine that the CUDA-specific config is being ignored/overridden.
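One way to sanity-check which default is in effect (a sketch; it only inspects the Dask config and whether `lz4` is importable, which is what the `auto` setting keys off):

```python
# Inspect the effective compression setting. With "auto", Distributed picks
# lz4 when the package is importable, which is what changed once dask began
# pulling lz4 in by default.
import importlib.util

import dask
import distributed  # noqa: F401  (importing registers distributed's config defaults)

print(dask.config.get("distributed.comm.compression"))            # e.g. "auto"
print("lz4 importable:", importlib.util.find_spec("lz4") is not None)
```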
> Curious how was it determined compression was happening with GPU data? Asking as this is explicitly disabled in Distributed and has been this way for a while. Would be good to get a better understanding about what is happening here
My commit message might be misleading. I got lost trying to follow where the compression was coming from, so it might not be host/device copies, but rather just that on a fast-ish network compressing big chunks of data is slower than just sending them over the wire.
Gotcha ok that makes more sense. That said, users can still configure this themselves in those cases. Instead of having a different default (based on whether Distributed or Distributed + Dask-CUDA is used), which may be more confusing, why don't we document this and encourage users to explore different settings based on their needs?
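For example, a user who knows their network is fast could opt out per-workflow without any Dask-CUDA change (a sketch; the override needs to be in place before the cluster and its comms start, and the cluster arguments here are illustrative):

```python
# Disable wire compression for this workflow only, before the cluster starts.
# The same key can also be set via the DASK_DISTRIBUTED__COMM__COMPRESSION
# environment variable or a Dask YAML config file.
import dask
from dask_cuda import LocalCUDACluster
from distributed import Client

dask.config.set({"distributed.comm.compression": None})

cluster = LocalCUDACluster(protocol="tcp")
client = Client(cluster)
```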
> Curious how was it determined compression was happening with GPU data? Asking as this is explicitly disabled in Distributed and has been this way for a while. Would be good to get a better understanding about what is happening here
I'll have a look. Here are some profiled callgraphs of the two options (compression `auto` vs compression `None`):

[profiled callgraph images: compression auto vs compression None]
> Curious how was it determined compression was happening with GPU data? Asking as this is explicitly disabled in Distributed and has been this way for a while. Would be good to get a better understanding about what is happening here

> I'll have a look, here are some profiled callgraphs of the two options (compression auto vs compression None)
tl;dr: using the TCP protocol never calls `cuda_dumps`/`cuda_loads`, so it's not "GPU data" any more, and so those overrides don't kick in.

OK, so that explicit disabling only kicks in when `to_frames` is called with a `serializers` argument that includes `"cuda"`. This happens when the UCX comm backend is used, which explicitly sets `serializers` in `write` and `read`. In that case, `cuda_dumps` is called, which produces frames that contain device buffers (which UCX can handle).

In the TCP case, `serializers` is always `None` on call, so `tcp.write` calls `to_frames`, which calls into `dask_dumps` (for which there are handlers registered that deal with cudf dataframes and call `host_serialize`). But now the header of the message doesn't contain compression overrides, with the consequence that the host dataframe buffers are now compressed.
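A small way to see the two paths side by side (a sketch, assuming a GPU with cudf installed; it calls `distributed.protocol.serialize` directly with serializer lists that roughly mirror what the UCX and TCP comms ask for):

```python
import cudf
from distributed.protocol import serialize

df = cudf.DataFrame({"a": range(1000)})

# UCX-like call: "cuda" is offered first, so cuda_dumps handles the object and
# the resulting frames may contain device buffers.
header_ucx, frames_ucx = serialize(df, serializers=("cuda", "dask", "pickle"))
print(header_ucx["serializer"])   # expected: "cuda"

# TCP-like call: no "cuda" serializer, so dask_dumps/host_serialize run and the
# frames are plain host bytes that to_frames may then compress.
header_tcp, frames_tcp = serialize(df, serializers=("dask", "pickle"))
print(header_tcp["serializer"])   # expected: "dask"
```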
Right, though disabling compression won't avoid the DtH/HtD transfers in the TCP case.
Compression is allowed in that case since everything is on host. It just follows Distributed's default.
Certainly users can disable this behavior on their own. We could also add this to our own benchmark scripts (if that is important to us).
Would caution against setting this as a default in Dask-CUDA because
- It has different behavior than Distributed
- Users will not realize adding Dask-CUDA has changed this default somehow
- This may not be the right choice for other workflows
- It can lead to lengthy debugging by developers building on Dask-CUDA
Here's a recent example of this kind of debugging due to a custom environment variable conda-forge added to scikit-build ( https://github.com/conda-forge/ctng-compiler-activation-feedstock/issues/77 ) ( https://github.com/scikit-build/scikit-build/issues/722 ).
I find it particularly frustrating when defaults are hard to know about beforehand; the compression default itself is a great example, where something changed in Dask (pulling in lz4 by default) that had to be debugged before I could understand it. So here we would be setting yet another implicit default that may be difficult to debug too (now across two layers), so I agree with your point, John.
I'm ok with just setting that as a default for benchmark scripts, for example, if Lawrence is ok with that too.
rerun tests
Codecov Report
Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
Please upload report for BASE (branch-22.08@435dae8). Learn more about missing BASE report.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| dask_cuda/__init__.py | 0.00% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## branch-22.08 #957 +/- ##
==============================================
Coverage ? 0.00%
==============================================
Files ? 16
Lines ? 2107
Branches ? 0
==============================================
Hits ? 0
Misses ? 2107
Partials ? 0
This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.