
Add support for CUDA streams.

Open corbett5 opened this issue 3 years ago • 8 comments

I have a project where we compute a time step on the GPU and then asynchronously copy some data back to the host for later use. This copy overlaps with the subsequent time step, which saves a ton of time. Now I need to compress the data that we save, which I plan to do on the device before copying it back to the CPU. It would be nice if this compression could also be asynchronous so I could overlap it with other computation.
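For readers unfamiliar with the pattern, here is a minimal sketch of the overlap described above, using two CUDA streams and events. The `time_step` kernel, buffer sizes, and double-buffering scheme are illustrative assumptions, not code from the project in question. The comment inside the loop marks where an asynchronous, stream-aware compression call would go if zfp offered one; that call is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one simulation time step: reads `in`, writes `out`.
__global__ void time_step(const float* in, float* out, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    out[i] = 0.5f * in[i] + 1.0f;
}

int main()
{
  const int n = 1 << 20;
  const int steps = 4;
  const size_t bytes = n * sizeof(float);

  float* d_buf[2];   // double-buffered simulation state on the device
  float* h_save;     // pinned host memory, needed for truly asynchronous copies
  cudaMalloc(&d_buf[0], bytes);
  cudaMalloc(&d_buf[1], bytes);
  cudaMallocHost(&h_save, steps * bytes);
  cudaMemset(d_buf[0], 0, bytes);

  cudaStream_t compute, copy;
  cudaStreamCreate(&compute);
  cudaStreamCreate(&copy);

  cudaEvent_t step_done[steps], copy_done[steps];
  for (int s = 0; s < steps; s++) {
    cudaEventCreateWithFlags(&step_done[s], cudaEventDisableTiming);
    cudaEventCreateWithFlags(&copy_done[s], cudaEventDisableTiming);
  }

  const int block = 256;
  const int grid = (n + block - 1) / block;

  for (int s = 0; s < steps; s++) {
    float* in = d_buf[s % 2];
    float* out = d_buf[(s + 1) % 2];

    // `out` was the source of the copy issued two steps ago; do not
    // overwrite it until that copy has finished.
    if (s >= 2)
      cudaStreamWaitEvent(compute, copy_done[s - 2], 0);

    time_step<<<grid, block, 0, compute>>>(in, out, n);
    cudaEventRecord(step_done[s], compute);

    // The copy stream waits only for this step's result, so the copy
    // runs concurrently with step s + 1 on the compute stream.
    cudaStreamWaitEvent(copy, step_done[s], 0);

    // An asynchronous compression call taking the `copy` stream would be
    // enqueued here, before the transfer (hypothetical; this is the
    // feature being requested).
    cudaMemcpyAsync(h_save + (size_t)s * n, out, bytes,
                    cudaMemcpyDeviceToHost, copy);
    cudaEventRecord(copy_done[s], copy);
  }

  cudaStreamSynchronize(copy);
  cudaStreamSynchronize(compute);
  printf("first saved value of last step: %f\n", h_save[(size_t)(steps - 1) * n]);

  for (int s = 0; s < steps; s++) {
    cudaEventDestroy(step_done[s]);
    cudaEventDestroy(copy_done[s]);
  }
  cudaStreamDestroy(compute);
  cudaStreamDestroy(copy);
  cudaFreeHost(h_save);
  cudaFree(d_buf[0]);
  cudaFree(d_buf[1]);
  return 0;
}
```

Error checking is omitted for brevity. The key point is that nothing in the loop blocks the host, which is what a synchronous compression call in the middle would break.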

corbett5 avatar May 25 '21 22:05 corbett5

I think in principle what you're asking for would be possible via CUDA streams (I must confess to not knowing much about it), but I'm unsure how we would expose such functionality through the zfp API. Currently the only entry point we provide is through zfp_compress(), which does a fair amount of setup work on the CPU and handles any data motion between CPU and GPU. The actual CUDA compression kernel is launched some six levels deep.
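For context, a rough sketch of what the current entry point looks like with the CUDA execution policy selected. It assumes zfp was built with `ZFP_WITH_CUDA` enabled and uses a 1D float array and a rate of 8 bits per value purely for illustration. Note that nothing here accepts a `cudaStream_t`, and `zfp_compress()` returns only once compression has finished.

```cuda
#include <cstdio>
#include <vector>
#include <zfp.h>

int main()
{
  // Illustrative input: a 1D array of floats in host memory.
  const size_t n = 1 << 20;
  std::vector<float> data(n);
  for (size_t i = 0; i < n; i++)
    data[i] = (float)i / (float)n;

  zfp_field* field = zfp_field_1d(data.data(), zfp_type_float, n);
  zfp_stream* zfp = zfp_stream_open(NULL);

  // Fixed-rate mode; the rate of 8 bits per value is an arbitrary choice.
  zfp_stream_set_rate(zfp, 8.0, zfp_type_float, 1, 0);

  // Request the CUDA backend; this fails if zfp was built without CUDA support.
  if (!zfp_stream_set_execution(zfp, zfp_exec_cuda)) {
    fprintf(stderr, "CUDA execution policy not available\n");
    return 1;
  }

  // Allocate a buffer large enough for the compressed stream.
  size_t bufsize = zfp_stream_maximum_size(zfp, field);
  std::vector<unsigned char> buffer(bufsize);
  bitstream* bs = stream_open(buffer.data(), bufsize);
  zfp_stream_set_bit_stream(zfp, bs);
  zfp_stream_rewind(zfp);

  // Single synchronous entry point: setup, data motion, and the kernel
  // launch all happen inside this call.
  size_t zfpsize = zfp_compress(zfp, field);
  if (zfpsize == 0)
    fprintf(stderr, "compression failed\n");
  else
    printf("compressed %zu bytes to %zu bytes\n", n * sizeof(float), zfpsize);

  zfp_field_free(field);
  zfp_stream_close(zfp);
  stream_close(bs);
  return zfpsize == 0;
}
```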

Let me discuss this with our CUDA experts to see what can be done.

lindstro avatar May 25 '21 22:05 lindstro

I ran across this paper that seems to have tackled this problem. Not sure if their code is available.

lindstro avatar Sep 22 '21 21:09 lindstro

@lindstro did this make it into this release (1.0.0)? The release notes do not mention it. If not, is it in the works for the release later this year?

data-panda avatar Aug 03 '22 13:08 data-panda

@data-panda No, this release does not include the latest CUDA and HIP work we have been doing. That will end up in the next release. Regarding CUDA streams specifically, that is not something our team has looked at yet. We've had discussions with others who have looked at this (see this paper, for instance) and would welcome a contribution.

lindstro avatar Aug 03 '22 15:08 lindstro

@lindstro could you please share the current plans regarding CUDA support in zfp? Specifically, I am interested in:

  • user control over the CUDA stream on which encode/decode kernels are enqueued
  • fixed precision/accuracy and lossless compression modes

S-o-T avatar Feb 16 '24 20:02 S-o-T

We've yet to do any work on CUDA streams and lossless compression on the GPU. It is unlikely that either would make it into the next release. The next release will, however, have CUDA and HIP support for fixed-precision and -accuracy modes.

lindstro avatar Feb 16 '24 20:02 lindstro

Thanks! Can you share an ETA for next release?

S-o-T avatar Feb 16 '24 20:02 S-o-T

I've been horrible at predicting release dates in the past and am reluctant to give false hope. That said, we're on the hook to do a release no later than end of September. I expect and hope it will happen well before then.

lindstro avatar Feb 16 '24 20:02 lindstro