zfp icon indicating copy to clipboard operation
zfp copied to clipboard

Cuda stream

Open svdree-shell opened this issue 1 year ago • 3 comments

This PR provides support for running CUDA compression / decompression on a CUDA stream. This allows for potential overlap between memory transfers and kernel execution, or memory transfers in different directions on devices that support it. Overlapping transfers and execution enables large performance gains in certain scenarios.

To that end I've introduced a zfp_exec_params_cuda struct (similar to zfp_exec_params_omp) that contains a cudaStream_t member. Setting and querying the stream can be done using the new zfp_stream_set_cuda_stream and zfp_stream_cuda_stream functions.

When no stream is specified, the default 0-stream is used. Since several CUDA calls in the code have been changed to their *Async equivalent, a cudaStreamSynchronize(0) is performed when stream 0 is used, to retain blocking behavior in that case.

svdree-shell avatar Jul 16 '24 15:07 svdree-shell

@svdree-shell Thank you so much for this contribution--we've been wanting to support this capability for some time. Unfortunately, as also discussed on PR #232, this is not a great time to merge CUDA/HIP PRs due to extensive changes already being made to the GPU implementations on the staging branch. Let me revisit this PR once our work on staging has been merged into develop.

On a technical note, would it be possible to keep zfp.h insulated from CUDA headers? Keep in mind that zfp.h is C code, and even if ZFP_WITH_CUDA is defined, I'm wary of including CUDA headers into this C header. As best I can tell, cudaStream_t is a pointer, so maybe this could be addressed using a void* in zfp.h.

lindstro avatar Jul 16 '24 20:07 lindstro

Hi @lindstro, would you then also prefer this PR to be targeted to the staging branch?

siebrenreker avatar Jul 18 '24 09:07 siebrenreker

Hi @lindstro, would you then also prefer this PR to be targeted to the staging branch?

In general, we favor the GitFlow model of developing new features on separate feature branches so that they can be well tested before they are merged into develop. The staging branch was created to consolidate CUDA and HIP work done on several different branches that we knew would take some time to massage before this work was ready for develop.

What we have done in the past when a new feature has been developed and submitted as a PR is create a target feature branch. I think that would be preferable to targeting staging, which I'd like to leave open as a pre-develop sandbox for the zfp team. But we would need you to pick up what we ultimately merge from staging into develop. Let me get back to you once that's done, and I'll then set up a feature branch for you. Thanks for your patience.

lindstro avatar Jul 18 '24 15:07 lindstro