
[WIP] Add support for CUDA Graphs.

Status: Open. gfokkema opened this issue 2 years ago (9 comments).

Hi there!

I wanted to experiment with CUDA Graphs a bit to get a feel for the performance differences between blocking, async and graph execution.

See:

  • https://developer.nvidia.com/blog/cuda-graphs/
  • https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs

However, while most of the required functionality is available (async operations, specifying a stream, etc.), PyCUDA does not have Graph support yet. This PR adds some initial support for launching a kernel pipeline using a CUgraph.
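To illustrate what "launching a kernel pipeline using a graph" means compared with issuing each launch on its own, here is a minimal pure-Python toy model of the capture-then-replay pattern. It does not touch PyCUDA or the CUDA driver: `ToyStream`, `ToyGraph` and their methods are hypothetical names chosen for illustration, and the "kernels" are plain Python functions operating on a list.

```python
from typing import Callable, List, Tuple

# Toy model only: ToyStream/ToyGraph are hypothetical names, not PyCUDA's API.
Kernel = Callable[[list], None]


class ToyGraph:
    """Stand-in for an instantiated graph: a fixed list of recorded launches."""

    def __init__(self, ops: List[Tuple[str, Kernel, list]]):
        self.ops = list(ops)

    def launch(self) -> None:
        # Replay every recorded launch, in capture order, on the buffers
        # that were bound at capture time (as a captured graph would).
        for _name, kernel, buf in self.ops:
            kernel(buf)


class ToyStream:
    """Runs launches immediately, or only records them while capturing."""

    def __init__(self) -> None:
        self.capturing = False
        self.recorded: List[Tuple[str, Kernel, list]] = []

    def begin_capture(self) -> None:
        self.capturing = True
        self.recorded = []

    def end_capture(self) -> ToyGraph:
        self.capturing = False
        return ToyGraph(self.recorded)

    def launch(self, name: str, kernel: Kernel, buf: list) -> None:
        if self.capturing:
            self.recorded.append((name, kernel, buf))  # record, do not run
        else:
            kernel(buf)


def add_one(buf: list) -> None:
    for i in range(len(buf)):
        buf[i] += 1


def double(buf: list) -> None:
    for i in range(len(buf)):
        buf[i] *= 2


data = [1, 2, 3]
stream = ToyStream()
stream.begin_capture()
stream.launch("add_one", add_one, data)
stream.launch("double", double, data)
graph = stream.end_capture()
print(data)         # [1, 2, 3]: nothing ran during capture

for _ in range(3):
    graph.launch()  # replay the whole two-kernel pipeline
print(data)         # [22, 30, 38]
```

The point of the model is the split between a one-off capture step and repeated replays. In real CUDA, each replay is submitted as a single graph launch rather than one launch call per kernel, which is where the savings come from.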

I'd love your comments and feedback. Most likely I am not freeing memory correctly, among other things, so let me know! All in all, everything seems to work well enough to be useful already :)

A nice bonus is that the CUDA Graph API offers a function to output dot files; see the picture below and the demo in examples/demo_graph.py. Note that the demo launches the kernel only once. Because building and instantiating a graph has its own overhead, the benefits of the Graph API should only really start to show when kernels are launched repeatedly.

[Image: CUDA Graph, rendered from the dot file output]
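The remark that graphs pay off mainly for repeated launches can be made concrete with a simple launch-overhead model. This sketch assumes a one-time cost `t_setup` for capturing and instantiating the graph, a per-kernel launch overhead `t_launch` on the plain stream path, and a per-iteration graph launch overhead `t_graph`; the model and all numbers are hypothetical placeholders, not measurements from this PR. Kernel execution time is left out because it is the same on both paths.

```python
def overhead_stream(n_iters: int, n_kernels: int, t_launch: int) -> int:
    """Launch overhead of issuing every kernel individually, n_iters times."""
    return n_iters * n_kernels * t_launch


def overhead_graph(n_iters: int, t_setup: int, t_graph: int) -> int:
    """One-time capture/instantiation cost plus one graph launch per iteration."""
    return t_setup + n_iters * t_graph


def break_even_iters(n_kernels: int, t_launch: int, t_setup: int, t_graph: int) -> int:
    """Smallest n_iters for which the graph path has strictly lower overhead."""
    saving_per_iter = n_kernels * t_launch - t_graph
    if saving_per_iter <= 0:
        raise ValueError("graph launch never pays off under these parameters")
    # Need t_setup + n*t_graph < n*k*t_launch, i.e. n > t_setup / saving_per_iter.
    return t_setup // saving_per_iter + 1


# Hypothetical placeholder numbers (microseconds), chosen only for illustration.
K, T_LAUNCH, T_SETUP, T_GRAPH = 10, 5, 420, 8
print(break_even_iters(K, T_LAUNCH, T_SETUP, T_GRAPH))  # 11
print(overhead_stream(1, K, T_LAUNCH), overhead_graph(1, T_SETUP, T_GRAPH))  # 50 428
```

With these made-up numbers, a single launch (as in the demo) is much cheaper without a graph, and the graph only wins from the 11th iteration onward; the actual break-even point depends entirely on measured costs.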

gfokkema, Jan 15 '22

This looks great, thanks for working on this! To be merged, it'd of course need docs and tests. For lack of GPUs, I don't have usable CI for PyCUDA on GitHub, but I do have that on a GitLab instance I run. Mind if I create a user account for you there?

cc @kaushikcfd

inducer, Jan 16 '22

Hi, thanks for the feedback! Yes, this PR was meant primarily to pitch the idea and get some early feedback :)

And access to already usable CI would be great!

gfokkema, Jan 17 '22

Made an account for you, you should have that info in your email. The site is at https://gitlab.tiker.net/inducer/pycuda.

inducer, Jan 17 '22

I did some experiments and tests with this and it seems to work without any errors so far. What would be the next steps to bring this to a future release?

mgaedtke, Jun 15 '22

It's clear that this should happen, ideally soon. As it happens, there are now two (draft) versions of this, one here:

https://gitlab.tiker.net/kaushikcfd/pycuda/-/merge_requests/2/diffs

and the other one in this PR. (They got started independently.) @mitkotak, could you comment on your plans with respect to upstreaming your work?

inducer, Jun 15 '22

Thanks for your interest in this PR. My current estimate is that this feature will be merged into main in about a month. Most of the wrapper building is done. The purpose of my PR is to broaden the graph creation routes, i.e., to expose the finer-grained graph-building routines of the CUDA Graph API alongside the (begin|end)_capture approach. Right now I am handling regression failures, adding more tests, and working on docs. Thanks!
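The two graph-creation routes differ in how dependencies get into the graph: stream capture records whatever is issued on a stream, while the finer-grained routines let you add nodes and their dependency edges one by one. The pure-Python toy below sketches the explicit route plus a DOT dump of the result. `ToyGraphBuilder`, `add_node`, `execution_order` and `to_dot` are hypothetical names, not the methods in either draft PR, and the DOT text is only similar in spirit to what the CUDA debug-print function emits.

```python
from collections import deque
from typing import Dict, List, Sequence


class ToyGraphBuilder:
    """Hypothetical explicit graph builder: named nodes plus dependency edges."""

    def __init__(self) -> None:
        self.deps: Dict[str, List[str]] = {}

    def add_node(self, name: str, dependencies: Sequence[str] = ()) -> str:
        if name in self.deps:
            raise ValueError(f"duplicate node {name!r}")
        for d in dependencies:
            # Dependencies must already exist, so a cycle can never be formed.
            if d not in self.deps:
                raise ValueError(f"unknown dependency {d!r}")
        self.deps[name] = list(dependencies)
        return name

    def execution_order(self) -> List[str]:
        """Kahn's algorithm: one valid order that respects every dependency."""
        indegree = {n: len(ds) for n, ds in self.deps.items()}
        children: Dict[str, List[str]] = {n: [] for n in self.deps}
        for n, ds in self.deps.items():
            for d in ds:
                children[d].append(n)
        ready = deque(n for n, k in indegree.items() if k == 0)
        order: List[str] = []
        while ready:
            n = ready.popleft()
            order.append(n)
            for c in children[n]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)
        return order

    def to_dot(self) -> str:
        """Render nodes and dependency edges in Graphviz DOT syntax."""
        lines = ["digraph G {"]
        lines += [f'  "{n}";' for n in self.deps]
        for n, ds in self.deps.items():
            lines += [f'  "{d}" -> "{n}";' for d in ds]
        lines.append("}")
        return "\n".join(lines)


g = ToyGraphBuilder()
h2d = g.add_node("memcpy_htod")
k1 = g.add_node("kernel_a", [h2d])
k2 = g.add_node("kernel_b", [h2d])
d2h = g.add_node("memcpy_dtoh", [k1, k2])
print(g.execution_order())  # ['memcpy_htod', 'kernel_a', 'kernel_b', 'memcpy_dtoh']
print(g.to_dot())
```

Because `kernel_a` and `kernel_b` depend only on the copy, a graph executor is free to run them concurrently; stating that independence directly, instead of inferring it from stream order, is what the explicit node-and-edge routines make possible.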

mitkotak, Jun 15 '22

Hi there, any updates on the CUDA Graph feature?

YanBC, Oct 20 '22


Thank you very much for the interest! We are still testing the PR to make sure that we don't break any existing functionality, but if you are curious to learn more, you can try it out by running git clone https://gitlab.tiker.net/kaushikcfd/pycuda.git --branch cudagraph and then installing it by running pip install -e . from inside the cloned directory. You can get comfortable with the syntax through examples/cudagraph_kernel.py and examples/cudagraph_streamcapture.py; for the docs, look for CUDAGraphs in doc/driver.rst. Thanks again for the interest, and apologies for the delay!

mitkotak, Oct 21 '22

Hi @mitkotak, very much looking forward to this feature! Any idea when the PR could be ready?

mgaedtke, Jun 14 '23