Rahul Kindi


I encountered the same issue. I was able to work around it by constructing the CUDA graph inside the `with profile(activities=[ProfilerActivity.CUDA]):` block.
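A minimal sketch of that workaround follows. It assumes the graph is a PyTorch `torch.cuda.CUDAGraph` captured with `torch.cuda.graph`, which the comment does not state. The `x @ x` matmul is a hypothetical stand-in for whatever kernel is actually being captured and profiled. The key point is that capture happens inside the profiler context rather than before it.

```python
import torch
from torch.profiler import profile, ProfilerActivity


def run_with_graph_inside_profiler():
    # Hypothetical workload: a matmul stands in for the real kernel.
    x = torch.randn(64, 64, device="cuda")

    # Warm up on a side stream before capture, as CUDA graph capture expects.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        y = x @ x
    torch.cuda.current_stream().wait_stream(s)

    with profile(activities=[ProfilerActivity.CUDA]) as prof:
        # Workaround: build (capture) the graph inside the profiler block,
        # not before entering it.
        g = torch.cuda.CUDAGraph()
        with torch.cuda.graph(g):
            y = x @ x  # y becomes the graph's static output tensor
        g.replay()
        torch.cuda.synchronize()

    return x, y, prof
```

After `g.replay()`, `y` holds the result of the captured kernel, and the profiler trace should include the replayed work.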

After trying some other problem sizes, I found that the SIMT kernel also fails in many cases, not with the address alignment issue but with incorrect output: https://gist.github.com/rkindi/9a25a6d1cbcb5a96167f38ed2fc6b3cc. In the linked...