Rahul Kindi
2 comments
I encountered the same issue. I was able to work around it by constructing the CUDA graph inside the `with profile(activities=[ProfilerActivity.CUDA]):` block.
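For anyone who wants to try the same workaround, here is a minimal sketch of what I mean, assuming PyTorch's `torch.cuda.CUDAGraph` / `torch.cuda.graph` capture API and `torch.profiler`. The function name, the matmul workload, and the sizes are hypothetical placeholders, not the code from the original report; the only point is that the capture happens inside the profiler block rather than before it.

```python
def profile_with_graph_capture(n=256, replays=3):
    """Capture and replay a CUDA graph entirely inside the profiler block.

    Hypothetical helper: the matmul stands in for whatever workload
    is actually being profiled.
    """
    # Imported lazily so that defining this function does not require PyTorch.
    import torch
    from torch.profiler import profile, ProfilerActivity

    if not torch.cuda.is_available():
        raise RuntimeError("this sketch needs a CUDA device")

    static_in = torch.randn(n, n, device="cuda")

    # Warm up on a side stream before capture, as the PyTorch
    # CUDA graphs notes recommend.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            static_in @ static_in
    torch.cuda.current_stream().wait_stream(side)

    with profile(activities=[ProfilerActivity.CUDA]) as prof:
        # The workaround: build the graph here, not outside the block.
        graph = torch.cuda.CUDAGraph()
        with torch.cuda.graph(graph):
            static_out = static_in @ static_in
        for _ in range(replays):
            graph.replay()
        torch.cuda.synchronize()

    return prof, static_out


if __name__ == "__main__":
    prof, _ = profile_with_graph_capture()
    print(prof.key_averages().table(row_limit=10))
```

I have not checked whether this avoids the issue on every PyTorch version, so treat it as a starting point.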
After trying some other problem sizes, I found that the SIMT kernel also fails in many cases: https://gist.github.com/rkindi/9a25a6d1cbcb5a96167f38ed2fc6b3cc (not with an address alignment issue, but with incorrect output). In the linked...