KernelAbstractions.jl
Docs for AMD support.
Hi, I'm curious about the status of AMD support in this package. The words NVIDIA or AMD don't appear once in the docs (based on the search feature), so I'm a bit confused about what this package actually supports.
A search through the issues was also inconclusive.
cc: @jpsamaroo
You can use AMDGPU and ROCKernels as the backend.
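Roughly like this (an untested sketch; `ROCDevice` is what ROCKernels exports, and exact package versions may matter):

```julia
using KernelAbstractions
using AMDGPU, ROCKernels  # the AMD backend packages

device = ROCDevice()  # use this where the KA examples pass CPU()
```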
Thanks @vchuravy. I see that you've already changed the title of the issue to emphasize the docs (I fixed a small typo there), which is exactly what I was gonna suggest.
Except that I'd say not only add docs for AMD support; it might also be a good idea to dedicate a section to explaining a bit more which GPUs are supported and maybe the differences between each type (NVIDIA vs AMD). I guess in general people (myself very much included) are far less familiar with GPU computations (and thus what's necessary for them) than with CPU computations and probably need a bit more explanation.
For example (sorry if this seems like a naive question), I don't know what ROCKernels are (a KernelAbstractions kernel specific to AMD?) and, while I think your comment means that AMD GPUs are supported, I'm not 100% sure.
So CUDAKernels/ROCKernels are the packages you need to load to get CUDA/AMDGPU support with KA, respectively. They each export CUDADevice/ROCDevice, which can be passed as the first argument (instead of the usual CPU() argument) to utilize their respective kinds of GPUs. They use the default GPU as usual, and do their own stream/queue management internally, so that independent kernels can execute concurrently.
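To make that concrete, here's a rough sketch of writing a KA kernel and launching it on an AMD GPU (untested here; it assumes AMDGPU/ROCKernels versions where kernel launches return an event you `wait` on, and `AMDGPU.ones` for allocating on the device):

```julia
using KernelAbstractions
using AMDGPU, ROCKernels  # swap in CUDA + CUDAKernels for NVIDIA

# A simple KA kernel: doubles each element in place.
@kernel function mul2!(A)
    I = @index(Global)
    A[I] = 2 * A[I]
end

A = AMDGPU.ones(Float32, 1024)        # array already on the GPU
kernel = mul2!(ROCDevice(), 256)      # ROCDevice() instead of CPU(); 256 is the workgroup size
event = kernel(A, ndrange=length(A))  # launches asynchronously on the backend's internal queue
wait(event)                           # block until the kernel has finished
```

Because launches are asynchronous and each backend manages its own queues, independent kernels launched this way can overlap on the device; you only `wait` on the events whose results you need.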
Thanks for the explanation!