
Docs for AMD support.

Open tomchor opened this issue 4 years ago • 4 comments

Hi, I'm curious about the status of AMD support in this package. The words NVIDIA and AMD do not appear anywhere in the docs (based on the search feature), so I'm confused about what hardware is supported by this package.

A search through the issues was also inconclusive.

tomchor avatar Apr 02 '21 14:04 tomchor

cc: @jpsamaroo

you can use AMDGPU and ROCKernels as the backend
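For context, that pairing would look something like the following (a sketch based on the package layout of that era; ROCKernels was the AMD backend package before backends were folded back into the GPU packages themselves):

```julia
using AMDGPU       # array type and runtime for AMD GPUs (ROCArray, etc.)
using ROCKernels   # KernelAbstractions backend for AMD; exports ROCDevice
```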

vchuravy avatar Apr 02 '21 19:04 vchuravy

Thanks @vchuravy. I see that you've already changed the title of the issue to emphasize the docs (I fixed a small typo there), which is exactly what I was gonna suggest.

Except that I'd say: rather than only adding docs for AMD support, it might be a good idea to dedicate a section to explaining which GPUs are supported and the differences between each type (NVIDIA vs. AMD). I'd guess that in general people (myself very much included) are far less familiar with GPU computations (and thus with what they require) than with CPU computations, and probably need a bit more explanation.

For example (sorry if this seems like a naive question), I don't know what ROCKernels are (a KernelAbstractions kernel specific to AMD?) and, while I think your comment means that AMD GPUs are supported, I'm not 100% sure.

tomchor avatar Apr 02 '21 20:04 tomchor

So CUDAKernels/ROCKernels are the packages you need to load to get CUDA/AMDGPU support with KA, respectively. They export CUDADevice/ROCDevice, which can be passed as the first argument (instead of the usual CPU() argument) to run kernels on the respective kind of GPU. They use the default GPU, and do their own stream/queue management internally, so that independent kernels can execute concurrently.
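A minimal sketch of what that looks like, assuming the KernelAbstractions API of that era (kernels are instantiated with a device and workgroup size, and launches return an event to wait on); the GPU lines are commented out since they require the corresponding hardware and packages:

```julia
using KernelAbstractions
# using CUDAKernels: CUDADevice   # for NVIDIA GPUs (with CUDA.jl arrays)
# using ROCKernels: ROCDevice     # for AMD GPUs (with AMDGPU.jl arrays)

# A backend-agnostic kernel: the same definition runs on CPU or GPU.
@kernel function scale!(a, s)
    i = @index(Global)
    a[i] *= s
end

a = ones(Float32, 1024)

# Swap CPU() for CUDADevice() or ROCDevice() (and `a` for a GPU array)
# to target a GPU; the second argument is the workgroup size.
kernel = scale!(CPU(), 16)
event = kernel(a, 2.0f0, ndrange=length(a))
wait(event)   # kernel launches are asynchronous; wait for completion
```

The key design point from the comment above: the device object is the only thing that changes between backends, so the kernel body itself stays portable.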

jpsamaroo avatar Apr 06 '21 00:04 jpsamaroo

> So CUDAKernels/ROCKernels are the packages you need to load that to get CUDA/AMDGPU support with KA, respectively. They each export CUDADevice/ROCDevice, which can be passed as the first argument (instead of the usual CPU() argument) to utilize their respective kinds of GPUs. They use the default GPU as usual, and do their own stream/queue management internally, so that independent kernels can execute concurrently.

Thanks for the explanation!

tomchor avatar Apr 06 '21 00:04 tomchor