GPUifyLoops.jl
[WIP] Add support for ROCm
This is nowhere near ready to go yet, but I wanted to get this posted since things are progressing well for AMDGPU support overall :slightly_smiling_face:
TODO:
- [x] Add synchronization to AMDGPUnative, and then use it here
- [x] Merge JuliaGPU/AMDGPUnative.jl#6 for math intrinsics support
- [ ] Implement some means to select the desired backend (one rough possibility is sketched after this list)
- [ ] Enable scratch and shmem support, and test them
- [ ] Ensure CI passes on some version of Julia (requires LLVM 7.0+ and probably JuliaLang/julia#31970)
- [ ] (Optional) Add aliases for threads, blocks, etc. to AMDGPUnative
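For the backend-selection item, one rough possibility (purely a sketch, nothing here is implemented in this PR) is a small helper that picks a default launch target based on which backend package is installed; the priority order and package names below are just assumptions:

```julia
using GPUifyLoops  # provides CPU() and CUDA(); ROC() is what this PR adds

# Sketch only: choose a default launch target from the installed backends.
function default_device()
    if Base.find_package("CUDAnative") !== nothing
        return CUDA()
    elseif Base.find_package("AMDGPUnative") !== nothing
        return ROC()
    else
        return CPU()
    end
end
```

A kernel wrapper could then do something like `@launch default_device() kernel(A, groupsize=length(A))`, though the launch keywords differ per backend, so this is only a starting point.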
Related #63
Ok, this is now working for me, albeit without synchronization (I still have to add those intrinsics to AMDGPUnative). Slightly modified example from the GPUifyLoops docs:
```julia
using GPUifyLoops, AMDGPUnative, HSARuntime

function kernel(A)
    @loop for i in (1:size(A,1);
                    threadIdx().x)
        A[i] = 2*A[i]
    end
    # TODO: @synchronize
    return nothing
end

kernel(A::HSAArray) = @launch ROC() kernel(A, groupsize=length(A))

data = HSAArray(rand(Float32, 1024))
kernel(data)
```
We should probably define `threadIdx` etc. in GPUifyLoops; users currently still have to do `using CUDAnative` manually to get them.
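A rough sketch of what those definitions could look like (purely illustrative, nothing here exists yet; the AMDGPUnative names `workitemIdx`, `workgroupIdx`, and `workgroupDim` are my assumption of its intrinsics, not confirmed API):

```julia
# Hypothetical sketch of definitions GPUifyLoops could provide so kernels
# don't need `using CUDAnative` themselves.
@static if Base.find_package("CUDAnative") !== nothing
    # Reuse the CUDAnative intrinsics directly.
    import CUDAnative: threadIdx, blockIdx, blockDim
elseif Base.find_package("AMDGPUnative") !== nothing
    import AMDGPUnative
    # CUDA-style aliases over the (assumed) AMDGPUnative intrinsics;
    # essentially the optional "aliases" TODO item above.
    threadIdx() = AMDGPUnative.workitemIdx()
    blockIdx()  = AMDGPUnative.workgroupIdx()
    blockDim()  = AMDGPUnative.workgroupDim()
end
```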