GPUifyLoops.jl

[WIP] Add support for ROCm

jpsamaroo opened this issue on May 23 '19 · 3 comments

This is nowhere near ready to go yet, but I wanted to get this posted since things are progressing well for AMDGPU support overall 🙂

TODO:

  • [x] Add synchronization to AMDGPUnative, and then use it here
  • [x] Merge JuliaGPU/AMDGPUnative.jl#6 for math intrinsics support
  • [ ] Implement some means to select the desired backend (a minimal sketch follows this list)
  • [ ] Enable scratch and shmem support, and test them
  • [ ] Ensure CI passes on some version of Julia (requires LLVM 7.0+ and probably JuliaLang/julia#31970)
  • [ ] (Optional) Add aliases for threads, blocks, etc. to AMDGPUnative
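
For the backend-selection item, one possible shape is dispatching launches on a device type; GPUifyLoops already exposes `CPU()` and `CUDA()` devices, so a `ROC()` device could slot in the same way. This is a minimal sketch only; the `Device`/`launch`/`backend` names below are illustrative, not the actual GPUifyLoops API:

```julia
# Sketch: select a backend by dispatching on a device type.
abstract type Device end
struct CPU  <: Device end
struct CUDA <: Device end
struct ROC  <: Device end   # proposed AMD backend tag

launch(::CPU,  f, args...) = f(args...)   # run the kernel on the host
launch(::CUDA, f, args...) = error("stand-in: would use CUDAnative.@cuda")
launch(::ROC,  f, args...) = error("stand-in: would use AMDGPUnative.@roc")

# Callers pick the backend explicitly, or derive it from the array type:
backend(::Array) = CPU()
# backend(::HSAArray) = ROC()   # once HSARuntime is loaded

launch(backend(zeros(Float32, 4)), A -> (A .*= 2; nothing), zeros(Float32, 4))
```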

jpsamaroo · May 23 '19 18:05

Related #63

vchuravy · May 23 '19 19:05

OK, this is now working for me, albeit without synchronization (I still need to add those intrinsics to AMDGPUnative). Here's a slightly modified example from the GPUifyLoops docs:

```julia
using GPUifyLoops, AMDGPUnative, HSARuntime

function kernel(A)
    # CPU: iterate the full range; GPU: each work-item takes threadIdx().x.
    @loop for i in (1:size(A, 1);
                    threadIdx().x)
        A[i] = 2 * A[i]
    end
    # TODO: @synchronize (blocked on adding the intrinsics to AMDGPUnative)
    return nothing
end

# Launch on the ROCm backend with one work-item per array element.
kernel(A::HSAArray) = @launch ROC() kernel(A, groupsize=length(A))

data = HSAArray(rand(Float32, 1024))
kernel(data)
```
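
The `(range; index)` form in `@loop` comes from the GPUifyLoops docs: on the CPU the loop iterates the full range, while on the GPU each thread handles only the index after the semicolon. Roughly, as a hand-written sketch (the `on_gpu` flag here is a stand-in, not the real macro expansion):

```julia
on_gpu() = false                      # stand-in; the real check is per-backend
A = rand(Float32, 8)
if on_gpu()
    i = 1                             # would be threadIdx().x on the device
    1 <= i <= size(A, 1) && (A[i] = 2 * A[i])
else
    for i in 1:size(A, 1)             # host fallback iterates the whole range
        A[i] = 2 * A[i]
    end
end
```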

jpsamaroo · May 23 '19 22:05

We should probably define threadIdx etc. in GPUifyLoops; currently users still have to do `using CUDAnative` manually to get them.
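
A minimal sketch of what that could look like, assuming hypothetical CPU fallbacks that a GPU backend would rewrite to the real intrinsics during kernel compilation (this is not the actual GPUifyLoops API):

```julia
# Hypothetical CPU fallbacks GPUifyLoops could export so user code compiles
# without `using CUDAnative`; each GPU backend would substitute the real
# intrinsics when compiling the kernel.
threadIdx() = (x = 1, y = 1, z = 1)   # single "thread" on the host
blockIdx()  = (x = 1, y = 1, z = 1)
blockDim()  = (x = 1, y = 1, z = 1)

# With these defaults, the doc example's index expression still works on
# the CPU path:
threadIdx().x == 1
```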

vchuravy · May 26 '19 21:05