FoldsCUDA.jl

Function call and floop simultaneously

Open CharlesRSmith44 opened this issue 3 years ago • 0 comments

I attached an example below. It works as written, but if I modify x inside the loop body (for example, squaring each element before reducing), the function no longer works. Combining `@floop` with `CUDAEx()` gives real speed gains over the other options I benchmark at the end, and I want to keep exploiting them while also modifying x within the function. Is that possible?

### Packages
using CUDA, FLoops, BenchmarkTools, FoldsCUDA

### User Inputs
nvec = 1_000_000
M = 50
x = CuArray(rand(Float32, (M, nvec)))

### Function Set up
function parallel_multi(f, x)
    @floop CUDAEx() for i in 1:size(x, 2)
        val = reduce(*, @view(x[:, i]))  # works
        # val = reduce(*, @view(x[:, i].^2))  # doesn't work
        # val = reduce(*, x[:, i].^2)  # doesn't work
        f[i] = val
    end
    return f
end

result = CUDA.ones(Float32, (size(x,2),1))

### Comparing speeds
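# CUDA.@sync makes each timing include the GPU work, not just the kernel launch
display(@benchmark CUDA.@sync parallel_multi($result, $x))
display(@benchmark CUDA.@sync reduce(*, $x, dims = 1))
display(@benchmark CUDA.@sync prod($x, dims = 1))  # identical to above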

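For reference, here is a minimal sketch of one way the squared product might be written without creating a temporary array. It rests on two assumptions about why the commented-out lines fail: `@view` only accepts an indexing expression such as `x[:, i]` (not a broadcast like `x[:, i].^2`), and `x[:, i].^2` allocates new arrays, which is generally not possible inside a GPU kernel. The helper name `column_prod_sq` is hypothetical and not part of FoldsCUDA.jl; the check below runs on the CPU only, and I have not verified the helper inside `@floop CUDAEx()`.

```julia
# Hypothetical helper (not part of FoldsCUDA.jl): product of the squared
# entries of column `i`, using scalar operations only so that no
# temporary array is created.
function column_prod_sq(x, i)
    acc = one(eltype(x))
    for j in axes(x, 1)
        acc *= x[j, i]^2
    end
    return acc
end

# CPU sanity check against the allocating version from the example.
x_cpu = Float32[1 2; 3 4]
@assert column_prod_sq(x_cpu, 1) == prod(x_cpu[:, 1] .^ 2)
@assert column_prod_sq(x_cpu, 2) == prod(x_cpu[:, 2] .^ 2)
```

Under those assumptions, the working line in the loop would become `val = column_prod_sq(x, i)`. Since plain `reduce` on the view already works, `mapreduce(abs2, *, @view(x[:, i]))` may be a shorter equivalent, though that is also untested here.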

CharlesRSmith44, May 05 '21 17:05