FoldsCUDA.jl
Function call and floop simultaneously
I've attached an example below. It works well as written, but as soon as I try to transform x inside the loop body (for example, squaring each column before reducing it), the function fails. Combining @floop with CUDAEx() is noticeably faster than the other options I benchmarked, and I'd like to keep that speed while also modifying x within the function. Is that possible?
### Packages
using CUDA, FLoops, BenchmarkTools, FoldsCUDA
### User Inputs
nvec = 1_000_000
M = 50
x = CuArray(rand(Float32, (M, nvec)))
### Function Set up
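function parallel_multi(f, x)
    @floop CUDAEx() for i in 1:size(x, 2)
        val = reduce(*, @view(x[:, i])) # works
        # val = reduce(*, @view(x[:, i].^2)) # doesn't work: @view expects an indexing expression, not a broadcast
        # val = reduce(*, x[:, i].^2) # doesn't work: likely because x[:, i] and .^ allocate new arrays inside the GPU kernel
        f[i] = val # f is overwritten in place
    end
    return f
end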
result = CUDA.ones(Float32, (size(x,2),1))
### Comparing speeds
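# Interpolate every global with $ and wrap each call in CUDA.@sync so the timing includes the GPU work
display(@benchmark CUDA.@sync parallel_multi($result, $x))
display(@benchmark CUDA.@sync reduce(*, $x, dims = 1))
display(@benchmark CUDA.@sync prod($x, dims = 1)) # equivalent to the line above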
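One direction that might be worth trying, sketched here as an assumption rather than a confirmed FoldsCUDA.jl recipe: apply the element-wise transformation inside the reduction with `mapreduce`, so that no temporary array is created in the loop body. Since `reduce(op, A)` on an array is defined in Base as `mapreduce(identity, op, A)`, swapping `identity` for `abs2` (which equals squaring for real numbers) should take the same code path as the version that already works. The name `parallel_multi_sq!` is hypothetical, and the small 4-row input is only there to keep the products away from Float32 underflow.

```julia
using CUDA, FLoops, FoldsCUDA

# Hypothetical variant of parallel_multi that squares each element on the fly.
function parallel_multi_sq!(f, x)
    @floop CUDAEx() for i in 1:size(x, 2)
        # abs2 is applied to each element as it is reduced, so no temporary
        # array (as x[:, i] .^ 2 would create) is needed inside the kernel.
        val = mapreduce(abs2, *, @view(x[:, i]))
        f[i] = val
    end
    return f
end

# Small illustrative input: 4 rows, 1000 columns.
x = CuArray(rand(Float32, (4, 1_000)))
result = CUDA.ones(Float32, (size(x, 2), 1))
parallel_multi_sq!(result, x)
```

If this compiles on your setup, the result can be checked against `prod(x .^ 2, dims = 1)`, which does the squaring outside the kernel at the cost of one extra array.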