KernelAbstractions.jl
Atomic operations on complex numbers
As requested by @vchuravy, this is a copy of my slack question:
Hello, I'm running into errors with KernelAbstractions.jl when combining atomic operations (via Atomix.jl) with complex numbers. Is it possible to perform atomic operations on ComplexF32?
As an MWE, we can take the atomic operations example from the documentation and create img as an array of ComplexF32:
using CUDA, KernelAbstractions, Atomix

img = zeros(ComplexF32, (50, 50));
img[10:20, 10:20] .= 1;
img[35:45, 35:45] .= 2;

function index_fun_fixed(arr; backend=get_backend(arr))
    out = similar(arr)
    fill!(out, 0)
    kernel! = my_kernel_fixed!(backend)
    kernel!(out, arr, ndrange=(size(arr, 1), size(arr, 2)))
    return out
end

@kernel function my_kernel_fixed!(out, arr)
    i, j = @index(Global, NTuple)
    for k in 1:size(out, 1)
        # every work-item with the same i adds into out[k, i], hence the atomic update
        Atomix.@atomic out[k, i] += arr[i, j]
    end
end

index_fun_fixed(CuArray(img))  # GPU backend
index_fun_fixed(img)           # CPU backend
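A non-atomic reference is handy for checking whichever variant ends up working. The MWE kernel adds arr[i, j] into out[k, i] for every k and every j, so each entry should equal the sum of row i of arr. The following is a minimal sketch in plain Julia (no GPU, no KernelAbstractions), assuming a square input like the 50×50 img; `index_fun_reference` is a hypothetical helper introduced only for this check.

```julia
# Hypothetical non-atomic reference for the MWE kernel:
# out[k, i] = sum(arr[i, :]) for every k (assumes a square arr, as in the MWE).
function index_fun_reference(arr::AbstractMatrix)
    rowsums = vec(sum(arr; dims = 2))                     # rowsums[i] = sum over j of arr[i, j]
    return repeat(permutedims(rowsums), size(arr, 1), 1)  # same row of sums for every k
end

img = zeros(ComplexF32, (50, 50))
img[10:20, 10:20] .= 1
img[35:45, 35:45] .= 2

ref = index_fun_reference(img)
ref[1, 15]   # row 15 crosses the block of ones: 11 ones, so 11
ref[50, 40]  # row 40 crosses the block of twos: 11 twos, so 22
```

A working GPU or CPU variant can then be compared against `ref` after moving its result to the host with `Array(...)`, using `≈` since atomic float additions may run in any order.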
On a GPU I get the error:
out_fixed = Array(index_fun_fixed(CuArray(img)));
ERROR: a error was thrown during kernel execution on thread (65, 1, 1) in block (3, 1, 1).
Stacktrace not available, run Julia on debug level 2 for more details (by passing -g2 to the executable).
ERROR: KernelException: exception thrown during kernel execution on device NVIDIA GeForce GTX 1080 Ti
and on the CPU I get:
out_fixed = Array(index_fun_fixed(img));
ERROR: TaskFailedException
nested task error: MethodError: no method matching modify!(::Ptr{ComplexF32}, ::typeof(+), ::ComplexF32, ::UnsafeAtomics.Internal.LLVMOrdering{:seq_cst})
Closest candidates are:
modify!(::Ptr{T}, ::typeof(UnsafeAtomics.right), ::T, ::Any) where T
@ UnsafeAtomics ~/.julia/packages/UnsafeAtomics/ugwrA/src/core.jl:197
modify!(::Core.LLVMPtr, ::OP, ::Any, ::UnsafeAtomics.Ordering) where OP
@ UnsafeAtomicsLLVM ~/.julia/packages/UnsafeAtomicsLLVM/tbohS/src/internal.jl:20
modify!(::Any, ::Any, ::Any)
@ UnsafeAtomics ~/.julia/packages/UnsafeAtomics/ugwrA/src/core.jl:4
...
Stacktrace:
[1] modify!
@ ~/.julia/packages/Atomix/F9VIX/src/core.jl:33 [inlined]
[2] macro expansion
@ ./REPL[55]:4 [inlined]
[3] cpu_my_kernel_fixed!
@ ~/.julia/packages/KernelAbstractions/HAcqg/src/macros.jl:287 [inlined]
[4] cpu_my_kernel_fixed!(__ctx__::KernelAbstractions.CompilerMetadata{…}, out::Matrix{…}, arr::Matrix{…})
Accessing the real and imaginary parts individually, like this:
@kernel function my_kernel_fixed!(out, arr::AbstractArray{<:Complex})
    i, j = @index(Global, NTuple)
    for k in 1:size(out, 1)
        Atomix.@atomic out[k, i].re += arr[i, j].re
        Atomix.@atomic out[k, i].im += arr[i, j].im
    end
end
results in this error:
ERROR: TaskFailedException
nested task error: ConcurrencyViolationError("modifyfield!: non-atomic field cannot be written atomically")
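The message shows the update reached Julia's `modifyfield!`, which only accepts fields declared `@atomic` inside a `mutable struct`. `Complex` is an immutable struct with ordinary `re`/`im` fields, and `out[k, i]` yields a copy of the element rather than a reference into the array, so there is no atomic field to target. For contrast, here is a minimal host-side sketch (Julia 1.7+, plain `Base.@atomic`, no KernelAbstractions) using a hypothetical `AtomicComplex` type whose fields are declared atomic. It only illustrates what the error is asking for. It is not a replacement for an array of ComplexF32 inside a kernel.

```julia
using Base.Threads

# Hypothetical type for illustration: both components are declared @atomic,
# so Base.@atomic may update each of them via modifyfield!.
mutable struct AtomicComplex
    @atomic re::Float32
    @atomic im::Float32
end

function accumulate_demo(n)
    c = AtomicComplex(0f0, 0f0)
    @threads for _ in 1:n
        # Two independent atomic updates: mid-loop the (re, im) pair can be
        # inconsistent, but each component still receives every addition.
        @atomic c.re += 1f0
        @atomic c.im += 2f0
    end
    return ComplexF32(@atomic(c.re), @atomic(c.im))
end

accumulate_demo(1000)  # == ComplexF32(1000, 2000)
```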
A fairly hacky workaround is to use reinterpret, but I'm not sure it is safe:
function index_fun_reinterpret(arr::AbstractArray{<:Complex}; backend=get_backend(arr))
    out = similar(arr)
    fill!(out, 0)
    kernel! = my_kernel_reinterpret!(backend)
    # view out as a 2×m×n Float32 array: [1, k, i] is the real part, [2, k, i] the imaginary part
    kernel!(reinterpret(reshape, Float32, out), arr, ndrange=(size(arr, 1), size(arr, 2)))
    return out
end

@kernel function my_kernel_reinterpret!(out, arr::AbstractArray{<:Complex})
    i, j = @index(Global, NTuple)
    for k in 1:size(out, 2)  # dimension 2 of the view is dimension 1 of the complex array
        Atomix.@atomic out[1, k, i] += arr[i, j].re
        Atomix.@atomic out[2, k, i] += arr[i, j].im
    end
end
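The workaround indexes out[1, k, i] and out[2, k, i], and its atomic adds land in the original matrix, because `reinterpret(reshape, Float32, A)` on a ComplexF32 array returns a view with an extra leading dimension of size 2 that shares A's memory. Each ComplexF32 is stored as two adjacent Float32 values, real part first. The following is a small CPU-only check with plain Base; the matrix values are made up for illustration. Whether the CuArray path behaves identically depends on CUDA.jl's reinterpret support and is not checked here.

```julia
# A ComplexF32 matrix; each element is stored as (re, im) side by side.
z = ComplexF32[1+2im 3+4im; 5+6im 7+8im]

# Reinterpreted view: r[c, k, i] is component c (1 = re, 2 = im) of z[k, i].
r = reinterpret(reshape, Float32, z)

size(r)     # (2, 2, 2)
r[1, 2, 1]  # 5.0f0, the real part of z[2, 1]
r[2, 2, 1]  # 6.0f0, the imaginary part of z[2, 1]

# Writing through the view modifies z itself: add 10 to the imaginary part of z[1, 2].
r[2, 1, 2] += 10f0
z[1, 2] == ComplexF32(3, 14)  # true
```

In the kernel above, the reinterpreted `out` plays the role of `r`, which is why the loop bound is `size(out, 2)` rather than `size(out, 1)`.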
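On the issue of safety: this can lead to "torn" updates, e.g. one thread updating re while another updates im, so the two halves of a value can momentarily come from different updates. Since you are only accumulating and nothing reads out until the kernel has finished, that should be fine: each component still receives every addition. Updating the whole number atomically would require support for wider atomic operations (8 bytes for ComplexF32, 16 bytes for ComplexF64), and that would also turn your accumulate operation into a compare-and-swap (cmpswap) loop.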
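To make the compare-and-swap remark concrete, here is a minimal CPU-only sketch. It assumes the whole ComplexF32 (8 bytes) is packed into a UInt64 and updated with `Threads.atomic_cas!`. `pack`, `unpack`, and `atomic_add_complex!` are hypothetical helpers, not Atomix or UnsafeAtomics API. Each iteration reads the current value, computes the sum, and retries if another thread changed the value in the meantime. This retry loop is what replaces a single hardware add.

```julia
using Base.Threads

# Hypothetical helpers: pack a ComplexF32 (two Float32 = 8 bytes) into one UInt64 bit pattern.
pack(z::ComplexF32) = UInt64(reinterpret(UInt32, real(z))) |
                      (UInt64(reinterpret(UInt32, imag(z))) << 32)

unpack(u::UInt64) = ComplexF32(reinterpret(Float32, UInt32(u & 0xffffffff)),
                               reinterpret(Float32, UInt32(u >> 32)))

# Compare-and-swap loop: the whole complex value is replaced by one 8-byte
# atomic operation, so no reader ever sees a torn (re, im) pair.
function atomic_add_complex!(acc::Atomic{UInt64}, z::ComplexF32)
    old = acc[]
    while true
        desired = pack(unpack(old) + z)
        seen = atomic_cas!(acc, old, desired)  # returns the value found before the swap
        seen == old && return unpack(desired)  # our update was installed
        old = seen                             # another thread got in first; retry
    end
end

acc = Atomic{UInt64}(pack(zero(ComplexF32)))
@threads for _ in 1:1000
    atomic_add_complex!(acc, ComplexF32(1, 2))
end
unpack(acc[]) == ComplexF32(1000, 2000)  # true
```

The loop compares raw bit patterns rather than floats on purpose: a float comparison would never succeed for a NaN value (NaN != NaN) and would treat -0.0 and 0.0 as equal.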