ShiftedArrays.jl
CUDA support
Hi,
how can we add CUDA support for this?
julia> ShiftedArrays.fftshift(CUDA.rand(2,2))
2×2 CircShiftedArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}:
┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007f8167e4e6e0.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:90
0.332543 0.906493
0.51592 0.40837
Best,
Felix
I think it's just a problem with displaying things. The actual fftshift method shouldn't call getindex. It might be worth inquiring over at CUDA.jl whether there's a way for array wrappers to avoid this warning when displaying the array.
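For interactive inspection, the warning itself already points at one workaround: allow scalar indexing just for the call that prints the wrapper. A minimal sketch (the variable name is only for illustration):

using CUDA, ShiftedArrays

s = ShiftedArrays.fftshift(CUDA.rand(2, 2))

# The scalar-indexing warning comes only from printing the wrapper element by
# element; permitting scalar indexing for the display call silences it:
CUDA.@allowscalar display(s)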
Overloading the display can be done. However, the real problem is that all broadcasting operations fall back on getindex() calls. Depending on the settings, this causes CUDA.jl to either fail or be really slow.
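To make the failure mode concrete, here is a rough sketch (array size and shift are arbitrary) of what happens when a broadcast over a lazily shifted CuArray takes the generic getindex fallback:

using CUDA, ShiftedArrays

s = ShiftedArrays.circshift(CUDA.rand(4, 4), (1, 0))

# With scalar indexing disallowed (the default outside the REPL), the generic
# broadcast machinery errors, because it reads the wrapper element by element:
CUDA.allowscalar(false)
s .+ 1    # ERROR: scalar indexing of a GPU array is disallowed

# Allowing scalar indexing makes it run, but every element round-trips through
# the CPU, which is extremely slow for anything but tiny arrays:
CUDA.@allowscalar s .+ 1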
A sensible solution would be to implement broadcasting rules for ShiftedArrays. Ideally these rules would be able to broadcast between arrays with the same shifts, between arrays with different shifts, and with other AbstractArrays. For a circshifted array one probably needs to calculate the intersection points and split the arrays into separate parts. What do you think? Who knows how to implement proper broadcasting rules?
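Not an authoritative design, but a minimal sketch of one possible direction, assuming CircShiftedArray keeps its parent array and shift tuple accessible via parent and ShiftedArrays.shifts: delegate the broadcast style to the parent's style and teach Adapt.jl to rebuild the wrapper around the device-side parent when kernel arguments are converted. This only covers reading from a shifted array in an out-of-place broadcast; assigning into one (dest .= ...) and mixing arrays with different shifts would still need the splitting-into-parts logic described above, and the whole thing is untested.

using ShiftedArrays
import Adapt
import Base.Broadcast: BroadcastStyle

# Delegate the broadcast style of the wrapper to the style of its parent, so a
# CircShiftedArray around a CuArray picks up CUDA's broadcast machinery instead
# of the generic (scalar-indexing) fallback:
BroadcastStyle(::Type{<:CircShiftedArray{T,N,A}}) where {T,N,A} = BroadcastStyle(A)

# Let Adapt rebuild the wrapper around the adapted parent (e.g. CuArray ->
# CuDeviceArray) at kernel launch, keeping the shifts; getindex on the wrapper
# then runs inside the GPU kernel instead of on the host:
Adapt.adapt_structure(to, x::CircShiftedArray) =
    CircShiftedArray(Adapt.adapt(to, parent(x)), ShiftedArrays.shifts(x))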