ShiftedArrays.jl

CUDA support

Open roflmaostc opened this issue 2 years ago • 2 comments

Hi,

how can we add CUDA support for this?

julia> ShiftedArrays.fftshift(CUDA.rand(2,2))
2×2 CircShiftedArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}:
┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007f8167e4e6e0.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:90
 0.332543  0.906493
 0.51592   0.40837

Best,

Felix

roflmaostc, Nov 22 '22 10:11

I think it's just a problem with displaying things. The actual fftshift method shouldn't call getindex. It might be worth asking over at CUDA.jl whether there's a way for array wrappers to avoid this warning when the array is displayed.

piever, Nov 30 '22 11:11

Overloading the display can be done. However, the real problem is that all broadcasting operations fall back on getindex calls. Depending on the settings, this causes CUDA.jl to either fail or be really slow. A sensible solution would be to implement broadcasting rules for ShiftedArrays. Ideally, these rules would be able to broadcast between arrays with the same shifts, between arrays with different shifts, and between shifted arrays and other AbstractArrays. For a circshifted array, one probably needs to calculate the intersection points and split the arrays into separate parts. What do you think? Who knows how to implement proper broadcasting rules?

RainerHeintzmann, Apr 13 '23 17:17
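
The splitting idea from the last comment can be sketched with plain Base Julia on CPU arrays. This is only an illustration under stated assumptions, not how ShiftedArrays.jl or CUDA.jl implement anything: the names `split_ranges` and `blockwise_circshift!` are hypothetical, and the sketch has not been run on a `CuArray`. The assumption is that copying whole contiguous blocks through range-based views and broadcast assignment would avoid element-wise `getindex` calls on a GPU array.

```julia
# Hypothetical helper (not part of ShiftedArrays.jl): for one dimension of
# length `n` shifted circularly by `s`, return (destination, source) range
# pairs such that dest[d] = src[r] reproduces `circshift` along that dimension.
function split_ranges(n::Int, s::Int)
    k = mod(s, n)                       # effective shift in 0:n-1
    k == 0 && return [(1:n, 1:n)]       # no wrap-around: a single block
    # The first n-k source elements move up by k; the last k wrap to the front.
    return [(k+1:n, 1:n-k), (1:k, n-k+1:n)]
end

# Hypothetical helper: materialize a circular shift block by block.
# Each block is a contiguous range in every dimension, so the copy is a
# broadcast over views rather than a loop over single elements.
function blockwise_circshift!(dest::AbstractArray, src::AbstractArray{T,N},
                              shifts::NTuple{N,Int}) where {T,N}
    size(dest) == size(src) || throw(DimensionMismatch("dest and src must have the same size"))
    per_dim = map(split_ranges, size(src), shifts)   # one list of pairs per dimension
    for block in Iterators.product(per_dim...)       # at most 2^N blocks
        dst_idx = map(first, block)
        src_idx = map(last, block)
        dest[dst_idx...] .= view(src, src_idx...)
    end
    return dest
end

a = reshape(collect(1.0:12.0), 3, 4)
b = a .^ 2
s = (1, 2)

# The block-wise copy agrees with Base.circshift.
@assert blockwise_circshift!(similar(a), a, s) == circshift(a, s)

# Arrays with the same shift: an element-wise operation commutes with the
# shift, so it can run on the unshifted parents and be re-wrapped afterwards.
@assert circshift(a, s) .+ circshift(b, s) == circshift(a .+ b, s)
```

The last assertion covers the easy case of equal shifts, where no splitting is needed at all. For different shifts, the same range pairs could in principle be intersected per dimension so that each resulting block is contiguous in both operands; that part is not shown here.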