This PR tries to support advanced layout in CuFFT. #430

On my PC, only 2D and 3D brfft still break input in out-place transform. So some of the copy is removed. (Otherwise such support seems no sense)

May 18 '21 04:05 N5N3

NVIDIA is not going to 'fix' the need for that copy; they changed the documentation instead:

Note that in-place complex-to-real FFTs may overwrite arbitrary imaginary input point values when non-unit input and output strides are chosen. Out-of-place complex-to-real FFT will always overwrite input buffer.

Could you add some comments to the code, e.g. what the FakeCuArray is for?

May 18 '21 06:05 maleadt

Comments added. If the test passed on 1.6, I'll add some test for strided layout.

May 18 '21 08:05 N5N3

test added

May 18 '21 09:05 N5N3

Please don't do CI-driven development like that, we don't have the resources for it. Get it working locally first, create a PR then.

May 19 '21 06:05 maleadt

@maleadt Sorry for what happened.

I don't have a available CUDA10.x environment, so I used the CI to verify the unexpected error which is correct in 11.x. At present, the strided support for CUDA 10.x has been excluded, and the full test has finished on my PC with CUDA 11.3. Is there a way to switch CUDA version, and then I could test the code on 10.x.

Once again my apologies for all the trouble I caused

May 19 '21 11:05 N5N3

You can use older CUDA versions by setting the JULIA_CUDA_VERSION environment variable to, e.g., 10.1 before importing CUDA.jl.

May 19 '21 14:05 maleadt

Well, my mistake, the drop actrually failed. I want to write something like:

@static if version() >= v"10.2.1" 
    CuFFTArray{T,N} = StridedCuArray{T,N} # only CUDA 11.x pass the strided test
else
    CuFFTArray{T,N} = DenseCuArray{T,N} # only support dense layout on 10.1 and 10.2
end

But this break precompile, any advice? @maleadt.

May 19 '21 16:05 N5N3

You cannot rely on the CUDA version globally, because we can't recompile when the version changes. So you can only inspect version() etc at run time, when functionality is called.

May 19 '21 16:05 maleadt

Unfortunately, the strided 2D fft with batch number >= 256 gives wrong result under CUDA 10.x if the following tranform sizes is used, ( m, n) = ( 8, 8), ( 8,16), ( 9, 9), ( 9,18), (15,15), (15,30), (16,16), (16,32), (18,18), (18,36), (20,20), (20,40), (21,21), (21,42), (22,22), (24,24), (24,48), (25,25), (25,50), (27,27), (28,28), (28,56), (30,30), (30,60), (32,32), (32,64), (35,35), (40,40), (48,48), (49,49), (54,54), (56,56), (60,60), (63,63), (64,64) I only test for m < n <= 64, so the support must be drop for 10.x. I have run the test on CUDA 11.3 and 10.2, and the CUFFT part passed.

May 20 '21 03:05 N5N3

Since I'm entirely unfamiliar with this code, I don't feel comfortable merging this as-is. Pinging some people who will be able to judge this better, if they have the time: @stevengj @ali-ramadhan.

May 27 '21 07:05 maleadt

Codecov Report

Merging #912 (3ddd9a6) into master (35e2186) will increase coverage by 0.01%. The diff coverage is 95.19%.

@@            Coverage Diff             @@
##           master     #912      +/-   ##
==========================================
+ Coverage   76.85%   76.87%   +0.01%     
==========================================
  Files         121      121              
  Lines        7709     7681      -28     
==========================================
- Hits         5925     5905      -20     
+ Misses       1784     1776       -8

Impacted Files	Coverage Δ
lib/cufft/fft.jl	`86.97% <95.19%> (+0.18%)`	:arrow_up:
src/array.jl	`86.27% <0.00%> (+0.72%)`	:arrow_up:
lib/cusolver/CUSOLVER.jl	`83.33% <0.00%> (+4.16%)`	:arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 35e2186...3ddd9a6. Read the comment docs.

May 27 '21 10:05 codecov[bot]

CUDA.jl
CUDA.jl copied to clipboard

Add support for strided cufft

Codecov Report

CUDA.jl CUDA.jl copied to clipboard

Add support for strided cufft

Codecov Report

CUDA.jl
CUDA.jl copied to clipboard