CUDA.jl
CUDA.jl copied to clipboard
Add support for strided cufft
This PR tries to support advanced layout in CuFFT. #430
On my PC, only 2D and 3D brfft still break input in out-place transform. So some of the copy is removed. (Otherwise such support seems no sense)
NVIDIA is not going to 'fix' the need for that copy; they changed the documentation instead:
Note that in-place complex-to-real FFTs may overwrite arbitrary imaginary input point values when non-unit input and output strides are chosen. Out-of-place complex-to-real FFT will always overwrite input buffer.
Could you add some comments to the code, e.g. what the FakeCuArray is for?
Comments added. If the test passed on 1.6, I'll add some test for strided layout.
test added
Please don't do CI-driven development like that, we don't have the resources for it. Get it working locally first, create a PR then.
@maleadt Sorry for what happened.
I don't have a available CUDA10.x environment, so I used the CI to verify the unexpected error which is correct in 11.x. At present, the strided support for CUDA 10.x has been excluded, and the full test has finished on my PC with CUDA 11.3. Is there a way to switch CUDA version, and then I could test the code on 10.x.
Once again my apologies for all the trouble I caused
You can use older CUDA versions by setting the JULIA_CUDA_VERSION environment variable to, e.g., 10.1 before importing CUDA.jl.
Well, my mistake, the drop actrually failed. I want to write something like:
@static if version() >= v"10.2.1"
CuFFTArray{T,N} = StridedCuArray{T,N} # only CUDA 11.x pass the strided test
else
CuFFTArray{T,N} = DenseCuArray{T,N} # only support dense layout on 10.1 and 10.2
end
But this break precompile, any advice? @maleadt.
You cannot rely on the CUDA version globally, because we can't recompile when the version changes. So you can only inspect version() etc at run time, when functionality is called.
Unfortunately, the strided 2D fft with batch number >= 256 gives wrong result under CUDA 10.x if the following tranform sizes is used, ( m, n) = ( 8, 8), ( 8,16), ( 9, 9), ( 9,18), (15,15), (15,30), (16,16), (16,32), (18,18), (18,36), (20,20), (20,40), (21,21), (21,42), (22,22), (24,24), (24,48), (25,25), (25,50), (27,27), (28,28), (28,56), (30,30), (30,60), (32,32), (32,64), (35,35), (40,40), (48,48), (49,49), (54,54), (56,56), (60,60), (63,63), (64,64) I only test for m < n <= 64, so the support must be drop for 10.x. I have run the test on CUDA 11.3 and 10.2, and the CUFFT part passed.
Since I'm entirely unfamiliar with this code, I don't feel comfortable merging this as-is. Pinging some people who will be able to judge this better, if they have the time: @stevengj @ali-ramadhan.
Codecov Report
Merging #912 (3ddd9a6) into master (35e2186) will increase coverage by
0.01%. The diff coverage is95.19%.
@@ Coverage Diff @@
## master #912 +/- ##
==========================================
+ Coverage 76.85% 76.87% +0.01%
==========================================
Files 121 121
Lines 7709 7681 -28
==========================================
- Hits 5925 5905 -20
+ Misses 1784 1776 -8
| Impacted Files | Coverage Δ | |
|---|---|---|
| lib/cufft/fft.jl | 86.97% <95.19%> (+0.18%) |
:arrow_up: |
| src/array.jl | 86.27% <0.00%> (+0.72%) |
:arrow_up: |
| lib/cusolver/CUSOLVER.jl | 83.33% <0.00%> (+4.16%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 35e2186...3ddd9a6. Read the comment docs.