CUDA.jl
CUDA.jl copied to clipboard
`CUDA.CUFFT.plan_*fft` functions do not accept keyword arguments
The plan_*fft functions in AbstractFFTs take keyword arguments, but the methods of these functions provided by CUDA.CUFFT do not. Code that passes keyword arguments to these functions, e.g. to influence FFTW planning with Arrays, does not work with CuArrays because the presence of keyword arguments prevents dispatch to the CUDA.CUFFT methods. This results in an a MethodError if FFTW is not being used. If FFTW is being used, this dispatches to an FFTW method which then throws an ArgumentError because it tries to take the CPU address of the CuArray.
Since CUDA.CUFFT ignores these keyword arguments, I think it would be sufficient to just add ; ignored_kwargs... to the function signatures or create alternate methods that include ; ignored_kwargs... and just call the non-kwarg methods.
Here is a MWE showing the problem both without and without FFTW:
julia> using CUDA, CUDA.CUFFT
julia> A=CuArray{ComplexF32}(undef, 4);
julia> plan_fft(A)
CUFFT complex forward plan for 4-element CuArray of ComplexF32
julia> plan_fft(A; flags=0)
ERROR: MethodError: no method matching plan_fft(::CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}, ::UnitRange{Int64}; flags=0)
Closest candidates are:
plan_fft(::CuArray{T, N}, ::Any) where {T<:Union{ComplexF32, ComplexF64}, N} at ~/.julia/packages/CUDA/GGwVa/lib/cufft/fft.jl:307 got unsupported keyword argument "flags"
plan_fft(::CuArray{<:Complex{<:Union{Integer, Rational}}}, ::Any) at ~/.julia/packages/CUDA/GGwVa/lib/cufft/fft.jl:276 got unsupported keyword argument "flags"
plan_fft(::CuArray{<:Real}, ::Any) at ~/.julia/packages/CUDA/GGwVa/lib/cufft/fft.jl:274 got unsupported keyword argument "flags"
...
Stacktrace:
[1] plan_fft(x::CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}; kws::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:flags,), Tuple{Int64}}})
@ AbstractFFTs ~/.julia/packages/AbstractFFTs/SFNY3/src/definitions.jl:64
[2] top-level scope
@ REPL[5]:1
[3] top-level scope
@ ~/.julia/packages/CUDA/GGwVa/src/initialization.jl:52
julia> using FFTW
julia> plan_fft(A; flags=0)
ERROR: ArgumentError: cannot take the CPU address of a CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}
Stacktrace:
[1] unsafe_convert(#unused#::Type{Ptr{ComplexF32}}, x::CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer})
@ CUDA ~/.julia/packages/CUDA/GGwVa/src/array.jl:319
[2] macro expansion
@ ~/.julia/packages/FFTW/sfy1o/src/fft.jl:592 [inlined]
[3] (FFTW.cFFTWPlan{ComplexF32, -1, false, 1})(X::CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}, Y::Vector{ComplexF32}, region::UnitRange{Int64}, flags::Int64, timelimit::Float64)
@ FFTW ~/.julia/packages/FFTW/sfy1o/src/FFTW.jl:49
[4] plan_fft(X::CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}, region::UnitRange{Int64}; flags::Int64, timelimit::Float64, num_threads::Nothing)
@ FFTW ~/.julia/packages/FFTW/sfy1o/src/fft.jl:719
[5] plan_fft(X::CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}; kws::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:flags,), Tuple{Int64}}})
@ FFTW ~/.julia/packages/FFTW/sfy1o/src/fft.jl:735
[6] top-level scope
@ REPL[7]:1
[7] top-level scope
@ ~/.julia/packages/CUDA/GGwVa/src/initialization.jl:52
FWIW, I was able to work around this issue (for my usage of plan_brfft) by defining this method:
AbstractFFTs.plan_brfft(A::CuArray, d::Integer, region; kwargs...) = plan_brfft(A, d, region)
CUFFT doesn't support planner flags or time limits, so it's probably better to assert that the user doesn't set these flags.
These planning methods in CUDA.CUFFT are implementations of the planning functions defined in AbstractFFTs. All the planning functions in AbstractFFTs take optional keyword arguments. I think it would be better to accept and ignore keyword arguments to maximize compatibility with AbstractFFTs. The CUDA.CUFFT methods could document that they accept keyword arguments for compatibility, but that any keyword arguments are ignored because there is no use for them.
The use case for this (i.e. how I ran into this problem) is to maintain the ability to use the same code with FFTW when the input is an Array (or other FFT implementation when the input is not a CuArray) and with CUDA.CUFFT when the input is a CuArray. In my case, I am doing:
mul!(@view(output_matrix[:,i]), plan, input_vector)
For some size(output_matrix,1) values, e.g. 150, FFTW considers @view(output_matrix[:,1] to be misaligned and requires that plan has the UNALIGNED flag bit set when creating the plan. So I pass flags=FFTW.ESTIMATE|FFTW.UNALIGNED to the plan creation function, e.g. plan_brfft. But now I can no longer use this code with CuArray because the plan creation call with flags either throws MethodError or dispatches to the wrong function which then throws ArgumentError. Having the planning methods throw a different, albeit more understandable, exception is not really that much better than what happens now (i.e. code still won't work seamlessly between FFTW and CUDA.CUFFT).
If I add CUDA as a dependency to my code, I could create the plan this way:
if A isa CuArray
plan = plan_brfft(A, d)
else
plan = plan_brfft(A, d, flags=FFTW.ESTIMATE|FFTW.UNALIGNED)
end
but that goes against the grain of multiple dispatch and requires that I add CUDA as a dependency to my otherwise CUDA-unaware code (and looks ugly).
Being able to use the same code with either Arrays or CuArrays (even code that doesn't know about CUDA and CuArrays!), is one of the (many) killer features of Julia/CUDA.jl. Accepting but ignoring the keyword arguments in these methods would keep that magic alive.
Some of these flags seem semantically meaningful though, e.g., FFTW.PRESERVE_INPUT. I'm not sure it's safe to just ignore that? The others seem fine to ignore (I only had a quick look at https://www.fftw.org/fftw3_doc/Planner-Flags.html).
The cuFFT docs say that a complex-to-real out-of-place transform always overwrites its input buffer, but it doesn't mention any other case when that happens. This corresponds to plan_irfft and plan_brfft. I think there are three ways to handle this:
-
Always ignore the flags and document the
plan_irfft(::CuArray, ...)andplan_brfft(::CuArray)methods to clearly indicate that thePERSERVE_INPUTflag is not honored due to limitations in the underlying cuFFT implementation. Output a (preferably one-time) warning message to that effect whenever these methods are called (regardless of flags), but this could be confusing to people who don't use that flag, so maybe make it a debug message instead. -
Add
FFTWas a dependency to CUDA and throw anArgumentErrorif these methods are called with thePRESERVE_INPUTflag set. -
Copy the value of
FFTW.PERSERVE_INPUTintolib/cufft/fft.jland throw anArgumentErrorif these methods are called with thePRESERVE_INPUTflag set.
I think those are in order of preference. Option 3 is bad practice and should be ignored, I only included it for completeness. Option 2 is not so clean either, but wouldn't be horrible. Option 1 seems better than option 0 (i.e. the status quo) IMHO.
BTW, I just noticed that plan_irfft seems to call plan_brfft, so any warnings/exceptions would only have to happen in plan_brfft.
Relevant cuFFT docs here: https://docs.nvidia.com/cuda/cufft/index.html#data-layout
TBH, it seems a bit not-so-abstract for AbstractFFTs to specify (e.g. in the docs for plan_fft) that the flags keyword is a bitwise-or of FFTW flags. That kind of forces everyone following the AbstractFFTs interface to also follow the non-abstract FFTW flags conventions. It seems like AbstractFFTs should be the one to add FFTW as a dependency and then re-export these flags (or better, define their own versions of these flags and have FFTW translate those flags values into FFTW flags).
The only package that seems to use PRESERVE_INPUT is LazyAlgebra.jl, to implement its destroys_input function which isn't used anywhere but in its tests. So I lean towards ignoring all kwargs so that we don't have to introduce an awkward dependency on FFTW (and maybe file an issue on AbstractFFTs about this), but I wouldn't introduce any log messages (given that there's no users of this flag).
Great, it sounds like we've converged! I opened issue https://github.com/JuliaMath/AbstractFFTs.jl/issues/71 regarding flags. I'll be happy to work on a PR for adding flags to the CUDA.CUFFT planning functions, but it may be a little while before I can get to it.