
CUDA Accelerated FFT

Open f4exb opened this issue 2 years ago • 10 comments

Is your feature request related to a problem? Please describe.
Large FFT with large overlap is too slow

Describe the solution you'd like
I would like the FFT to run on my NVIDIA GPU and just for the sake of it!

Describe alternatives you've considered
FFTW sucks!

Additional context
For wider GPU support this could be considered: https://github.com/DTolm/VkFFT

f4exb avatar Feb 25 '22 16:02 f4exb

I have tried a GitHub project called radio-core, which uses CUDA acceleration for broadcast FM demodulation. It seems to reduce system load significantly.

Jcwscience avatar Feb 25 '22 16:02 Jcwscience

Also, I’m using a HackRF, so there is a lot of data being processed. From what I understand, the HackRF's internal filters only work properly at sample rates above 10 MHz (although I might have misunderstood).

Jcwscience avatar Feb 25 '22 16:02 Jcwscience

But if I’m being honest, my main motivation is probably “well, the SDR software I use has GPU acceleration, look at my cool setup with all of this compute power!”.

Jcwscience avatar Feb 25 '22 16:02 Jcwscience

That's what I meant: GPU acceleration just for the sake of it...

I am all right with breaking SDRangel for this. I don't care: I have an NVIDIA graphics card, which by the way I think is superior to other graphics cards.

f4exb avatar Feb 25 '22 20:02 f4exb

@f4exb Actually, I was just looking at VkFFT, which I hadn’t seen before. If it doesn’t have any breaking bugs, maybe it could be useful for people with AMD cards as well? Also, I hope I’m not asking too many questions, but other than the waterfall, is there anything else that uses FFTW? I’m fairly new to this field, but I’m enjoying studying the code to see how things work.

Jcwscience avatar Feb 25 '22 21:02 Jcwscience

There are a couple of other plugins that use FFTs as well, but all the code does this indirectly via the FFTEngine and FFTFactory classes.

So what you probably want to look at is modifying the FFTFactory to return a VkFFTEngine (which would be a subclass of FFTEngine) if it is applicable for the current system - if not, return an FFTWEngine - or something along those lines.

See sdrbase/dsp/fftwengine.h, kissengine.h (which is an alternative to FFTW) and FFTFactory.cpp. It looks like it should be fairly straightforward to drop a different implementation in there.
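
A minimal sketch of that drop-in idea, assuming a much simplified engine interface. `SketchFFTEngine`, `SketchCpuEngine`, `SketchGpuEngine` and `createSketchEngine` are hypothetical names, not SDRangel's actual classes. The CPU engine is a naive DFT standing in for FFTW or KISS FFT, and the GPU engine is only a placeholder that reuses it rather than calling VkFFT.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

using Complex = std::complex<float>;

// Hypothetical, simplified engine interface (not SDRangel's real FFTEngine).
class SketchFFTEngine {
public:
    virtual ~SketchFFTEngine() = default;
    virtual std::string name() const = 0;
    virtual std::vector<Complex> forward(const std::vector<Complex>& in) = 0;
};

// CPU stand-in: a naive O(n^2) DFT used here in place of FFTW or KISS FFT.
class SketchCpuEngine : public SketchFFTEngine {
public:
    std::string name() const override { return "cpu"; }

    std::vector<Complex> forward(const std::vector<Complex>& in) override {
        const std::size_t n = in.size();
        assert(n > 0);
        const double pi = std::acos(-1.0);
        std::vector<Complex> out(n);
        for (std::size_t k = 0; k < n; ++k) {
            std::complex<double> sum(0.0, 0.0);
            for (std::size_t t = 0; t < n; ++t) {
                const double angle = -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
                const std::complex<double> x(in[t].real(), in[t].imag());
                sum += x * std::polar(1.0, angle);
            }
            out[k] = Complex(static_cast<float>(sum.real()), static_cast<float>(sum.imag()));
        }
        return out;
    }
};

// Placeholder for a GPU-backed engine: it only changes the reported name and
// inherits the CPU transform. A real one would wrap VkFFT or cuFFT calls.
class SketchGpuEngine : public SketchCpuEngine {
public:
    std::string name() const override { return "gpu"; }
};

// Factory: callers only ever see the base class, so engines can be swapped
// without touching the plugins that use them.
std::unique_ptr<SketchFFTEngine> createSketchEngine(bool gpuUsable) {
    if (gpuUsable) {
        return std::make_unique<SketchGpuEngine>();
    }
    return std::make_unique<SketchCpuEngine>();
}
```

Because the plugins would only hold a pointer to the base class, adding a new engine in this arrangement touches the factory and one new subclass, nothing else.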

While I doubt it will be of much benefit to existing plugins, high-performance FFT and IFFT could be useful for OFDM modems in the future.

srcejon avatar Feb 25 '22 21:02 srcejon

@srcejon Awesome. I saw a reference to an alternative FFT engine in the source files, but I didn’t quite know where it fit into the rest of things. I went ahead and forked the repo, and I’m running some benchmarks on VkFFT now as well.

Jcwscience avatar Feb 25 '22 21:02 Jcwscience

> So what you probably want to look at is modifying the FFTFactory to return a VkFFTEngine (which would be a subclass of FFTEngine) if it is applicable for the current system - if not, return an FFTWEngine - or something along those lines.

The "switch" is in https://github.com/f4exb/sdrangel/blob/master/sdrbase/dsp/fftengine.cpp and is based on global defines set in the CMakeLists.txt in sdrbase (https://github.com/f4exb/sdrangel/blob/master/sdrbase/CMakeLists.txt#L9). For now it checks whether libfftw3f is available, which determines the choice between FFTW (-DUSE_FFTW) and an internal KISS FFT (-DUSE_KISSFFT).

I highly recommend inserting VkFFT as a third option, keeping the other two, and keeping the option to fall back to FFTW via some compilation switch. On some systems Vulkan may not be available, or may have little or no advantage over FFTW, e.g. on Raspberry Pi.
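
A minimal sketch of how a third compile-time option could sit alongside the existing two. `USE_FFTW` and `USE_KISSFFT` are the defines mentioned above; `USE_VKFFT`, `SKETCH_DEFAULT_ENGINE` and `sketchDefaultEngine` are assumptions made up for this illustration and do not come from the SDRangel build files.

```cpp
#include <string>

// Priority order at compile time: VkFFT if requested, else FFTW, else KISS FFT.
// USE_VKFFT is a hypothetical define; only USE_FFTW and USE_KISSFFT are
// mentioned in this thread.
#if defined(USE_VKFFT)
#define SKETCH_DEFAULT_ENGINE "vkfft"
#elif defined(USE_FFTW)
#define SKETCH_DEFAULT_ENGINE "fftw"
#else
// Treated as the catch-all here, whether or not USE_KISSFFT is defined.
#define SKETCH_DEFAULT_ENGINE "kissfft"
#endif

// Reports which engine this translation unit was built to prefer.
inline std::string sketchDefaultEngine() {
    return SKETCH_DEFAULT_ENGINE;
}
```

Built without any of the defines, this sketch falls through to KISS FFT. Passing `-DUSE_FFTW` or `-DUSE_VKFFT` to the compiler changes the result without any source edits.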

f4exb avatar Feb 26 '22 06:02 f4exb

> I highly recommend inserting VkFFT as a third option, keeping the other two, and keeping the option to fall back to FFTW via some compilation switch. On some systems Vulkan may not be available, or may have little or no advantage over FFTW, e.g. on Raspberry Pi.

Ideally, I would have thought it should be a runtime decision rather than a compile-time one, so that binary releases can use Vulkan etc. if available, but can still fall back to FFTW if not. We don't really want to do multiple builds. That's assuming the list of new dependencies isn't problematic.
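
A minimal sketch of such a runtime decision, assuming each candidate backend can be probed by a function that returns a null pointer when it is unusable on the current machine. `SketchBackend`, `SketchProbe` and `pickBackend` are hypothetical names for illustration. A real probe would attempt to initialise Vulkan or CUDA, which this sketch does not do.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical handle for an initialised FFT backend.
struct SketchBackend {
    std::string name;
};

// A probe tries to bring up one backend and returns nullptr on failure.
using SketchProbe = std::function<std::unique_ptr<SketchBackend>()>;

// Walk the probes in priority order and keep the first backend that comes up.
// Returns nullptr only if every probe fails.
std::unique_ptr<SketchBackend> pickBackend(const std::vector<SketchProbe>& probes) {
    for (const auto& probe : probes) {
        if (auto backend = probe()) {
            return backend;
        }
    }
    return nullptr;
}
```

With the GPU probe listed first and an FFTW probe last, a single binary would use the GPU where it initialises and quietly fall back elsewhere.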

srcejon avatar Feb 26 '22 19:02 srcejon

Info taken from latest KrakenSDR update

“investigation into the possibility of using the GPU on the Pi 4 to compute the FFTs required in our algorithms faster via Vulkan and VkFFT. Long story short, for larger FFTs it seems that the Pi 4 GPU is capable of about a 2x speedup. However, an issue is that the Pi 4 Vulkan implementation is very new, and in its current state is missing an important feature relating to memory transfer. Without this feature, there is a need to perform unnecessary memory transfers and this brings us back to a 1x speedup. But we have considered that even without any speedup, using the GPU essentially provides us with another computational core which may still be of use as it frees up the CPU cores for other tasks.”

alphafox02 avatar Mar 04 '22 20:03 alphafox02

Has anyone cracked this nut?

savagesmc avatar Aug 06 '23 02:08 savagesmc

Have just tried adding the CUDA version of VkFFT - and at the moment, it looks much slower than FFTW. Could be because I've done something wrong - but probably because we're just performing a single FFT serially, and there's too much overhead in getting it in and out of the GPU.
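
A minimal sketch of the overhead argument, assuming a simple cost model in which each GPU round trip has a fixed cost that is shared by every transform submitted in the same batch. `sketchCostPerFft` and its parameters are hypothetical, and any numbers passed to it are illustrative placeholders, not measurements from SDRangel or VkFFT.

```cpp
#include <cassert>

// Average cost per transform (in arbitrary time units) when `batch` transforms
// share one fixed transfer/launch overhead. Hypothetical model for illustration.
double sketchCostPerFft(double fixedOverhead, double computePerFft, int batch) {
    assert(batch > 0);
    return fixedOverhead / batch + computePerFft;
}
```

Under this model, a batch of one pays the whole fixed overhead on every transform, so the GPU can lose to a CPU library even when its raw compute time is lower. The overhead only fades as the batch size grows.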

srcejon avatar Aug 07 '23 11:08 srcejon

Released in v7.15.3

f4exb avatar Aug 20 '23 21:08 f4exb