
GPU acceleration with CuPy

Open • thomasaarholt opened this issue 3 years ago • 1 comment

Just leaving a note since it's nice to share interest. My use-case is performing KDE many times on large sets of data points (around 10 million).

I thought I would have a play and see if it would be low-hanging fruit to add support for CuPy to KDEpy. CuPy is a GPU library with NumPy-like syntax. One really nice feature is that, when supported, NumPy functions automatically dispatch to the CuPy equivalent when applied to a CuPy array. GPUs compute FFTs ridiculously fast compared to NumPy, so I thought it might be nice to take the speedup provided by KDEpy even further. From what I can tell from the docstring of FFTKDE, the FFT (and not the linear binning) is the bottleneck.
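To make the GPU idea concrete, here is a minimal sketch of the FFT step of a binned 1D KDE, written against an array-module argument `xp` so that the same code runs on `numpy` or, assuming CuPy is installed, on `cupy`. Rather than relying on automatic dispatch, it passes the module explicitly. This is not KDEpy's code: the function name `fft_smooth` is hypothetical, and it only uses `asarray`, `fft.rfft` and `fft.irfft`, which exist in both libraries.

```python
import numpy as np


def fft_smooth(grid_counts, kernel_values, xp=np):
    """Convolve binned counts with sampled kernel values via FFT (sketch).

    `xp` is the array module: numpy, or cupy if it is installed. Only
    functions that exist in both (asarray, fft.rfft, fft.irfft) are used.
    """
    counts = xp.asarray(grid_counts, dtype=xp.float64)
    kernel = xp.asarray(kernel_values, dtype=xp.float64)
    # Zero-pad to the full linear-convolution length to avoid circular wrap-around.
    n = counts.shape[0] + kernel.shape[0] - 1
    spectrum = xp.fft.rfft(counts, n) * xp.fft.rfft(kernel, n)
    full = xp.fft.irfft(spectrum, n)
    # Keep the central part aligned with the counts grid, as
    # np.convolve(..., mode="same") does for an odd-length kernel.
    start = (kernel.shape[0] - 1) // 2
    return full[start:start + counts.shape[0]]
```

With CuPy one would call `fft_smooth(counts_gpu, kernel_gpu, xp=cupy)` and copy the result back to the host with `cupy.asnumpy`. Passing the module explicitly keeps the code independent of whether a given NumPy function supports dispatch to CuPy.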

After playing around a bit, I realised that the cutils code is a hard dependency, but also that you've written a (slower) NumPy fallback for the binning. I'm a bit surprised that there don't exist faster numpy (and hence CuPy) binning implementations - perhaps this would be an idea to look out for? Do I understand correctly that the binning algorithm you're using is bilinear binning?

(By the way, I kept getting "gcc: error: KDEpy/cutils.c: No such file or directory" (full error here) when trying to do a developer install with pip on Ubuntu.)

Just thought I'd bring the topic up, as I thought your library was cool! :)

thomasaarholt, Jun 28 '21 09:06

Hi. Thanks for letting me know about CuPy, and I'm very happy that you like KDEpy. Some thoughts:

I thought I would have a play and see if it would be low-hanging fruit to add support for CuPy to KDEpy.

I'm not necessarily against it, but another integration/implementation might require support and debugging years into the future. I have relatively little time for continued work/support. Many people have made good suggestions for additional features to implement in KDEpy, but if it's (1) a rare use-case or (2) there is a chance that I will end up maintaining less-than-ideal code written by others, I usually reject it. It's not right for me to include it if I don't have the time to maintain it. I'd rather have KDEpy do a select few things really well.

From what I can tell from the docstring of FFTKDE, the FFT (and not the linear binning) is the bottleneck.

That's true in theory, but in practice we should probably measure it. :)
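As one rough way to take that measurement, here is a timing sketch in plain NumPy. It is a stand-in, not a benchmark of KDEpy itself: `np.histogram` (plain nearest-bin counting) replaces linear binning, the helper names `time_call` and `bin_then_fft` are hypothetical, and only a 1D grid is used.

```python
import time

import numpy as np


def time_call(func, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def bin_then_fft(data, n_grid=2**10):
    """Time a binning step and an FFT step separately (sketch).

    np.histogram (nearest-bin counting) stands in for linear binning,
    which does somewhat more work per point.
    """
    lo, hi = data.min(), data.max()
    counts, _ = np.histogram(data, bins=n_grid, range=(lo, hi))
    t_bin = time_call(np.histogram, data, n_grid, (lo, hi))
    t_fft = time_call(np.fft.rfft, counts.astype(np.float64))
    return t_bin, t_fft
```

For example, `bin_then_fft(np.random.default_rng(0).standard_normal(10_000_000))` returns the two timings. Binning costs O(N) in the number of data points, while the FFT costs O(n log n) in the number of grid points, so with 10 million points on a grid of a few thousand points it would not be surprising if binning dominated; the numbers settle it on a given machine.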

I'm a bit surprised that there don't exist faster numpy (and hence CuPy) binning implementations - perhaps this would be an idea to look out for? Do I understand correctly that the binning algorithm you're using is bilinear binning?

The binning is basically bilinear interpolation as explained by Wikipedia, generalized to arbitrary dimensions. To my knowledge, neither NumPy nor SciPy implements it. There are binning algorithms, but they are more general (and slower) since they allow arbitrary grids instead of only equidistant grids.
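The scheme described above can be sketched in pure NumPy as follows. This is not KDEpy's cutils implementation; the function name `linear_binning` and its signature are hypothetical, and the sketch assumes every data point lies inside the grid. Each point spreads unit mass over the 2^d corners of its grid cell, with the product of per-axis linear weights.

```python
import itertools

import numpy as np


def linear_binning(data, grid_min, grid_max, num_points):
    """Linearly bin `data` (shape (N,) or (N, d)) onto an equidistant grid.

    Returns counts of shape (num_points,) * d whose entries sum to N.
    Requires num_points >= 2 and all data inside [grid_min, grid_max].
    """
    data = np.asarray(data, dtype=np.float64)
    if data.ndim == 1:
        data = data[:, np.newaxis]  # treat 1D input as N points in d = 1
    n, d = data.shape
    lo = np.broadcast_to(np.asarray(grid_min, dtype=np.float64), (d,))
    hi = np.broadcast_to(np.asarray(grid_max, dtype=np.float64), (d,))
    # Position of each point in units of grid spacing (0 .. num_points - 1).
    pos = (data - lo) / (hi - lo) * (num_points - 1)
    if np.any(pos < 0) or np.any(pos > num_points - 1):
        raise ValueError("all data must lie inside [grid_min, grid_max]")
    # Lower corner index, clipped so the upper neighbour stays on the grid.
    left = np.minimum(np.floor(pos).astype(np.intp), num_points - 2)
    frac = pos - left  # fractional offset in [0, 1] along each axis
    counts = np.zeros((num_points,) * d)
    for corner in itertools.product((0, 1), repeat=d):
        corner = np.array(corner)
        # Weight is frac along axes where we step up, 1 - frac otherwise.
        weights = np.prod(np.where(corner == 1, frac, 1.0 - frac), axis=1)
        idx = tuple((left + corner).T)
        np.add.at(counts, idx, weights)  # unbuffered scatter-add
    return counts
```

The per-corner weights of a point sum to one, so the grid total equals N. The loop runs over the 2^d corners while staying vectorized over the N points; replacing np.add.at with np.bincount on a flattened index (np.ravel_multi_index) is often faster.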

In summary: I would merge a pull request if there's a real use-case here (great speedup on huge data sets) and if the code is well-written and complete.


Installation error

The following works on my computer.

$ conda create -n KDEpytest python=3.9 anaconda --yes
$ conda activate KDEpytest
$ git clone https://github.com/tommyod/KDEpy
$ cd KDEpy
$ pip install -e .
Successfully installed KDEpy

Others have had similar trouble; please see this issue.

tommyod, Jun 28 '21 14:06