pcl k-d tree speedup (nanoflann / CUDA)

This pull request adds tested k-d tree implementations based on nanoflann (CPU) and FLANN (CUDA). It also adds the ability to set the max leaf size for any k-d tree implementation.
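For readers unfamiliar with the max leaf size parameter, here is a minimal, self-contained sketch of what it controls in a k-d tree. It is not the code from this pull request and does not use the PCL, FLANN, or nanoflann APIs; `ToyKdTree`, `Node`, `Point`, and `dist2` are hypothetical names chosen for illustration. The assumption is the usual scheme: splitting stops once a node holds at most `max_leaf_size` points, and each leaf is searched by brute force.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <numeric>
#include <vector>

// Illustrative only: hypothetical types, not PCL / FLANN / nanoflann code.
using Point = std::array<float, 3>;

// Squared Euclidean distance between two 3D points.
inline float dist2(const Point& a, const Point& b) {
  float d = 0.0f;
  for (int i = 0; i < 3; ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
  return d;
}

struct Node {
  int axis = -1;                   // split axis; -1 marks a leaf
  float split = 0.0f;              // split value along `axis`
  std::size_t begin = 0, end = 0;  // leaf only: range into the index array
  std::unique_ptr<Node> left, right;
};

class ToyKdTree {
public:
  ToyKdTree(const std::vector<Point>& pts, std::size_t max_leaf_size)
      : pts_(pts), idx_(pts.size()), max_leaf_(max_leaf_size) {
    assert(max_leaf_size >= 1);
    std::iota(idx_.begin(), idx_.end(), std::size_t{0});
    root_ = build(0, idx_.size(), 0);
  }

  // Index of the point nearest to q; returns pts.size() if the tree is empty.
  std::size_t nearest(const Point& q) const {
    std::size_t best = pts_.size();
    float best_d2 = std::numeric_limits<float>::infinity();
    search(root_.get(), q, best, best_d2);
    return best;
  }

  std::size_t node_count() const { return node_count_; }

private:
  std::unique_ptr<Node> build(std::size_t begin, std::size_t end, int depth) {
    auto node = std::make_unique<Node>();
    ++node_count_;
    if (end - begin <= max_leaf_) {  // small enough: stop splitting here
      node->begin = begin;
      node->end = end;
      return node;
    }
    const int axis = depth % 3;  // cycle through x, y, z
    const std::size_t mid = begin + (end - begin) / 2;
    // Partition the index range around the median along `axis`.
    std::nth_element(idx_.begin() + begin, idx_.begin() + mid, idx_.begin() + end,
                     [&](std::size_t a, std::size_t b) {
                       return pts_[a][axis] < pts_[b][axis];
                     });
    node->axis = axis;
    node->split = pts_[idx_[mid]][axis];
    node->left = build(begin, mid, depth + 1);
    node->right = build(mid, end, depth + 1);
    return node;
  }

  void search(const Node* n, const Point& q, std::size_t& best, float& best_d2) const {
    if (n->axis < 0) {  // leaf: brute-force over at most max_leaf_ points
      for (std::size_t i = n->begin; i < n->end; ++i) {
        const float d2 = dist2(pts_[idx_[i]], q);
        if (d2 < best_d2) { best_d2 = d2; best = idx_[i]; }
      }
      return;
    }
    const float diff = q[n->axis] - n->split;
    const Node* closer = diff < 0 ? n->left.get() : n->right.get();
    const Node* other = diff < 0 ? n->right.get() : n->left.get();
    search(closer, q, best, best_d2);
    // Visit the other side only if the splitting plane is nearer than the best hit.
    if (diff * diff < best_d2) search(other, q, best, best_d2);
  }

  std::vector<Point> pts_;
  std::vector<std::size_t> idx_;
  std::size_t max_leaf_;
  std::size_t node_count_ = 0;
  std::unique_ptr<Node> root_;
};
```

In this sketch, a larger `max_leaf_size` yields fewer nodes, so the tree is cheaper to build, but each visited leaf costs more brute-force distance checks. That is the general trade-off the parameter exposes; the actual numbers for FLANN and nanoflann are in the linked benchmarks.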

Benchmarks comparing FLANN (CPU), Nanoflann (CPU), and FLANN (CUDA) can be found here: https://yasamoka.github.io/pcl-knn-benchmark/

I am not sure if there is a better way of modifying CMake scripts to satisfy dependencies. If there is, then I would appreciate help with that.

Regarding documentation, I placed the FLANN CUDA implementation in the kdtree module, where it has good visibility for users of k-d trees. Should I move it to its own module (e.g. cuda/kdtree)? Is it possible to have two levels of modules like that?

Thank you very much!

Jun 17 '22 22:06 yasamoka

This plot seems to show that nanoflann is slower than FLANN, but your bar graphs further down show the opposite 🤔

Jan 12 '23 12:01 themightyoarfish

This plot seems to show that nanoflann is slower than FLANN, but your bar graphs further down show the opposite 🤔

The line graph you're seeing is tree build time.

The bar graphs you see below that are NN search time.

Yes, nanoflann is slower than FLANN at tree building for the same leaf size. For NN search, it becomes faster than FLANN as you move toward fewer threads and fewer search points.

Jan 12 '23 12:01 yasamoka

This seems useful, might a maintainer take a look?

Feb 06 '23 08:02 themightyoarfish

@mvieth @larshg Could you give some feedback here?

Nov 02 '23 18:11 themightyoarfish

How about range search? Is nanoflann quicker than FLANN there?

Nov 03 '23 10:11 xiaodong2077