[custom] Possibility of adding a fast KD-Tree (or other clustering algorithm) to the GPU module?

Open FabianSchuetze opened this issue 3 years ago • 5 comments

In thread #4677, @larshg mentioned that the revised GPU clustering runs faster than before but is slower than the CPU version. The CPU version is based on a KD-Tree, while the GPU version relies on an Octree. I was looking for a fast GPU implementation of a KD-Tree but did not find a convincing one. Moreover, I was not sure whether something like this exists at all. I thus wanted to ask whether somebody with more experience with the clustering algorithms knows if fast GPU implementations exist and whether we could leverage them here. Are the nn implementations in the cuda module perhaps just that? I am grateful for any tips or suggestions!
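For context, Euclidean clustering of the kind discussed here reduces to repeated radius searches: start from an unvisited point, collect every neighbor within a tolerance, and keep expanding until no new points are reached. The sketch below is a minimal illustration in plain Python, assuming a brute-force neighbor lookup in place of the KD-Tree or Octree. The function names `radius_search` and `euclidean_clusters` are hypothetical and are not PCL's API.

```python
from collections import deque
from math import dist


def radius_search(points, query_idx, radius):
    """Brute-force stand-in for a KD-Tree/Octree radius search (illustrative only)."""
    q = points[query_idx]
    return [i for i, p in enumerate(points) if dist(p, q) <= radius]


def euclidean_clusters(points, tolerance, min_size=1):
    """Group points that are connected through chains of neighbors within `tolerance`."""
    visited = [False] * len(points)
    clusters = []
    for seed in range(len(points)):
        if visited[seed]:
            continue
        visited[seed] = True
        queue = deque([seed])
        cluster = []
        # Breadth-first expansion: every dequeued point triggers one radius search.
        while queue:
            idx = queue.popleft()
            cluster.append(idx)
            for nb in radius_search(points, idx, tolerance):
                if not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return clusters
```

Since every point issues one radius search, the speed of that search dominates the runtime, which is why the choice of search structure matters so much for this algorithm.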

FabianSchuetze avatar Jun 27 '21 10:06 FabianSchuetze

I recently found http://ann-benchmarks.com. It mainly tests CPU implementations, but the readme on GitHub points to https://github.com/facebookresearch/faiss, which has a GPU implementation. I haven't looked into it at all, so I have no idea whether it is useful for us, but it may be a starting point for you if you are interested in this.

mvieth avatar Jun 27 '21 15:06 mvieth

Ha! Fantastic, thank you, Markus! I was hoping to get exactly such an answer and I will definitely look into faiss.

FabianSchuetze avatar Jun 27 '21 17:06 FabianSchuetze

I have researched this topic more. To my despair, faiss does not support a radiusSearch on the GPU, only on the CPU. To quote:

[...]and range search is not currently implemented on the GPU.

It is on my long-term roadmap to allow for k-selection for arbitrary k on the GPU, but this will take a while and isn't something I can promise anytime soon. Range search would deal with similar issues, though this one is easier.

I am beginning to wonder whether GPUs are simply not suitable for such searches. The faiss GPU module has a search for the k-nearest neighbors, but I am not sure whether this could be of any help for us.
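One way a k-nearest-neighbor search could stand in for a radius search is to query k neighbors, keep those within the radius, and enlarge k whenever all k results fall inside the radius (because the result may then be truncated). The following is only a sketch of that idea in plain Python. The brute-force `knn` function is a hypothetical placeholder for a GPU k-NN search and is not faiss's API. Whether this approach would be fast enough in practice is an open question.

```python
from math import dist


def knn(points, query, k):
    """Brute-force placeholder for a GPU k-NN search.

    Returns up to k (distance, index) pairs, nearest first.
    """
    return sorted((dist(p, query), i) for i, p in enumerate(points))[:k]


def radius_search_via_knn(points, query, radius, k=8):
    """Emulate a radius search by growing k until the result is provably complete."""
    k = max(1, k)  # guard: k = 0 could never grow by doubling
    while True:
        hits = knn(points, query, k)
        within = [i for d, i in hits if d <= radius]
        # Complete if some returned neighbor lies outside the radius,
        # or if every point in the cloud has already been returned.
        if len(within) < len(hits) or len(hits) == len(points):
            return within
        k *= 2  # all k hits were inside the radius: result may be truncated
```

The cost of this scheme depends on how dense the neighborhoods are: in dense regions k may need to be doubled several times, and each retry repeats the full query.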

FabianSchuetze avatar Jul 10 '21 18:07 FabianSchuetze

Hi,

Have you seen clustering from autoware? It is faster than PCL CPU/GPU clustering with the same result. https://github.com/Autoware-AI/core_perception/blob/master/lidar_euclidean_cluster_detect/nodes/lidar_euclidean_cluster_detect/gpu_euclidean_clustering.cu

I wrote a small benchmark: https://github.com/BaltashovIlia/pcl_vs_autoware_clustering

On Ryzen 2700x + RTX A4000, the result is as follows:

| | bunny.pcd (397) | rops_cloud.pcd (32087) | sdc_filtered.pcd (41916) | sdc_raw.pcd (200499) |
|---|---|---|---|---|
| pcl_cpu | 1.67 | 10595 | 170 | 1313 |
| pcl_gpu | 3.20 | 4176 | 639 | 3224 |
| autoware | 1.24 | 12.3 | 16.4 | 175 |

BaltashovIlia avatar Sep 13 '21 13:09 BaltashovIlia

#5299

yasamoka avatar Jun 18 '22 01:06 yasamoka