cuPCL
CUDA filter demo: cuda-pcl is slower than PCL when I use VoxelGrid
cuda-pcl is faster than PCL in PassThrough, but not in VoxelGrid.
Your output confuses me: your NX is even slower than my Jetson Nano (4GB), which should not be the case. The output from my Jetson Nano is as follows:
./demo
GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X1
Capbility: 5.3
Global memory: 3956MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)
------------checking CUDA ----------------
CUDA Loaded 119978 data points from PCD file with the following fields: x y z
------------checking CUDA PassThrough ----------------
CUDA PassThrough by Time: 1.9844 ms.
CUDA PassThrough before filtering: 119978
CUDA PassThrough after filtering: 5110
------------checking CUDA VoxelGrid----------------
CUDA VoxelGrid by Time: 35.325 ms.
CUDA VoxelGrid before filtering: 119978
CUDA VoxelGrid after filtering: 3440
------------checking PCL ----------------
PCL(CPU) Loaded 119978 data points from PCD file with the following fields: x y z
------------checking PCL(CPU) PassThrough ----------------
PCL(CPU) PassThrough by Time: 9.47348 ms.
PointCloud before filtering: 119978 data points (x y z).
PointCloud after filtering: 5110 data points (x y z).
------------checking PCL VoxelGrid----------------
PCL VoxelGrid by Time: 24.2884 ms.
PointCloud before filtering: 119978 data points (x y z).
PointCloud after filtering: 3440 data points (x y z).
After I run jetson_clocks, it gets faster. The output is as follows:
./demo
GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X1
Capbility: 5.3
Global memory: 3956MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)
------------checking CUDA ----------------
CUDA Loaded 119978 data points from PCD file with the following fields: x y z
------------checking CUDA PassThrough ----------------
CUDA PassThrough by Time: 1.39955 ms.
CUDA PassThrough before filtering: 119978
CUDA PassThrough after filtering: 5110
------------checking CUDA VoxelGrid----------------
CUDA VoxelGrid by Time: 11.9661 ms.
CUDA VoxelGrid before filtering: 119978
CUDA VoxelGrid after filtering: 3440
------------checking PCL ----------------
PCL(CPU) Loaded 119978 data points from PCD file with the following fields: x y z
------------checking PCL(CPU) PassThrough ----------------
PCL(CPU) PassThrough by Time: 3.32619 ms.
PointCloud before filtering: 119978 data points (x y z).
PointCloud after filtering: 5110 data points (x y z).
------------checking PCL VoxelGrid----------------
PCL VoxelGrid by Time: 16.5497 ms.
PointCloud before filtering: 119978 data points (x y z).
PointCloud after filtering: 3440 data points (x y z).
Finally, I don't know why cuda-pcl is faster than PCL in PassThrough but not in VoxelGrid. I think that may be why PCL removed CUDA support for VoxelGrid in pcl-1.13.1.
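The two runs above differ only in clock state, yet every timing changes, so a single timed call is sensitive to warm-up and frequency scaling. One way to make such comparisons more stable is to discard a few warm-up calls and report the median of several timed calls. The sketch below assumes this approach; `medianMillis` is a hypothetical helper, not part of cuPCL or PCL, and the default run counts are arbitrary.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of cuPCL or PCL): calls `fn` a few times
// untimed, then times `runs` further calls and returns the median in ms.
// For GPU code, `fn` must block until the device work has finished;
// otherwise only the launch overhead is measured.
double medianMillis(const std::function<void()>& fn,
                    std::size_t warmup = 3, std::size_t runs = 10) {
    for (std::size_t i = 0; i < warmup; ++i) fn();  // untimed warm-up calls

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    if (samples.empty()) return 0.0;

    // Upper median: partially sort so the middle element is in place.
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

To use it, wrap the filter call being measured in a lambda and pass it in, once for the CUDA path and once for the PCL path, with the same input cloud and parameters.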
@MagicalBrain Hello, I would like to ask for advice.
Running machine environment:
When I run the official cuFilter demo, the CUDA computation time is basically the same as the official figure:
------------checking CUDA VoxelGrid----------------
CUDA VoxelGrid by Time: 3.20768 ms.
CUDA VoxelGrid before filtering: 119978
CUDA VoxelGrid after filtering: 3440
But when I set setP.voxelX, setP.voxelY, and setP.voxelZ to 0.09, the CUDA computation becomes far slower than expected:
------------checking CUDA VoxelGrid----------------
CUDA VoxelGrid by Time: 3109.65 ms.
CUDA VoxelGrid before filtering: 119978
CUDA VoxelGrid after filtering: 62844
Why is this, and is there any way to fix it? In most cases, setP.voxelX, setP.voxelY, and setP.voxelZ cannot simply be left at 1. I hope someone can help.
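One thing that can be quantified here is how much the grid itself grows. Shrinking the leaf size from 1 to 0.09 multiplies the number of cells in a dense grid over the same volume by roughly (1/0.09)^3 ≈ 1372, which is consistent with the output growing from 3440 to 62844 points. Whether this accounts for the 3109.65 ms is an assumption, since the thread does not say how cuPCL organizes voxels internally. The sketch below only does the cell-count arithmetic; `Extent` and `denseCellCount` are hypothetical names, and the 20 x 20 x 5 extent in the usage note is made up for illustration, not taken from the demo's PCD file.

```cpp
#include <cmath>
#include <cstdint>

// Axis-aligned size of a point cloud's bounding box (illustrative only).
struct Extent {
    double x, y, z;
};

// Number of cells in a dense voxel grid that covers `e` with a cubic
// leaf of edge length `leaf` (same units as the extent, leaf > 0).
std::uint64_t denseCellCount(const Extent& e, double leaf) {
    const auto nx = static_cast<std::uint64_t>(std::ceil(e.x / leaf));
    const auto ny = static_cast<std::uint64_t>(std::ceil(e.y / leaf));
    const auto nz = static_cast<std::uint64_t>(std::ceil(e.z / leaf));
    return nx * ny * nz;
}
```

For the made-up extent, `denseCellCount({20.0, 20.0, 5.0}, 1.0)` gives 2000 cells, while `denseCellCount({20.0, 20.0, 5.0}, 0.09)` gives 2784824 cells, so any step whose cost scales with the number of cells rather than the number of points would be expected to slow down sharply.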