cuda-samples

cdpQuadtree does not work correctly on a large number of points

Open pwrliang opened this issue 2 months ago • 3 comments

I've found that the cdpQuadtree sample doesn't work correctly for a large number of points, such as 100K: it fails the "check_quadtree" test. I tried to locate the problem with compute-sanitizer, which reported an illegal memory access (log below). Without the sanitizer, the kernel doesn't raise an error; it just produces incorrect results. By default, the tree depth is 8. With leaves that can hold 16 points each, the tree should be able to hold 4^8 × 16 = 1,048,576 points (about 1M), so 100K points should fit, yet the sample code doesn't work as expected. Can you help me solve this issue?
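The capacity estimate above can be checked with a few lines of host-side C++. This is only a sketch of the arithmetic: the names `leaf_count` and `max_points` are hypothetical and do not come from the sample, and it assumes that "depth 8" means eight subdivisions below the root. The result is an upper bound that is reached only when the points are spread evenly across all leaves; clustered points can overflow a leaf at the maximum depth well before the total is reached.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the sample): number of leaves in a full
// quadtree after `subdivisions` splits, where each split turns one node into four.
constexpr std::uint64_t leaf_count(unsigned subdivisions) {
    std::uint64_t leaves = 1;
    for (unsigned i = 0; i < subdivisions; ++i) {
        leaves *= 4;
    }
    return leaves;
}

// Hypothetical helper: upper bound on the points a full tree can hold,
// assuming every leaf is filled to exactly `points_per_leaf`.
constexpr std::uint64_t max_points(unsigned subdivisions,
                                   std::uint64_t points_per_leaf) {
    return leaf_count(subdivisions) * points_per_leaf;
}

// 4^8 = 65,536 leaves, times 16 points per leaf.
static_assert(max_points(8, 16) == 1048576, "4^8 * 16 should be 1,048,576");
```

If the sample counts the root as the first of its eight levels, the same function called with 7 subdivisions gives the corresponding bound.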

========= Invalid __global__ write of size 4 bytes
=========     at void build_quadtree_kernel<(int)128>(Quadtree_node *, Points *, Parameters)+0x2280 in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/cdpQuadtree.cu:542
=========     by thread (127,0,0) in block (0,0,0)
=========     Address 0x7fe9f6a12170 is out of bounds
=========     and is 73,585 bytes after the nearest allocation at 0x7fe9f6a00000 of size 512 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x3812c8]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:libcudart_static_4d8b33a106dceb3c07a56e26de61f2d53bb62a68 [0x1fcde]
=========                in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/debug/./cdpQuadtree
=========     Host Frame:cudaLaunchKernel [0x8292e]
=========                in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/debug/./cdpQuadtree
=========     Host Frame:cudaError cudaLaunchKernel<char>(char const*, dim3, dim3, void**, unsigned long, CUstream_st*) in /usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:216 [0xfaaf]
=========                in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/debug/./cdpQuadtree
=========     Host Frame:__device_stub__Z21build_quadtree_kernelILi128EEvP13Quadtree_nodeP6Points10Parameters(Quadtree_node*, Points*, Parameters&) in /tmp/tmpxft_00188328_00000000-6_cdpQuadtree.compute_86.cudafe1.stub.c:23 [0xf109]
=========                in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/debug/./cdpQuadtree
=========     Host Frame:void __wrapper__device_stub_build_quadtree_kernel<128>(Quadtree_node*&, Points*&, Parameters&) in /tmp/tmpxft_00188328_00000000-6_cdpQuadtree.compute_86.cudafe1.stub.c:24 [0xf166]
=========                in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/debug/./cdpQuadtree
=========     Host Frame:void build_quadtree_kernel<128>(Quadtree_node*, Points*, Parameters) in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/cdpQuadtree.cu:545 [0x12f86]
=========                in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/debug/./cdpQuadtree
=========     Host Frame:cdpQuadtree(int) in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/cdpQuadtree.cu:680 [0xea87]
=========                in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/debug/./cdpQuadtree
=========     Host Frame:main in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/cdpQuadtree.cu:729 [0xef3a]
=========                in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/debug/./cdpQuadtree
=========     Host Frame:__libc_start_main in ../csu/libc-start.c:342 [0x24083]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:_start [0xdebe]
=========                in /home/geng.161/cuda-samples/Samples/3_CUDA_Features/cdpQuadtree/debug/./cdpQuadtree
=========

pwrliang avatar Oct 12 '25 01:10 pwrliang

Hi, can I take on this issue now?

marktech0813 avatar Nov 05 '25 09:11 marktech0813

children[child_offset + 0].set_range(node.points_begin(), s_num_pts[0][warp_id]);
children[child_offset + 1].set_range(s_num_pts[0][warp_id], s_num_pts[1][warp_id]);
children[child_offset + 2].set_range(s_num_pts[1][warp_id], s_num_pts[2][warp_id]);
children[child_offset + 3].set_range(s_num_pts[2][warp_id], s_num_pts[3][warp_id]);

Hi, I think the four lines above from the sample code are wrong. Maybe they should be:

children[child_offset + 0].set_range(node.points_begin(), s_num_pts[1][0]);
children[child_offset + 1].set_range(s_num_pts[1][0], s_num_pts[2][0]);
children[child_offset + 2].set_range(s_num_pts[2][0], s_num_pts[3][0]);
children[child_offset + 3].set_range(s_num_pts[3][0], node.points_end());
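For reference, the property both versions are trying to establish is that the four children receive contiguous, non-overlapping ranges that together cover the parent's range exactly. A minimal host-side sketch of that invariant follows. It assumes the per-quadrant point counts are already known; `Range` and `child_ranges` are hypothetical names, not part of the sample, and this does not claim to reproduce how the kernel lays out `s_num_pts`.

```cpp
#include <array>
#include <utility>

// Half-open range [first, second) of point indices.
using Range = std::pair<int, int>;

// Hypothetical helper (not from the sample): split the parent's points,
// starting at index `begin`, into four contiguous child ranges given the
// number of points that fall into each quadrant.
inline std::array<Range, 4> child_ranges(int begin,
                                         const std::array<int, 4>& counts) {
    std::array<Range, 4> ranges{};
    int start = begin;
    for (int q = 0; q < 4; ++q) {
        // Child q starts where child q-1 ended (running prefix sum).
        ranges[q] = Range{start, start + counts[q]};
        start += counts[q];
    }
    return ranges;
}
```

With this layout the first child always starts at the parent's begin and the last child always ends at the parent's end, which is the same boundary condition the proposed change expresses with `node.points_begin()` and `node.points_end()`.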

JIARONGLI666 avatar Nov 26 '25 08:11 JIARONGLI666

Also, try increasing min_points_per_node from 16 to a larger number, such as 512. That should fix it.

JIARONGLI666 avatar Dec 01 '25 08:12 JIARONGLI666