
[gpu_extract_clusters] gpu euclidean cluster crashes easily

Open stevalkr opened this issue 4 years ago • 10 comments

Describe the bug

I'm trying to use gpu_extract_clusters for better performance, but several problems occurred.

  1. Including <pcl/gpu/segmentation/gpu_extract_clusters.h> causes redefinition errors; I had to copy the file into my project to work around this.
  2. gpu euclidean cluster crashes easily
    Error: invalid argument /home/walker-ubuntu/Downloads/env/pcl-1.11.1/gpu/containers/src/device_memory.cpp:173

Context

    pcl::gpu::EuclideanClusterExtraction gec;

    gec.setClusterTolerance(config->getNumber("euclidean_cluster_distance"));
    gec.setMinClusterSize((int)config->getNumber("euclidean_cluster_min_size"));
    gec.setMaxClusterSize((int)config->getNumber("euclidean_cluster_max_size"));

    gec.setSearchMethod(oc_tree);
    gec.setHostCloud(cloud_in);

    oc_cloud.upload(cloud_in->points);
    oc_tree->setCloud(oc_cloud);
    oc_tree->build();

    gec.extract(inlier);

Expected behavior

Runs the same as the normal (CPU) euclidean cluster extraction.

Current Behavior

Runnable, but always crashes while running.

Your Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • Compiler: GCC 9.3.0
  • PCL Version 1.11.1 (gpu and cuda installed)
  • CUDA 11.2
  • GPU: RTX 3070

I added 8.6 to CUDA_ARCH_BIN while compiling PCL.

stevalkr avatar Aug 22 '21 17:08 stevalkr

Can you try again with the current PCL master? There have been a few changes to those classes since PCL 1.11.1.

mvieth avatar Aug 23 '21 08:08 mvieth

I have upgraded to the latest commit, and the problem still exists:
Error: invalid argument /home/walker-ubuntu/Downloads/env/pcl/gpu/containers/src/device_memory.cpp:281

pcl/gpu/containers/src/device_memory.cpp:

void
pcl::gpu::DeviceMemory::upload(const void* host_ptr_arg, std::size_t sizeBytes_arg)
{
  create(sizeBytes_arg);
/*281*/ cudaSafeCall(cudaMemcpy(data_, host_ptr_arg, sizeBytes_, cudaMemcpyHostToDevice));
  cudaSafeCall(cudaDeviceSynchronize());
}

Sorry, I'm still in college and not very good at English, so I can't give you more details about the issue.
Here is my code:

    cloud_2d->clear();
    for (auto point : cloud_in->points) // cloud_in: pcl::PointCloud<pcl::PointXYZ>::Ptr
    {
        point.z = 0;
        cloud_2d->push_back(point);
    }

    pcl::gpu::Octree::PointCloud cloud_device;
    cloud_device.upload(cloud_2d->points);

    pcl::gpu::Octree::Ptr octree_device(new pcl::gpu::Octree);
    octree_device->setCloud(cloud_device);
    octree_device->build();

    std::vector<pcl::PointIndices> cluster_indices_gpu;
    pcl::gpu::EuclideanClusterExtraction gecc;
    gecc.setClusterTolerance(config->getNumber("euclidean_cluster_distance")); // 0.01
    gecc.setMinClusterSize(5);
    gecc.setMaxClusterSize(500);
    gecc.setSearchMethod(octree_device);
    gecc.setHostCloud(cloud_2d);
    gecc.extract(cluster_indices_gpu);

cloud_in should be fine, since the ordinary (CPU) euclidean cluster extraction works on it without problems.

stevalkr avatar Aug 23 '21 12:08 stevalkr

@FabianSchuetze If you have time, would you have a look at this? Given your experience with the gpu clustering and the gpu containers, maybe you have an idea.

mvieth avatar Aug 23 '21 16:08 mvieth

@Zero-Swangel can you create a point cloud where it happens, or share one you already have? Then I can try to run it as well.

larshg avatar Aug 24 '21 16:08 larshg

record_.zip

        static int i = 0;
        i++;
        std::cout << "PCD " << i << std::endl;
        std::string path = "/home/ubuntu/Downloads/pcd/record_"+std::to_string(i)+".pcd";
        // pcl::io::savePCDFileASCII(path, *data.cloud_data.cloud);

        PointCloud::Ptr cloud__(new PointCloud);
        pcl::io::loadPCDFile<pcl::PointXYZ>(path, *cloud__);

#ifdef GPU_CLUSTER
        if (config->getString("if_use_gpu") == "true")
            cluster->GPU_euclideanCluster(/*data.cloud_data.cloud*/cloud__, box_array);
        else
#endif
            cluster->euclideanCluster(data.cloud_data.cloud, box_array);

It crashes on the last frame and can be reproduced in my program. However, I have failed to write a simple demo that reproduces the issue; I'm still working on it. It may be ROS related.

stevalkr avatar Aug 25 '21 08:08 stevalkr

I can verify it crashes.

It allocates space for max_answers entries, which is equal to max_pts_per_cluster: https://github.com/PointCloudLibrary/pcl/blob/85fc1707dd28f4104a7cf6ab0f8570d11ec1b4c4/gpu/segmentation/include/pcl/gpu/segmentation/impl/gpu_extract_clusters.hpp#L82-L83

However, the number of queries can exceed this count, and therefore it tries to upload more points than there is space allocated for. https://github.com/PointCloudLibrary/pcl/blob/85fc1707dd28f4104a7cf6ab0f8570d11ec1b4c4/gpu/segmentation/include/pcl/gpu/segmentation/impl/gpu_extract_clusters.hpp#L136-L144

I'm not sure whether max_answers is correct in this part of the code: if the query points all belong to the same cluster, that cluster should be discarded for exceeding max_pts_per_cluster, but it should still be found as one big cluster.

I solved the crashing by adding:

        const int queriesHostCount = queries_host.size();
        if (queriesHostCount > queries_device_buffer.size())
          queries_device_buffer.create(queriesHostCount);

After this line: https://github.com/PointCloudLibrary/pcl/blob/85fc1707dd28f4104a7cf6ab0f8570d11ec1b4c4/gpu/segmentation/include/pcl/gpu/segmentation/impl/gpu_extract_clusters.hpp#L134

But it then returns the wrong number of clusters compared to the KdTree/Octree CPU clustering methods.

Also, I don't think you'll see any speedup, given the low number of points in your point clouds.

larshg avatar Aug 25 '21 20:08 larshg

Thank you for your reply. About the speedup: it reduces the time cost from 40-80 ms to 20-60 ms (when it works properly); not dramatic, but not negligible either. I don't know whether the CPU clustering is just running too slowly or not.

stevalkr avatar Aug 26 '21 05:08 stevalkr

Ah, nice. It depends a lot on the hardware, I suppose. Hopefully @FabianSchuetze can figure out a solution. I'm a bit out of spare time atm 😢

larshg avatar Aug 26 '21 06:08 larshg

Has anybody been able to make any progress on this issue?

JaccoKiezebrink avatar Jan 04 '22 21:01 JaccoKiezebrink

Don't think so, so please go ahead 😀

larshg avatar Jan 04 '22 22:01 larshg