pcl icon indicating copy to clipboard operation
pcl copied to clipboard

Add OpenMP Parallelization to FarthestPointSampling

Open alexnavtt opened this issue 6 months ago • 5 comments

FarthestPointSampling is a relatively recent addition to the filters module and has not yet been upgraded with a parallel implementation. The inner loop which recalculates the maximum minimum distance involves a large number of (mostly) independent operations, making it a good candidate for parallelization. The outer loop is serial due to the nature of the filter and so cannot be parallelized.

This PR adds OpenMP-based parallelization to that inner loop, and introduces an API to allow the user to select the number of threads, with the default being the maximum CPU thread count. I also had another lock-free version where each thread would independently calculate it's own maximum and then all the results would be reduced at the end, but that performed about the same but had more complicated (i.e. bug-prone) control flow.

The following code was used to benchmark performance:

#include <chrono>
#include <iostream>
#include <pcl/conversions.h>
#include <pcl/io/vtk_lib_io.h>
#include <pcl/filters/farthest_point_sampling.h>

int main() {
    pcl::PolygonMesh mesh;
    pcl::io::loadPolygonFile("/home/alex/Downloads/Espeon.stl", mesh);

    auto vertices = pcl::PointCloud<pcl::PointXYZ>{}.makeShared();
    pcl::fromPCLPointCloud2(mesh.cloud, *vertices);
    std::cout << "Loaded pointcloud with " << vertices->size() << " points\n";

    pcl::FarthestPointSampling<pcl::PointXYZ> farthest_point_sampler;
    farthest_point_sampler.setInputCloud(vertices);
    
    for (unsigned int nr_threads : {1, 2, 5, 10, 0}) {
        farthest_point_sampler.setNumberOfThreads(nr_threads);
        std::cout << farthest_point_sampler.getNumberOfThreads() << " thread" << (farthest_point_sampler.getNumberOfThreads() > 1 ? "s" : "") << ":\n";
        for (std::size_t sample_count = 10; sample_count <= 10000; sample_count *= 10) {
            farthest_point_sampler.setSample(sample_count);
            std::size_t total_milliseconds = 0;
            for (std::size_t trial_idx = 0; trial_idx < 10; trial_idx++) {
                pcl::Indices indices;
                farthest_point_sampler.setSeed(0);
                auto start = std::chrono::steady_clock::now();
                farthest_point_sampler.filter(indices);
                auto stop = std::chrono::steady_clock::now();
                total_milliseconds += std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
            }

            std::cout << std::setw(8) << sample_count << ": " << total_milliseconds/100 << " ms\n"; 
        }
    }
}

which gave this output, where the number on the left is the number of samples drawn at each trial, and the number on the right is the average execution time over 10 trials:

Loaded pointcloud with 638087 points
1 thread:
      10: 2 ms
     100: 23 ms
    1000: 246 ms
   10000: 2133 ms
2 threads:
      10: 1 ms
     100: 11 ms
    1000: 111 ms
   10000: 1124 ms
5 threads:
      10: 0 ms
     100: 5 ms
    1000: 58 ms
   10000: 598 ms
10 threads:
      10: 0 ms
     100: 3 ms
    1000: 35 ms
   10000: 353 ms
20 threads:
      10: 0 ms
     100: 2 ms
    1000: 28 ms
   10000: 337 ms

alexnavtt avatar Jun 03 '25 04:06 alexnavtt

Alright, I think I've addressed everything

alexnavtt avatar Jun 04 '25 04:06 alexnavtt

Hi, bumping on this in case there's anything else that needs to be done before the merge

alexnavtt avatar Jun 09 '25 13:06 alexnavtt

Hi, bumping on this in case there's anything else that needs to be done before the merge

I am happy with the changes, not sure if @larshg has any further requests. However, I should mention that we cannot immediately merge this pull request because by adding the nr_threads_ field, the ABI of the FarthestPointSampling class changes, and we are aiming to keep the next PCL release (1.15.1) ABI and API compatible to the last PCL release (1.15.0). So earliest we can merge this PR is directly after the 1.15.1 release, which is probably in a few weeks (exact date is not fixed yet).

mvieth avatar Jun 10 '25 08:06 mvieth

Alright sounds good. Thanks for the update

alexnavtt avatar Jun 10 '25 13:06 alexnavtt

Maybe also change nr_threads_ to num_threads_ (used in most other classes)

larshg avatar Jun 11 '25 08:06 larshg