pcl
pcl copied to clipboard
[registration] Add OMP based Multi-threading to IterativeClosestPoint
This PR adds multi-threading capability to pcl::IterativeClosestPoint and pcl::CorrespondenceEstimation. pcl::ICP spends about 95% of time for correspondence estimation, and thus it can be largely sped up by making CorrespondenceEstimation multi-threaded. Some classes that inherit from pcl::ICP can also be sped up with this PR (e.g., pcl::IterativeClosestPointNonLinear).
Note: The sort at the end of CorrespondenceEstimation::determineCorrespondences() is necessary to pass some tests because they assume correspondences are ordered. I also tried omp ordered to keep correspondences ordered, but it was very slow.
Off-topic: Should we make a common interface for setNumberOfThreads()?
Some benchmark results:
# aligning 16k points
* with omp
threads: time
1:153.565[msec]
2:87.3185[msec]
3:67.8443[msec]
4:54.0599[msec]
5:45.6129[msec]
6:40.9252[msec]
7:43.6004[msec]
8:40.2708[msec]
9:46.0424[msec]
* without omp
1:149.439[msec]
2:150.391[msec]
3:152.18[msec]
4:153.475[msec]
5:149.943[msec]
6:149.483[msec]
7:148.78[msec]
8:149.607[msec]
9:149.231[msec]
* without sort
1:149.21[msec]
2:84.0002[msec]
3:60.262[msec]
4:47.0388[msec]
5:39.0977[msec]
6:36.8624[msec]
7:36.5644[msec]
8:32.6782[msec]
9:39.3133[msec]
* use omp ordered instead of sort
1:209.918[msec]
2:196.892[msec]
3:197.966[msec]
4:196.945[msec]
5:197.686[msec]
6:200.591[msec]
7:200.022[msec]
8:208.916[msec]
9:223.545[msec]
* with omp and use_reciprocal_correspondence_ == true
1:331.359[msec]
2:180.315[msec]
3:130.543[msec]
4:106.585[msec]
5:91.2735[msec]
6:88.4035[msec]
7:94.627[msec]
8:99.4426[msec]
9:97.6912[msec]
* without omp and use_reciprocal_correspondence_ == true
1:329.28[msec]
2:324.217[msec]
3:322.527[msec]
4:322.287[msec]
5:322.802[msec]
6:326.154[msec]
7:329.858[msec]
8:333.565[msec]
9:336.588[msec]
Off-topic: Should we make a common interface for setNumberOfThreads()?
When @shrijitsingh99 is able to integrate the executor and test, we would be one step closer. He mentioned he'd have cycles free near the winter holidays But yes, in principle I concur with you on this
Build errors in tutorials:
error: there are no arguments to 'omp_get_thread_num' that depend on a template parameter, so a declaration of 'omp_get_thread_num' must be available [-fpermissive]
per_thread_correspondences[omp_get_thread_num()].emplace_back(std::move(corr));
Is it a missing include or?
Build errors in tutorials:
I wrapped omp_get_thread_num in #ifdef _OPENMP so it doesn't cause errors on omp-disabled environments.
This seems useful. Any chance to get this moved forward?
This seems useful. Any chance to get this moved forward?
I will take another look
I am happy with the changes now. I will merge this after PCL 1.14.1 is released because adding the num_threads_ variable breaks the ABI, and we don't want an ABI break between 1.14.0 and 1.14.1