pcl icon indicating copy to clipboard operation
pcl copied to clipboard

[registration] Add OMP based Multi-threading to IterativeClosestPoint

Open koide3 opened this issue 4 years ago • 6 comments

This PR adds multi-threading capability to pcl::IterativeClosestPoint and pcl::CorrespondenceEstimation. pcl::ICP spends about 95% of time for correspondence estimation, and thus it can be largely sped up by making CorrespondenceEstimation multi-threaded. Some classes that inherit from pcl::ICP can also be sped up with this PR (e.g., pcl::IterativeClosestPointNonLinear).

Note: The sort at the end of CorrespondenceEstimation::determineCorrespondences() is necessary to pass some tests because they assume correspondences are ordered. I also tried omp ordered to keep correspondences ordered, but it was very slow.

Off-topic: Should we make a common interface for setNumberOfThreads()?

Some benchmark results:

# aligning 16k points

* with omp
threads: time
1:153.565[msec]
2:87.3185[msec]
3:67.8443[msec]
4:54.0599[msec]
5:45.6129[msec]
6:40.9252[msec]
7:43.6004[msec]
8:40.2708[msec]
9:46.0424[msec]

* without omp
1:149.439[msec]
2:150.391[msec]
3:152.18[msec]
4:153.475[msec]
5:149.943[msec]
6:149.483[msec]
7:148.78[msec]
8:149.607[msec]
9:149.231[msec]

* without sort
1:149.21[msec]
2:84.0002[msec]
3:60.262[msec]
4:47.0388[msec]
5:39.0977[msec]
6:36.8624[msec]
7:36.5644[msec]
8:32.6782[msec]
9:39.3133[msec]

* use omp ordered instead of sort
1:209.918[msec]
2:196.892[msec]
3:197.966[msec]
4:196.945[msec]
5:197.686[msec]
6:200.591[msec]
7:200.022[msec]
8:208.916[msec]
9:223.545[msec]


* with omp and use_reciprocal_correspondence_ == true
1:331.359[msec]
2:180.315[msec]
3:130.543[msec]
4:106.585[msec]
5:91.2735[msec]
6:88.4035[msec]
7:94.627[msec]
8:99.4426[msec]
9:97.6912[msec]

* without omp and use_reciprocal_correspondence_ == true
1:329.28[msec]
2:324.217[msec]
3:322.527[msec]
4:322.287[msec]
5:322.802[msec]
6:326.154[msec]
7:329.858[msec]
8:333.565[msec]
9:336.588[msec]

koide3 avatar Nov 18 '20 10:11 koide3

Off-topic: Should we make a common interface for setNumberOfThreads()?

When @shrijitsingh99 is able to integrate the executor and test, we would be one step closer. He mentioned he'd have cycles free near the winter holidays But yes, in principle I concur with you on this

kunaltyagi avatar Nov 19 '20 08:11 kunaltyagi

Build errors in tutorials:

 error: there are no arguments to 'omp_get_thread_num' that depend on a template parameter, so a declaration of 'omp_get_thread_num' must be available [-fpermissive]
       per_thread_correspondences[omp_get_thread_num()].emplace_back(std::move(corr));

Is it a missing include or?

larshg avatar Dec 20 '20 21:12 larshg

Build errors in tutorials:

I wrapped omp_get_thread_num in #ifdef _OPENMP so it doesn't cause errors on omp-disabled environments.

koide3 avatar Dec 22 '20 05:12 koide3

This seems useful. Any chance to get this moved forward?

keineahnung2345 avatar Jan 02 '24 06:01 keineahnung2345

This seems useful. Any chance to get this moved forward?

I will take another look

mvieth avatar Jan 05 '24 13:01 mvieth

I am happy with the changes now. I will merge this after PCL 1.14.1 is released because adding the num_threads_ variable breaks the ABI, and we don't want an ABI break between 1.14.0 and 1.14.1

mvieth avatar Jan 20 '24 14:01 mvieth