reference and parallelization
Hi gabyx. Thanks for offering your work. It is valuable for me. Can you offer the reference paper of your implementation? I think it would be helpful to understand your code by my own. Question 2 is - do you think ApproxMVBB is easy to parallelization by means of OpenMP/MPI etc. ? In my experiment, the speed using ApproxMVBB almostly equals to the speed of rotating calipers method with parallelization.
Sure :-):
Here are the Papers used for the implementation: They are quite mathematical, especially the second one. So you 're probably better off by reading the paper a bit while looking at the source code.
@article{malandain2002, Author = {Gr'egoire Malandain and Jean-Daniel Boissonnat}, Journal = {International Journal of Computational Geometry & Applications}, Month = {December}, Number = {6}, Pages = {489 - 510}, Timestamp = {2015.09.02}, Title = {Computing the Diameter of a Point Set}, Volume = {12}, Year = {2002}}
and
@inproceedings{barequet2001, Author = {Gill Barequet and Sariel Har-peled}, Booktitle = {In Proc. 10th ACM-SIAM Sympos. Discrete Algorithms}, Pages = {38--91}, Timestamp = {2015.09.02}, Title = {Efficiently Approximating the Minimum-Volume Bounding Box of a Point Set in Three Dimensions}, Year = {2001}}
What parallelization concerns: I was not yet thinking about that, but the thing is there are several parts which could benefit from that. I dont know how to parallelize the graham scan or the rotating calipers method, but you already have one as you mentioned?
But I think one straight forward way of some parallelization would be to run the exhaustive grid search (7) in parallel, that would be easy so each thread (OpenMP) could do: one projection into a direction, 2d convex hull, minimal area rectangle and construct the bounding box. and then after each iteration, synchronize (mutex on the volume value and oobb storage) to store the best result. and continuing till max iteration is reached.
with MPI: hmm, hm... probably in the same fashion (for a start), there is some communication involved, either communicating point indices (if every process has the point cloud) or communication whole point data.
Another thing there are lots of 2d projections of the 3d points and -> that means if we can reduce run time of https://github.com/gabyx/ApproxMVBB/blob/master/include/ApproxMVBB/ProjectedPointSet.hpp that would certainly be good :-). Eigen3 already uses SSE instructions, so thats good. And the projections for the Exhaustive Grid Search in step 7 are only done on fraction of the initial point cloud : min/max points (approximating the convex hull) obtained from the gridded point cloud as described in step 6. Roughly 400-1000 points or even less...
There is another interesting paper which is claimed to be even faster and would be cool if that could be implemented in this library as well:
@article{chang2011, Acmid = {2019641}, Address = {New York, NY, USA}, Articleno = {122}, Author = {Chang, Chia-Tche and Gorissen, Bastien and Melchior, Samuel}, Doi = {10.1145/2019627.2019641}, Issn = {0730-0301}, Issue_Date = {October 2011}, Journal = {ACM Trans. Graph.}, Keywords = {Computational geometry, bounding box, manifolds, optimization}, Month = oct, Number = {5}, Numpages = {16}, Pages = {122:1--122:16}, Publisher = {ACM}, Timestamp = {2015.09.03}, Title = {Fast Oriented Bounding Box Optimization on the Rotation Group {$SO(3,\mathbb{R})$}}, Url = {http://doi.acm.org/10.1145/2019627.2019641}, Volume = {30}, Year = {2011}, Bdsk-Url-1 = {http://doi.acm.org/10.1145/2019627.2019641}, Bdsk-Url-2 = {http://dx.doi.org/10.1145/2019627.2019641}}
It is based on optimization with the Nelder-Mead simplex (dont know if that is straight forward to parallelize, its a genetic opt. aglorithm.
Rotating Calipers in 3d? or in 2d?
Hi gabyx! Thanks for your detailed reply :)
I think the algorithm I parallelized is exhaustive grid search you mentioned. Rotating Calipers in 2D is only used at computing 2D minimal area rectangle.
All of the three papers above is interesting. I will devour them and then come back to you. Thank you.
Hi, I made considerable improvements to the library, fixed some small bugs, conceptually not alot has changed, functions are still the same.
also I added unit tests. They should hopefully turn out to be correct. if they dont, then its numerically inaccuracy: visualizing the tests if the fail is always the first thing to do =)
regards
The library has now some openmp support, one place where openmp is commented out yet still makes some problems (more investigations needed), but so far its a huge improvement for speed.
Curious under what circumstances you got the huge speed improvement in the openmp parallelization (around the grid search). Unfortunately my testing is done in MSVC and WSL (not native linux), so I may be missing some of the benefits (side note: I had to re-write the parallelization for MSVC because it is stuck on openmp 2.0 and doesn't support user-defined reductions... I can make a PR for it if you want us poor souls that have to use MSVC)
Grid search: 5 Point Samples: 10000
MSVC Single threaded: ~20000 msec MSVC Openmp (critical section version): ~2000 msec
WSL Single threaded: ~9000 msec WSL Openmp (reduction version): ~19000 msec WSL Openmp (critical section version that I used on MSVC): ~21000 msec
Strange Benchmark: Did you use full optimization and release? A PR is welcome :-) I prefere to have some sort of switch depending on the OpenMP version (is there a macro?)
For WSL it was -03 -DNDEBUG (for MSVC it is /O2 /Ob2 /D "NDEBUG").
Code being timed is below where the input cloud has 100,000 points (generated randomly from a uniform distribution in the range of -100.0 to 100.0).
ApproxMVBB::OOBB approx_obb = ApproxMVBB::approximateMVBB(cloud.pts(),
0.001, /*epsilon*/
10000, /*point samples*/
5, /*grid size*/
0, /*diameter optizmiation loops*/
5 /*grid search optimization loops*/
);
approx_obb.expandToMinExtentRelative(0.1);
... = approx_obb.m_q_KI.matrix();
... = approx_obb.m_minPoint;
... = approx_obb.m_maxPoint;
I'm not aware of any OpenMP MACRO that can be used... I just check if WIN32 and use the OpenMP 2.0 style critical section.
I will push the PR... but by supporting OpenMP 2.0 it makes the code much uglier compared to user-defined reductions :)
see #38
Could you test this?
Strange is, that its so slow with OpenMp 4 in WSL? its twice as slow lol
Honestly I never did a relevant profiling... Needs further investigation. Maybe profile the code ?