py4dgeo
Long running time of M3C2 on entire point cloud compared to CloudCompare
With the help of this GitHub issue, I am able to run the M3C2 algorithm with (I think) the same parameters used in CloudCompare. However, py4dgeo's M3C2 takes roughly 100 times longer than CloudCompare's implementation.
- CloudCompare M3C2: ~2 seconds
- py4dgeo M3C2: 271 seconds
I put all the files needed to recreate the experiment here.
Is the time difference caused by a parameter that I forgot to configure in py4dgeo? Here is the config file (default settings exported from CloudCompare): m3c2_params.txt
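Before comparing timings, it can help to check the exported settings programmatically. Below is a minimal sketch, assuming the exported file is plain `key=value` text, possibly with `[section]` headers. `parse_cc_params` and `comparison_warnings` are hypothetical helpers, and `SAMPLE` is illustrative text, not the actual contents of m3c2_params.txt. Only the two keys discussed later in this thread are checked.

```python
SAMPLE = """[General]
SubsampleEnabled=true
UseSinglePass4Depth=false
"""  # illustrative only, not copied from m3c2_params.txt


def parse_cc_params(text):
    """Parse key=value lines into a dict; skips blanks, comments and [section] headers."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        key, value = key.strip(), value.strip()
        if value.lower() in ("true", "false"):
            params[key] = value.lower() == "true"
        else:
            try:
                params[key] = float(value)
            except ValueError:
                params[key] = value  # keep non-numeric values as strings
    return params


def comparison_warnings(params):
    """Flag settings that make a CloudCompare vs. py4dgeo timing comparison unfair."""
    warnings = []
    if params.get("SubsampleEnabled", False):
        warnings.append("SubsampleEnabled=true: CC uses a subsampled set of core points")
    if params.get("UseSinglePass4Depth") is False:
        warnings.append("UseSinglePass4Depth=false: CC uses a depth optimization py4dgeo lacks")
    return warnings


print(comparison_warnings(parse_cc_params(SAMPLE)))
```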
Thanks for providing all the configuration and data. I will reproduce and investigate this next week.
Hey @chrise96, thanks again for providing test data and configuration; this has been really helpful. I found a few things that favored CloudCompare in your comparison: some can be fixed, some can be documented, and some will need future work in py4dgeo (remember we are in early development):
- CloudCompare's scale parameters are diameters, while py4dgeo uses radii. This led to py4dgeo operating with a 4x larger search cylinder. I am currently fixing this for the `CloudCompareM3C2` class in #129, but you can also just divide your radii by 2.
- Your M3C2 configuration contains `SubsampleEnabled=true`, which means that you are not using the input cloud as the set of core points, but a downsampled version of it that contains only a fraction of the points (in my testing with CC and your data, only 1%). You are explicitly telling py4dgeo to use the entire point cloud, though, with `corepoints = epoch1.cloud`. The M3C2 algorithm is linear in the number of core points, which makes this one particularly important. Can you double-check the number of core points in the CC logs?
- CloudCompare has a (quite undocumented) option called `UseSinglePass4Depth` that, if set to `false`, enables a performance optimization that py4dgeo has not (yet) implemented (see #88). You might want to set it to `true` to better compare against py4dgeo's current state.
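The first two points are simple arithmetic. A minimal sketch, assuming roughly uniform point density (so the number of neighbours in a search cylinder scales with its volume); the helper names below are hypothetical and not part of py4dgeo.

```python
import math


def cc_diameter_to_radius(diameter):
    """CloudCompare scale parameters are diameters; py4dgeo expects radii."""
    return diameter / 2.0


def cylinder_volume(radius, half_length):
    """Volume of a projection cylinder with the given radius and half-length."""
    return math.pi * radius ** 2 * (2.0 * half_length)


def estimated_runtime(full_runtime_s, corepoint_fraction):
    """M3C2 is linear in the number of core points, so runtime scales with the fraction used."""
    return full_runtime_s * corepoint_fraction


# Passing a CC diameter unchanged as a py4dgeo radius doubles the radius,
# which quadruples the cylinder volume (and, roughly, the neighbour count).
cc_value = 1.0  # hypothetical CC scale value (a diameter)
too_big = cylinder_volume(cc_value, 5.0)
correct = cylinder_volume(cc_diameter_to_radius(cc_value), 5.0)
print(too_big / correct)

# Using only ~1% of the points as core points cuts the work by roughly 100x.
print(estimated_runtime(271.0, 0.01))
```

On the numbers from the original post, the 1% core point fraction on its own would account for a gap of about two orders of magnitude, which matches the size of the observed difference.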
Here is a modified version of your notebook. It does the same thing, except that it splits py4dgeo's application of M3C2 into a few substeps: search tree construction, normal calculation, and distance calculation. All of these have always been performed, but they were evaluated lazily during `run()`.
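Why splitting the steps does not change the total work can be sketched with cached, lazily evaluated stages: timing each stage explicitly before `run()` only moves when the work happens. This is a minimal sketch, assuming each stage is cached once computed. `LazyM3C2Sketch` is hypothetical, uses placeholder computations, and is not py4dgeo's class or its actual internals.

```python
import time
from functools import cached_property


class LazyM3C2Sketch:
    """Hypothetical stand-in for an M3C2 object whose stages are computed on demand.

    Each stage is cached, so timing a stage explicitly and then calling run()
    does not repeat the work.
    """

    def __init__(self, n_corepoints):
        self.n_corepoints = n_corepoints
        self.calls = {"tree": 0, "normals": 0, "distances": 0}

    @cached_property
    def search_tree(self):
        self.calls["tree"] += 1
        return sorted(range(self.n_corepoints))  # placeholder for a search tree build

    @cached_property
    def normals(self):
        _ = self.search_tree  # normals need the search tree
        self.calls["normals"] += 1
        return [(0.0, 0.0, 1.0)] * self.n_corepoints  # placeholder normals

    @cached_property
    def distances(self):
        _ = self.normals  # distances need the normals (and thus the tree)
        self.calls["distances"] += 1
        return [0.0] * self.n_corepoints  # placeholder distances

    def run(self):
        # Accessing distances triggers any stage that has not been computed yet.
        return self.distances


def timed(label, fn):
    """Run fn, print its wall-clock time, and return its result."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.6f} s")
    return result


m = LazyM3C2Sketch(1000)
timed("Search tree", lambda: m.search_tree)
timed("Normals", lambda: m.normals)
timed("Distances", m.run)
```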
Can you run again on your end and see how performance compares?
Thank you for the very detailed update!
I see now that the config .txt file I provided does indeed contain `SubsampleEnabled=true`; this should be `false`. I updated the branch with this change. It now takes 19.5 seconds in CloudCompare.
I didn't know about the `UseSinglePass4Depth` option (in CloudCompare's M3C2 dialog, it is the "Do not use multiple pass for depth" setting in the advanced tab).
Here is a complete screenshot of the modified notebook run (dividing the radii by 2 really speeds it up):
I already feared it would not be as easy as the downsampling setting :disappointed:.
I am assuming you are running this on Windows, correct? I ran some tests on Linux and Windows on the same machine (dual boot, no virtualization) and found the results quite surprising:
| Setup | py4dgeo Normals | CC Normals | py4dgeo Distances | CC Distances |
|---|---|---|---|---|
| Windows, 6 threads | 13 s | 6 s | 127 s | 30 s |
| Windows, 1 thread | 28.4 s | -- | 275 s | -- |
| Windows, 6 threads (Blocking) | 10 s | -- | 97 s | -- |
| Linux, 6 threads | 2.7 s | -- | 34 s | -- |
| Linux, 1 thread | 14 s | -- | 201 s | -- |
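The thread-scaling behaviour in the table can be quantified as speedup (1-thread time divided by 6-thread time) and parallel efficiency (speedup divided by the thread count, where 1.0 is perfect linear scaling). A small sketch using only the py4dgeo numbers from the table; the helper names are hypothetical.

```python
THREADS = 6

# (1-thread runtime, 6-thread runtime) in seconds, from the table above
timings = {
    ("Windows", "normals"): (28.4, 13.0),
    ("Windows", "distances"): (275.0, 127.0),
    ("Linux", "normals"): (14.0, 2.7),
    ("Linux", "distances"): (201.0, 34.0),
}


def speedup(t_serial, t_parallel):
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, threads=THREADS):
    """Fraction of ideal linear scaling achieved (1.0 = perfect)."""
    return speedup(t_serial, t_parallel) / threads


for (platform, step), (t1, t6) in timings.items():
    print(f"{platform:8s} {step:10s} speedup {speedup(t1, t6):.2f}x  "
          f"efficiency {efficiency(t1, t6):.0%}")
```

On these numbers, Linux reaches a speedup of roughly 5 to 6x on 6 threads, while Windows stays at roughly 2x for both steps.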
I conclude that we have a toolchain issue on Windows that introduces a significant performance penalty. There is a multithreading-related aspect to it (with 6 threads, Linux scales roughly optimally, while Windows gains only about 2x), but sequential performance is also clearly affected. The Blocking variant in the table above lets OpenMP's dynamic scheduler work on chunks of 128 core points. My next experiments will vary the Windows toolchain to get a better understanding of where the problem might be.
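To illustrate what the Blocking variant changes, here is a Python analogue of what OpenMP's `schedule(dynamic, 128)` clause expresses: core point indices are split into chunks of 128, and whichever worker thread is idle picks up the next chunk from a shared queue. This is a sketch for intuition only. It does not reproduce py4dgeo's C++/OpenMP code, `process_corepoint` is a hypothetical placeholder for the per-core-point M3C2 work, and because of Python's GIL it will not show real parallel speedups for pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 128  # core points per scheduling unit, as in the Blocking variant


def process_corepoint(i):
    """Hypothetical placeholder for the per-core-point M3C2 distance computation."""
    return float(i % 7)


def process_chunk(start, stop):
    return [process_corepoint(i) for i in range(start, stop)]


def run_chunked(n_corepoints, n_threads=6, chunk_size=CHUNK_SIZE):
    # Each chunk is a contiguous block of core point indices.
    chunks = [(s, min(s + chunk_size, n_corepoints))
              for s in range(0, n_corepoints, chunk_size)]
    results = [0.0] * n_corepoints
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # All chunks go into the pool's queue; idle workers take the next one
        # (dynamic scheduling), instead of each thread getting a fixed share.
        futures = {pool.submit(process_chunk, s, e): (s, e) for s, e in chunks}
        for fut, (s, e) in futures.items():
            results[s:e] = fut.result()
    return results, len(chunks)


distances, n_chunks = run_chunked(1000)
```

Smaller chunks balance load better when per-point cost varies (e.g. dense vs. sparse neighbourhoods), at the price of more scheduling overhead; 128 is one point on that trade-off.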
OK, I am running on macOS.
The M3C2 distance results in py4dgeo are very different from the CloudCompare results. Points on some static objects, for example a street sign in the provided point clouds, do not have M3C2 distances close to 0. How do you choose the best configuration parameters for py4dgeo?
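One common way to sanity-check a parameter set (a general practice, not something prescribed by py4dgeo or CloudCompare) is to look at the distances on objects known to be static, like the street sign: their M3C2 distances should be centred near 0. A minimal numpy sketch, assuming you have the core point coordinates and the distance array from a run, that NaN marks core points without a valid distance, and a hand-picked axis-aligned box around the static object. `stable_area_stats` and the synthetic data are hypothetical.

```python
import numpy as np


def stable_area_stats(corepoints, distances, box_min, box_max):
    """Summarize M3C2 distances for core points inside an axis-aligned box.

    corepoints: (N, 3) array of core point coordinates
    distances:  (N,) array of M3C2 distances (assumed NaN where none was computed)
    box_min, box_max: (3,) corners of a box around an object assumed to be static
    """
    corepoints = np.asarray(corepoints, dtype=float)
    distances = np.asarray(distances, dtype=float)
    inside = np.all((corepoints >= box_min) & (corepoints <= box_max), axis=1)
    d = distances[inside]
    d = d[~np.isnan(d)]  # drop core points without a valid distance
    if d.size == 0:
        return {"count": 0, "median": np.nan, "std": np.nan}
    return {"count": int(d.size), "median": float(np.median(d)), "std": float(np.std(d))}


# Synthetic example: three points inside the unit box (one without a distance), one outside.
pts = np.array([[0.5, 0.5, 0.5], [0.2, 0.8, 0.1], [0.9, 0.1, 0.3], [5.0, 5.0, 5.0]])
dist = np.array([0.01, -0.02, np.nan, 1.5])
print(stable_area_stats(pts, dist, np.zeros(3), np.ones(3)))
```

Running the same box through both the py4dgeo and the CloudCompare results (or through several py4dgeo parameter sets) gives a like-for-like number to compare instead of judging colour maps by eye.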