colmap Use minimal solvers from poselib

Dec 03 '23 18:12 ahojnnes

On the exact same two-view geometries, P3P from poselib and P3P from colmap make no noticeable difference in practice. The runtime difference is not measurable.

$ ./src/colmap/exe/colmap model_analyzer --path ~/data/south-building/poselib/
I1216 13:33:07.592862 48175 model.cc:429] Cameras: 1
I1216 13:33:07.592973 48175 model.cc:430] Images: 128
I1216 13:33:07.592978 48175 model.cc:431] Registered images: 128
I1216 13:33:07.592980 48175 model.cc:433] Points: 61160
I1216 13:33:07.592981 48175 model.cc:434] Observations: 326980
I1216 13:33:07.592985 48175 model.cc:436] Mean track length: 5.346305
I1216 13:33:07.592995 48175 model.cc:438] Mean observations per image: 2554.531250
I1216 13:33:07.592999 48175 model.cc:441] Mean reprojection error: 0.512072px

$ ./src/colmap/exe/colmap model_analyzer --path ~/data/south-building/main/
I1216 13:33:13.860183 48237 model.cc:429] Cameras: 1
I1216 13:33:13.860323 48237 model.cc:430] Images: 128
I1216 13:33:13.860327 48237 model.cc:431] Registered images: 128
I1216 13:33:13.860330 48237 model.cc:433] Points: 61160
I1216 13:33:13.860332 48237 model.cc:434] Observations: 326980
I1216 13:33:13.860337 48237 model.cc:436] Mean track length: 5.346305
I1216 13:33:13.860347 48237 model.cc:438] Mean observations per image: 2554.531250
I1216 13:33:13.860350 48237 model.cc:441] Mean reprojection error: 0.512072px

Dec 16 '23 13:12 ahojnnes

Results on end-to-end (E2E) metrics are essentially identical when the poselib solvers are also used in two-view geometry estimation. Matching with poselib is ~10% faster due to its faster minimal solver implementations.

$ ./src/colmap/exe/colmap model_analyzer --path ~/data/south-building/poselib-with-matching/
I1216 18:21:10.982481 109487 model.cc:429] Cameras: 1
I1216 18:21:10.982582 109487 model.cc:430] Images: 128
I1216 18:21:10.982586 109487 model.cc:431] Registered images: 128
I1216 18:21:10.982589 109487 model.cc:433] Points: 82357
I1216 18:21:10.982590 109487 model.cc:434] Observations: 482856
I1216 18:21:10.982595 109487 model.cc:436] Mean track length: 5.862962
I1216 18:21:10.982606 109487 model.cc:438] Mean observations per image: 3772.312500
I1216 18:21:10.982610 109487 model.cc:441] Mean reprojection error: 0.615324px

$ ./src/colmap/exe/colmap model_analyzer --path ~/data/south-building/main-with-matching/
I1216 18:21:16.408385 109543 model.cc:429] Cameras: 1
I1216 18:21:16.408489 109543 model.cc:430] Images: 128
I1216 18:21:16.408493 109543 model.cc:431] Registered images: 128
I1216 18:21:16.408496 109543 model.cc:433] Points: 82330
I1216 18:21:16.408499 109543 model.cc:434] Observations: 482931
I1216 18:21:16.408504 109543 model.cc:436] Mean track length: 5.865796
I1216 18:21:16.408514 109543 model.cc:438] Mean observations per image: 3772.898438
I1216 18:21:16.408517 109543 model.cc:441] Mean reprojection error: 0.615088px
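To put numbers on "essentially identical", one option is to diff the `model_analyzer` statistics of two runs. Below is a minimal Python sketch. It assumes the glog line format shown in the logs above. `parse_stats` and `relative_diffs` are hypothetical helpers and are not part of COLMAP or pycolmap.

```python
import re

# Matches stat lines printed by `colmap model_analyzer`, e.g.
# "I1216 18:21:10.982590 109487 model.cc:434] Observations: 482856".
# Trailing units such as "px" are ignored.
STAT_RE = re.compile(r"\]\s*([A-Za-z ]+):\s*([0-9.]+)")


def parse_stats(log: str) -> dict:
    """Map each statistic name in a model_analyzer log to its float value."""
    stats = {}
    for line in log.splitlines():
        match = STAT_RE.search(line)
        if match:
            stats[match.group(1).strip()] = float(match.group(2))
    return stats


def relative_diffs(baseline: dict, candidate: dict) -> dict:
    """Percentage change (candidate - baseline) / baseline per shared statistic."""
    return {
        name: 100.0 * (candidate[name] - baseline[name]) / baseline[name]
        for name in baseline
        if name in candidate and baseline[name] != 0
    }
```

Feeding it the main-with-matching log as the baseline and the poselib-with-matching log as the candidate gives about +0.033% points and -0.016% observations.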

Dec 16 '23 18:12 ahojnnes

Assuming unknown focal length, the results are also virtually identical. Overall reconstruction runtime is ~5% lower.

$ ./src/colmap/exe/colmap model_analyzer --path ~/data/south-building/main-with-focal-estimation/
I1217 15:56:08.626852 171591 model.cc:429] Cameras: 1
I1217 15:56:08.626966 171591 model.cc:430] Images: 128
I1217 15:56:08.626971 171591 model.cc:431] Registered images: 128
I1217 15:56:08.626973 171591 model.cc:433] Points: 82334
I1217 15:56:08.626976 171591 model.cc:434] Observations: 482940
I1217 15:56:08.626981 171591 model.cc:436] Mean track length: 5.865621
I1217 15:56:08.626991 171591 model.cc:438] Mean observations per image: 3772.968750
I1217 15:56:08.626996 171591 model.cc:441] Mean reprojection error: 0.615133px
$ ./src/colmap/exe/colmap model_analyzer --path ~/data/south-building/poselib-with-focal-estimation
I1217 15:56:00.155539 171526 model.cc:429] Cameras: 1
I1217 15:56:00.155656 171526 model.cc:430] Images: 128
I1217 15:56:00.155660 171526 model.cc:431] Registered images: 128
I1217 15:56:00.155663 171526 model.cc:433] Points: 82330
I1217 15:56:00.155665 171526 model.cc:434] Observations: 482936
I1217 15:56:00.155670 171526 model.cc:436] Mean track length: 5.865857
I1217 15:56:00.155681 171526 model.cc:438] Mean observations per image: 3772.937500
I1217 15:56:00.155685 171526 model.cc:441] Mean reprojection error: 0.615118px

Dec 17 '23 15:12 ahojnnes

If you have a single camera, is the unknown focal length solver used at all? I assumed it would run P3P once the camera is already in the reconstruction. Also, at least compared to pycolmap, homography (H) estimation was where I saw the biggest runtime improvement (IIRC this is also run during geometric verification?). It might be worthwhile to replace this solver too.

Dec 18 '23 09:12 vlarsson

@vlarsson I hardcoded it to always use p4pf in the experiment above, but you are right that I probably did not do the same for the existing focal length estimation experiment. I will also take a look at replacing the homography solver. Thanks for the hint.

Dec 18 '23 18:12 ahojnnes

@vlarsson I double-checked, and my experimental results above are correct. In both cases, I forced focal length estimation from scratch for each camera. p4pf is still faster because we only have to run a single RANSAC instead of 20 P3P RANSACs in parallel. p4pf likely also works better than the sampling-based approach when the initial focal length guess is very far off from the true focal length.
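The runtime argument is easier to see with the structure spelled out. The sampling-based approach runs a separate calibrated-pose RANSAC for each candidate focal length and keeps the best one. p4pf instead estimates the focal length inside a single RANSAC. Below is a minimal Python sketch of the sampling control flow. It assumes a hypothetical `estimate_pose` callable that stands in for the P3P RANSAC. The quadratic spacing, sample count, and ratio bounds are illustrative assumptions, not COLMAP's actual implementation or defaults.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]
# Hypothetical calibrated absolute-pose RANSAC (e.g. P3P-based): takes
# normalized image points plus 3D points, returns (pose, num_inliers).
PoseEstimator = Callable[[List[Point2D], Sequence[Point3D]], Tuple[Any, int]]


def focal_length_candidates(initial_focal: float, num_samples: int,
                            min_ratio: float, max_ratio: float) -> List[float]:
    """num_samples + 1 focal lengths spanning [min_ratio, max_ratio] * initial_focal.

    Quadratic spacing (an illustrative choice) puts more samples at the
    short-focal end of the range.
    """
    scale = max_ratio - min_ratio
    return [initial_focal * (min_ratio + scale * (i / num_samples) ** 2)
            for i in range(num_samples + 1)]


def estimate_pose_by_focal_sampling(
        points2D: Sequence[Point2D], points3D: Sequence[Point3D],
        principal_point: Point2D, initial_focal: float,
        estimate_pose: PoseEstimator, num_samples: int = 20,
        min_ratio: float = 0.2, max_ratio: float = 5.0):
    cx, cy = principal_point

    def run(focal: float):
        # Normalize pixels with the hypothesized focal length, then solve a
        # calibrated pose problem: one full RANSAC per focal candidate.
        normalized = [((x - cx) / focal, (y - cy) / focal) for x, y in points2D]
        pose, num_inliers = estimate_pose(normalized, points3D)
        return num_inliers, focal, pose

    candidates = focal_length_candidates(initial_focal, num_samples,
                                         min_ratio, max_ratio)
    # Candidates are independent, so they can be evaluated concurrently.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run, candidates))
    # Keep the focal length whose RANSAC found the most inliers.
    num_inliers, focal, pose = max(results, key=lambda r: r[0])
    return focal, pose, num_inliers
```

The cost scales with the number of candidates, and the result can never leave `[min_ratio, max_ratio] * initial_focal`. A P4Pf-style solver removes the outer loop, but it also loses that built-in bound.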

Dec 18 '23 20:12 ahojnnes

@vlarsson I noticed that your homography solver doesn't normalize the image coordinates. This may not behave well for high-resolution images, because pixel coordinates get squared in the design matrix (e.g., for images with a size of ~10'000 pixels, the matrix entries reach ~100M). Did you not observe any issues in this case?

EDIT: I ran some quick-and-dirty experiments, and it seems that normalizing the points is not necessary. I assume the normalization accounts for a significant fraction of the overhead of the colmap homography solver...
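A toy 4-point homography solver makes the concern above concrete. Below is a minimal Python sketch with these assumptions:

- It uses the common inhomogeneous DLT formulation with h33 fixed to 1, which fails if the true H has h33 near 0.
- It uses Hartley-style normalization (centroid to origin, mean distance sqrt(2)).
- It is not poselib's or COLMAP's implementation, and all function names are hypothetical.

```python
import math


def apply_h(H, x, y):
    """Map a point through a 3x3 homography (lists of lists)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def matmul3(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def design_matrix(src, dst):
    """8x8 DLT system A h = b with h33 fixed to 1 (4 correspondences).

    The -u*x, -u*y, -v*x, -v*y entries are products of pixel coordinates,
    so they reach ~1e8 for ~10'000-pixel images.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    return A, b


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def normalizing_transform(points):
    """Hartley-style normalization: centroid to origin, mean distance sqrt(2)."""
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    mean_dist = sum(math.hypot(x - mx, y - my) for x, y in points) / len(points)
    s = math.sqrt(2.0) / mean_dist
    return [[s, 0.0, -s * mx], [0.0, s, -s * my], [0.0, 0.0, 1.0]]


def homography_4pt(src, dst, normalize=True):
    """Estimate H with dst ~ H * src from exactly 4 correspondences."""
    if not normalize:
        h = solve_linear(*design_matrix(src, dst))
        return [h[0:3], h[3:6], h[6:8] + [1.0]]
    T1, T2 = normalizing_transform(src), normalizing_transform(dst)
    src_n = [apply_h(T1, x, y) for x, y in src]
    dst_n = [apply_h(T2, x, y) for x, y in dst]
    h = solve_linear(*design_matrix(src_n, dst_n))
    Hn = [h[0:3], h[3:6], h[6:8] + [1.0]]
    # Undo the normalization: H = T2^-1 * Hn * T1.
    s2 = T2[0][0]
    T2_inv = [[1.0 / s2, 0.0, -T2[0][2] / s2],
              [0.0, 1.0 / s2, -T2[1][2] / s2],
              [0.0, 0.0, 1.0]]
    H = matmul3(T2_inv, matmul3(Hn, T1))
    return [[val / H[2][2] for val in row] for row in H]
```

Take an exact (noise-free) example with the corners of a 10'000 x 8'000 pixel image. The raw design matrix then has entries around 1e8, while the normalized one stays below 10. Both variants still reproduce the four correspondences to within 1e-6 px, which is consistent with the EDIT above. This toy check says nothing about noisy data or DLT with many points.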

Dec 18 '23 20:12 ahojnnes

I have not seen any issues with this either, but I have also not done extensive experiments on it :) My gut feeling is that normalisation does not matter so much for the minimal 4p problem, but is mostly important if you want to run DLT with more points.

Dec 19 '23 08:12 vlarsson

@tsattler Do you have any datasets that could be suitable for evaluating the changes in this branch? I ran the changes against some of the standard datasets from UNC and it all looks good. Do you have something more suitable, like your Aachen dataset, to test the new p4pf solver in particular? Cheers.

Jan 07 '24 15:01 ahojnnes

@ahojnnes Sure, I can do this. Can look into this in a week or so once I finished CVPR reviews and other things. If I don't reply in some time, can you please ping me?

Jan 07 '24 15:01 tsattler

Great, thank you. No urgency. I will try my best to remember :-)

Jan 07 '24 15:01 ahojnnes

Actually, playing with focal length solvers has been on my TODO list for a while. Is it only the P4Pf solver or also something that estimates the radial distortion?

Jan 07 '24 16:01 tsattler

Only focal length. I just swapped out the existing sampling+p3p with poselib's p4pf solver. As a next step, we could also try to estimate radial distortion but I wanted to keep this for a future pull request.

Jan 07 '24 16:01 ahojnnes

@tsattler I am also interested in this. Also for the two-view case. Let's talk more :)

Jan 08 '24 08:01 vlarsson

@vlarsson Sounds good. We can try to set up a benchmark for these methods together.

Jan 08 '24 10:01 tsattler

@tsattler Gentle reminder for the above 🙂😉 T - 24 hours.

Jan 23 '24 07:01 ahojnnes

@ahojnnes Thanks for the reminder. It is on my TODO list and climbing higher in terms of priorities :) What are the most important parts to cover in terms of experiments? Which solvers? Which types of experiments?

Jan 23 '24 13:01 tsattler

I am quite confident about all solvers except for the new p4pf estimator. The Aachen dataset could be useful?

Jan 23 '24 21:01 ahojnnes

@vlarsson @tsattler I have now tested the performance of this branch, in particular the p4pf solver as a replacement for the sampling-based approach. It appears that p4pf performs worse than sampling on a number of internet datasets with unknown focal lengths. Given my current doubts about the quality of p4pf, I will probably go ahead with this PR but revert the p4pf changes.

Mar 01 '24 16:03 ahojnnes

For example, on the Cornell Arts Quad dataset, I get the following with p4pf:

[image: p4pf reconstruction]

versus the following for colmap's sampling-based approach:

[image: sampling-based reconstruction]

Neither of them is perfect, but the p4pf one is visibly worse.

Mar 01 '24 17:03 ahojnnes

@ahojnnes I would like to kindly ask once more that you consider not using CMake's FetchContent to satisfy the PoseLib dependency. Instead, PoseLib could be integrated either as a git submodule or as an externally provided library discovered via find_package.

Mar 28 '24 23:03 S-o-T

@vlarsson I recall that you were interested in investigating the issue about p4pf. Did you have time to do that?

May 28 '24 18:05 ahojnnes

@S-o-T Consuming poselib as a submodule is not easily possible without a significant refactor of poselib itself, due to its build rules and include folder structure. Consuming poselib as a package through find_package appears to have no benefit over the FetchContent approach in terms of being able to easily modify poselib sources while compiling colmap? Could you please clarify your proposal?

May 28 '24 18:05 ahojnnes

Consuming poselib as a submodule is not easily possible without a significant refactor of poselib itself due to the build rule and include folder structure

I am wondering what exactly is not working with poselib as a submodule; I am interested in fixing it. I remember making sure it was working some time ago: https://github.com/PoseLib/PoseLib/pull/17

Is this perhaps because pybind is now a submodule of poselib?

May 29 '24 19:05 pablospe

@ahojnnes thanks for considering this issue.

Consuming poselib as a package through find_package appears to have no benefit over the FetchContent approach in terms of being able to easily modify poselib sources while compiling colmap?

My workflow with colmap relies on an IDE that creates/uses a separate build directory for each set of CMake options. More than one set of CMake options is usually in active use (e.g., Release/Debug, options that enable or disable compiling specific parts of colmap, etc.). Each of these directories therefore contains its own private copy of poselib, downloaded by CMake via FetchContent during the configure phase. As a result, switching between such colmap build configurations while working on modifications to the poselib sources becomes practically impossible (barring some mechanism to automagically sync the changes). If poselib were instead consumed as an internal third-party dependency (via a submodule) or as an external dependency, I could work on changes to poselib independently of any particular set of colmap CMake options.

Anyway, I believe this could be solved by a relatively small patch that I can maintain locally. I think it should only be addressed if other colmap users face the same issue.

Jun 04 '24 17:06 S-o-T

@vlarsson I recall that you were interested in investigating the issue about p4pf. Did you have time to do that?

I should finally have some time to look into this in the second week of July. I also have a backlog of TODOs in PoseLib that I want to get into during the summer :)

Jun 23 '24 08:06 vlarsson

For example, on the Cornell Arts Quad dataset, I get the following with p4pf: versus the following for colmap's sampling-based approach:

Neither of them is perfect, but the p4pf one is visibly worse.

@ahojnnes Do you still have the Arts Quad database? If so, could you share it? It would save me some compute.

Jul 15 '24 13:07 vlarsson

I've done some initial testing. I ran the reconstruction of ArtsQuad (with the main branch version) and dumped all of the correspondences from the registrations where estimate_focal_length was set. I put together a quick (LO-)RANSAC in poselib for unknown focal length, and it seems to work okay (at least metrics-wise).

So the first two rows are the standard sampling-based estimator, before and after the RefineAbsolutePose call. Focal length errors are relative (%) to the "ground truth" focal length provided in the dataset (maybe it comes from Bundler?).

I've tried a few different things:

- Comparing different solvers.
- Vanilla RANSAC vs. LO-RANSAC.
- Minimizing the reprojection error on the minimal sample (MinRefine). Since the problem is overdetermined, the solution will not satisfy the 4 points exactly.
- Filtering minimal samples based on the reprojection error on the 4 points (MinFilter).
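The MinFilter variant amounts to a cheap consistency check on each minimal-solver hypothesis before it is scored on all correspondences. Below is a minimal Python sketch. It assumes pixel coordinates already centered on the principal point, and a hypothesis given as rotation R, translation t, and focal length f. The function names and the 2 px threshold are hypothetical and are not taken from poselib.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def project(R: Sequence[Sequence[float]], t: Vec3, f: float,
            X: Vec3) -> Optional[Tuple[float, float]]:
    """Pinhole projection with principal point at the origin."""
    Xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0.0:
        return None  # point lies behind the camera
    return (f * Xc[0] / Xc[2], f * Xc[1] / Xc[2])


def passes_min_filter(R, t, f, sample2D, sample3D, max_error_px=2.0) -> bool:
    """Accept a minimal-solver hypothesis only if it explains its own sample.

    Cheap to run before scoring the hypothesis on all correspondences.
    """
    if f <= 0.0:
        return False
    for (x, y), X in zip(sample2D, sample3D):
        proj = project(R, t, f, X)
        if proj is None or math.hypot(proj[0] - x, proj[1] - y) > max_error_px:
            return False
    return True
```

The P4Pf problem is overdetermined (8 constraints for 7 unknowns). A hypothesis from a noisy sample may therefore not reproject all 4 points exactly, which is what gives this filter something to reject.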

So far the impact has been quite minor. I am not sure yet why your reconstruction was failing. The P4P_RSC above should be fairly close to what you had. I will try to integrate this and re-run the reconstruction with it.

Jul 19 '24 08:07 vlarsson

Thanks very much @vlarsson for the experiments. They suggest that your solvers should be superior to the sampling-based approach. I am not sure where the input focal lengths come from; they might be extracted from EXIF information as well. I am looking forward to seeing how this works in the reconstruction. I wonder whether the P4Pf/P5Pf approach sometimes fails catastrophically and thus produces a few bad cameras that cause the poor overall results. Such behavior would not necessarily be captured by the stats you shared above? The sampling-based approach is fundamentally limited to a fixed range of focal lengths and cannot blow up completely?

Jul 21 '24 16:07 ahojnnes

The reconstruction with the poselib estimator looked okay; however, the vanilla colmap reconstruction I got had larger artifacts than the result you posted above. I wonder if this scene is a bit unstable, in which case we should not draw too many conclusions from it. I will try to set up more controlled experiments to figure out what actually works well.

Another thing I noted was that running the vanilla colmap reconstruction with SIMPLE_PINHOLE instead of SIMPLE_RADIAL completely broke the reconstruction. Once I get a more proper benchmark up and running I will look into adding distortion estimation as well.
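For reference, the two camera models differ only by a single radial distortion coefficient. Below is a minimal Python sketch of both projections. It is written from our understanding of COLMAP's SIMPLE_PINHOLE (f, cx, cy) and SIMPLE_RADIAL (f, cx, cy, k) parameterizations, but it is not COLMAP's code, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def project_simple_pinhole(params: Sequence[float],
                           X: Sequence[float]) -> Tuple[float, float]:
    """SIMPLE_PINHOLE-style projection with params (f, cx, cy)."""
    f, cx, cy = params
    u, v = X[0] / X[2], X[1] / X[2]
    return (f * u + cx, f * v + cy)


def project_simple_radial(params: Sequence[float],
                          X: Sequence[float]) -> Tuple[float, float]:
    """SIMPLE_RADIAL-style projection with params (f, cx, cy, k)."""
    f, cx, cy, k = params
    u, v = X[0] / X[2], X[1] / X[2]
    # One radial coefficient: the displacement grows with r^2 = u^2 + v^2.
    radial = k * (u * u + v * v)
    return (f * u * (1.0 + radial) + cx, f * v * (1.0 + radial) + cy)
```

Take f = 1000 px and k = -0.1. A point at normalized radius 1 (toward the border of a wide-angle image) then shifts by 100 px relative to the pinhole projection. SIMPLE_PINHOLE has no parameter to absorb such a shift, which may help explain why it struggled here.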

Thanks very much @vlarsson for the experiments. They suggest that your solvers should be superior to the sampling-based approach. I am not sure where the input focal lengths come from; they might be extracted from EXIF information as well. I am looking forward to seeing how this works in the reconstruction. I wonder whether the P4Pf/P5Pf approach sometimes fails catastrophically and thus produces a few bad cameras that cause the poor overall results. Such behavior would not necessarily be captured by the stats you shared above? The sampling-based approach is fundamentally limited to a fixed range of focal lengths and cannot blow up completely?

This seems reasonable to me. The sampling is essentially a form of regularization on the focal lengths. I think doing some basic filtering on focal lengths in the minimal solver could have a similar effect. I will look into this as well.
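The "basic filtering on focal lengths" idea could look roughly like the following minimal Python sketch. It assumes that each minimal-solver solution is a `(focal_px, pose)` pair. It also assumes that plausible focal lengths lie within a fixed ratio of the larger image dimension. The function name and the 0.1 to 10 ratio bounds are illustrative assumptions, not values from COLMAP or PoseLib.

```python
from typing import Any, List, Tuple


def filter_focal_solutions(solutions: List[Tuple[float, Any]],
                           image_width: int, image_height: int,
                           min_ratio: float = 0.1,
                           max_ratio: float = 10.0) -> List[Tuple[float, Any]]:
    """Keep (focal, pose) solutions whose focal length is plausible.

    The bound is relative to the larger image dimension, similar in spirit
    to the fixed focal range the sampling approach searches.
    """
    max_dim = max(image_width, image_height)
    lo, hi = min_ratio * max_dim, max_ratio * max_dim
    return [(focal, pose) for focal, pose in solutions if lo <= focal <= hi]
```

If applied inside RANSAC before scoring, this would give the single-RANSAC P4Pf path a bounded focal range comparable to the one the sampling approach has by construction.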

Jul 22 '24 08:07 vlarsson