Use minimal solvers from poselib

Open ahojnnes opened this issue 8 months ago • 26 comments

ahojnnes avatar Dec 03 '23 18:12 ahojnnes

On the exact same two-view geometries, using P3P from poselib instead of P3P from colmap doesn't seem to make any difference in practice. The runtime difference is not measurable.

$ ./src/colmap/exe/colmap model_analyzer --path ~/data/south-building/poselib/
I1216 13:33:07.592862 48175 model.cc:429] Cameras: 1
I1216 13:33:07.592973 48175 model.cc:430] Images: 128
I1216 13:33:07.592978 48175 model.cc:431] Registered images: 128
I1216 13:33:07.592980 48175 model.cc:433] Points: 61160
I1216 13:33:07.592981 48175 model.cc:434] Observations: 326980
I1216 13:33:07.592985 48175 model.cc:436] Mean track length: 5.346305
I1216 13:33:07.592995 48175 model.cc:438] Mean observations per image: 2554.531250
I1216 13:33:07.592999 48175 model.cc:441] Mean reprojection error: 0.512072px

$ ./src/colmap/exe/colmap model_analyzer --path ~/data/south-building/main/
I1216 13:33:13.860183 48237 model.cc:429] Cameras: 1
I1216 13:33:13.860323 48237 model.cc:430] Images: 128
I1216 13:33:13.860327 48237 model.cc:431] Registered images: 128
I1216 13:33:13.860330 48237 model.cc:433] Points: 61160
I1216 13:33:13.860332 48237 model.cc:434] Observations: 326980
I1216 13:33:13.860337 48237 model.cc:436] Mean track length: 5.346305
I1216 13:33:13.860347 48237 model.cc:438] Mean observations per image: 2554.531250
I1216 13:33:13.860350 48237 model.cc:441] Mean reprojection error: 0.512072px

ahojnnes avatar Dec 16 '23 13:12 ahojnnes

Essentially identical results on E2E metrics when also using the poselib solvers in two-view geometry estimation. Matching with poselib is ~10% faster due to the faster minimal solver implementation.

$ ./src/colmap/exe/colmap model_analyzer --path ~/data/south-building/poselib-with-matching/
I1216 18:21:10.982481 109487 model.cc:429] Cameras: 1
I1216 18:21:10.982582 109487 model.cc:430] Images: 128
I1216 18:21:10.982586 109487 model.cc:431] Registered images: 128
I1216 18:21:10.982589 109487 model.cc:433] Points: 82357
I1216 18:21:10.982590 109487 model.cc:434] Observations: 482856
I1216 18:21:10.982595 109487 model.cc:436] Mean track length: 5.862962
I1216 18:21:10.982606 109487 model.cc:438] Mean observations per image: 3772.312500
I1216 18:21:10.982610 109487 model.cc:441] Mean reprojection error: 0.615324px

$ ./src/colmap/exe/colmap model_analyzer --path ~/data/south-building/main-with-matching/
I1216 18:21:16.408385 109543 model.cc:429] Cameras: 1
I1216 18:21:16.408489 109543 model.cc:430] Images: 128
I1216 18:21:16.408493 109543 model.cc:431] Registered images: 128
I1216 18:21:16.408496 109543 model.cc:433] Points: 82330
I1216 18:21:16.408499 109543 model.cc:434] Observations: 482931
I1216 18:21:16.408504 109543 model.cc:436] Mean track length: 5.865796
I1216 18:21:16.408514 109543 model.cc:438] Mean observations per image: 3772.898438
I1216 18:21:16.408517 109543 model.cc:441] Mean reprojection error: 0.615088px
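
To put a number on "essentially identical", the two `model_analyzer` dumps above can be compared metric by metric. The following is only a rough sketch: `parse_model_analyzer` and `relative_diff` are hypothetical helper names (not part of colmap), and the log excerpts are copied from the two runs above.

```python
import re

def parse_model_analyzer(log_text):
    """Extract 'name: value' pairs from glog-style model_analyzer output."""
    stats = {}
    for line in log_text.splitlines():
        # Match the part after the "file.cc:NNN]" prefix, e.g. "Points: 82357".
        match = re.search(r"\]\s*([^:\]]+):\s*([-+]?\d+(?:\.\d+)?)", line)
        if match:
            stats[match.group(1).strip()] = float(match.group(2))
    return stats

def relative_diff(a, b):
    """Symmetric relative difference; 0.0 when both values are zero."""
    denom = max(abs(a), abs(b))
    return abs(a - b) / denom if denom else 0.0

# Excerpts copied from the poselib-with-matching and main-with-matching runs.
POSELIB_LOG = """\
I1216 18:21:10.982589 109487 model.cc:433] Points: 82357
I1216 18:21:10.982590 109487 model.cc:434] Observations: 482856
I1216 18:21:10.982595 109487 model.cc:436] Mean track length: 5.862962
I1216 18:21:10.982610 109487 model.cc:441] Mean reprojection error: 0.615324px
"""
MAIN_LOG = """\
I1216 18:21:16.408496 109543 model.cc:433] Points: 82330
I1216 18:21:16.408499 109543 model.cc:434] Observations: 482931
I1216 18:21:16.408504 109543 model.cc:436] Mean track length: 5.865796
I1216 18:21:16.408517 109543 model.cc:441] Mean reprojection error: 0.615088px
"""

poselib_stats = parse_model_analyzer(POSELIB_LOG)
main_stats = parse_model_analyzer(MAIN_LOG)
diffs = {name: relative_diff(poselib_stats[name], main_stats[name])
         for name in poselib_stats}

for name, diff in diffs.items():
    print(f"{name}: poselib={poselib_stats[name]:g} main={main_stats[name]:g} "
          f"rel. diff={diff:.4%}")
```

The same helper could be pointed at the full output of both runs; only four metrics are included here to keep the example short.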

ahojnnes avatar Dec 16 '23 18:12 ahojnnes

Assuming unknown focal length, the results are also virtually identical. Overall reconstruction runtime is ~5% lower with poselib.

$ ./src/colmap/exe/colmap model_analyzer --path ~/data/south-building/main-with-focal-estimation/
I1217 15:56:08.626852 171591 model.cc:429] Cameras: 1
I1217 15:56:08.626966 171591 model.cc:430] Images: 128
I1217 15:56:08.626971 171591 model.cc:431] Registered images: 128
I1217 15:56:08.626973 171591 model.cc:433] Points: 82334
I1217 15:56:08.626976 171591 model.cc:434] Observations: 482940
I1217 15:56:08.626981 171591 model.cc:436] Mean track length: 5.865621
I1217 15:56:08.626991 171591 model.cc:438] Mean observations per image: 3772.968750
I1217 15:56:08.626996 171591 model.cc:441] Mean reprojection error: 0.615133px

$ ./src/colmap/exe/colmap model_analyzer --path ~/data/south-building/poselib-with-focal-estimation
I1217 15:56:00.155539 171526 model.cc:429] Cameras: 1
I1217 15:56:00.155656 171526 model.cc:430] Images: 128
I1217 15:56:00.155660 171526 model.cc:431] Registered images: 128
I1217 15:56:00.155663 171526 model.cc:433] Points: 82330
I1217 15:56:00.155665 171526 model.cc:434] Observations: 482936
I1217 15:56:00.155670 171526 model.cc:436] Mean track length: 5.865857
I1217 15:56:00.155681 171526 model.cc:438] Mean observations per image: 3772.937500
I1217 15:56:00.155685 171526 model.cc:441] Mean reprojection error: 0.615118px

ahojnnes avatar Dec 17 '23 15:12 ahojnnes

If you have a single camera, is the unknown focal length solver used? I assumed it would be running P3P once the camera is in the reconstruction? Also comparing to pycolmap at least, H matrix estimation was the problem where I saw the biggest improvement in terms of runtime. (iirc this is run for the geometric verification as well?) Might be worthwhile to replace this solver too.

vlarsson avatar Dec 18 '23 09:12 vlarsson

@vlarsson I hardcoded the experiment above to always use p4pf, but you are right that I probably did not do the same for the existing focal length estimation experiment. I will also take a look at replacing the homography solver. Thanks for the hint.

ahojnnes avatar Dec 18 '23 18:12 ahojnnes

@vlarsson I double-checked and my experimental results above are correct. In both cases, I forced focal length estimation for each camera from scratch. p4pf is still faster, because we don't have to run 20 P3Ps in parallel but only a single RANSAC. p4pf likely also works better than the sampling-based approach when the initial guess of the focal length is very far off from the true focal length.
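
As a back-of-the-envelope illustration of the "20 P3Ps vs. a single RANSAC" argument, here is a small sketch using the textbook RANSAC trial-count estimate. The 50% inlier ratio and 99% confidence are assumptions picked for illustration, the 20 comes from the comment above, and neither colmap's nor poselib's actual stopping criteria are modeled. Real runtime also depends on per-solver cost, scoring, and local optimization.

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Textbook number of trials needed to draw at least one all-inlier
    sample with the given confidence."""
    if not 0.0 < inlier_ratio < 1.0:
        raise ValueError("inlier_ratio must be in (0, 1)")
    p_good_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))

NUM_FOCAL_SAMPLES = 20  # number of sampled focal lengths, from the comment above
INLIER_RATIO = 0.5      # assumption for illustration only

# P3P draws 3 correspondences per sample, P4Pf draws 4.
p3p_trials = ransac_iterations(INLIER_RATIO, sample_size=3)
p4pf_trials = ransac_iterations(INLIER_RATIO, sample_size=4)

print(f"sampling + P3P: {NUM_FOCAL_SAMPLES} x {p3p_trials} trials")
print(f"single P4Pf RANSAC: {p4pf_trials} trials")
```

A single P4Pf RANSAC needs more trials than a single P3P RANSAC because of the larger sample, but under these assumptions it still needs far fewer than 20 separate P3P runs combined.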

ahojnnes avatar Dec 18 '23 20:12 ahojnnes

@vlarsson I noticed that your homography solver doesn't normalize the image coordinates, which may not behave well for high-resolution images due to the products of pixel coordinates in the design matrix (e.g., for images with a size of ~10'000 pixels, the matrix will have values of ~100M). Did you not observe any issues in this case?

EDIT: I did some quick and dirty experiments and it seems like the normalization of points is not necessary. I assume it accounts for a significant fraction of the overhead of the colmap homography solver...
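
To make the ~100M figure concrete, here is a minimal sketch that builds a textbook DLT design matrix for four correspondences with and without Hartley-style normalization and compares the magnitude of the entries. The point coordinates are made up, and the matrix layout is the generic textbook one, not necessarily what either poselib or colmap uses internally. It only shows the dynamic range of the entries and says nothing about whether the 4-point solve actually degrades.

```python
import math

def hartley_normalize(points):
    """Shift points to zero mean and scale them so that the mean distance
    to the origin is sqrt(2)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    scale = math.sqrt(2.0) / mean_dist
    return [((x - cx) * scale, (y - cy) * scale) for x, y in points]

def dlt_design_matrix(points1, points2):
    """Two rows per correspondence for a homography H with x2 ~ H * x1."""
    rows = []
    for (x, y), (u, v) in zip(points1, points2):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    return rows

def entry_range(matrix):
    """Smallest and largest absolute value among the non-zero entries."""
    values = [abs(v) for row in matrix for v in row if v != 0.0]
    return min(values), max(values)

# Made-up correspondences near the corners of a ~10'000 px image.
pts1 = [(100.0, 200.0), (9900.0, 150.0), (9800.0, 9900.0), (50.0, 9700.0)]
pts2 = [(120.0, 180.0), (9950.0, 300.0), (9700.0, 9800.0), (80.0, 9900.0)]

raw_min, raw_max = entry_range(dlt_design_matrix(pts1, pts2))
norm_min, norm_max = entry_range(
    dlt_design_matrix(hartley_normalize(pts1), hartley_normalize(pts2)))

print(f"raw pixels: entries span {raw_min:.3g} .. {raw_max:.3g}")
print(f"normalized: entries span {norm_min:.3g} .. {norm_max:.3g}")
```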

ahojnnes avatar Dec 18 '23 20:12 ahojnnes

I have not seen any issues with this either, but I have also not done extensive experiments on it :) My gut feeling is that normalisation does not matter so much for the minimal 4p problem, but is mostly important if you want to run DLT with more points.

vlarsson avatar Dec 19 '23 08:12 vlarsson

@tsattler Do you have any datasets that could be suitable to evaluate the changes in this branch? I ran the changes against some of the standard datasets from UNC and it all looks good. Do you have anything more suitable like your Aachen dataset to check in particular the new p4pf solver against? Cheers.

ahojnnes avatar Jan 07 '24 15:01 ahojnnes

@ahojnnes Sure, I can do this. I can look into it in a week or so, once I have finished CVPR reviews and other things. If I don't reply in some time, can you please ping me?

tsattler avatar Jan 07 '24 15:01 tsattler

Great, thank you. No urgency. I will try my best to remember :-)

ahojnnes avatar Jan 07 '24 15:01 ahojnnes

Actually, playing with focal length solvers has been on my TODO list for a while. Is it only the P4Pf solver or also something that estimates the radial distortion?

tsattler avatar Jan 07 '24 16:01 tsattler

Only focal length. I just swapped out the existing sampling+p3p for poselib's p4pf solver. As a next step, we could also try to estimate radial distortion, but I wanted to keep this for a future pull request.

ahojnnes avatar Jan 07 '24 16:01 ahojnnes

@tsattler I am also interested in this. Also for the two-view case. Let's talk more :)

vlarsson avatar Jan 08 '24 08:01 vlarsson

@vlarsson Sounds good. We can try to set up a benchmark for these methods together.

tsattler avatar Jan 08 '24 10:01 tsattler

@tsattler Gentle reminder for the above 🙂😉 T - 24 hours.

ahojnnes avatar Jan 23 '24 07:01 ahojnnes

@ahojnnes Thanks for the reminder. It is on my TODO list and climbing higher in terms of priorities :) What are the most important parts to cover in terms of experiments? Which solvers? Which types of experiments?

tsattler avatar Jan 23 '24 13:01 tsattler

I am quite confident about all solvers except for the new p4pf estimator. The Aachen dataset could be useful?

ahojnnes avatar Jan 23 '24 21:01 ahojnnes

@vlarsson @tsattler I have now tested the performance of this branch, in particular the p4pf solver as a replacement for the sampling-based approach, and it appears that p4pf performs worse than sampling on a number of internet datasets with unknown focal length. Given my current doubts about the quality of p4pf, I am probably going to go ahead with this PR but undo the p4pf changes.

ahojnnes avatar Mar 01 '24 16:03 ahojnnes

For example, on the Cornell Arts Quad dataset, I get the following with p4pf: [image] versus the following for colmap's sampling-based approach: [image]

Neither of them is perfect, but the p4pf one is visibly worse.

ahojnnes avatar Mar 01 '24 17:03 ahojnnes

@ahojnnes I would like to kindly ask once more that you consider not using CMake's FetchContent as a means to satisfy the PoseLib dependency, and instead integrate PoseLib either as a git submodule or as an externally provided library discovered via find_package.

S-o-T avatar Mar 28 '24 23:03 S-o-T