gtsfm: Performance of different front-ends

Open sarlinpe opened this issue 1 year ago • 1 comments

Hi folks,

I've read your GTSfM paper. Nice work, and thanks for pushing it to arXiv. I enjoyed reading it and appreciate the huge effort that went into building the system. I am very surprised by the conclusion that SuperPoint+SuperGlue/LightGlue is not as good as SIFT: we have always observed the exact opposite with incremental SfM (COLMAP) on both easy and difficult datasets (ETH3D, IMC 2020/1/2/3). I went through the code but didn't find anything obvious.

The point clouds of SP+SG/LG look pretty sparse on several datasets, and so do the matches in Fig. 3.

> the shorter image side is resized to at most 760 pixels in length

That gives a 1351×760 px image for a 1920×1080 input, which seems fine.
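The arithmetic behind that number can be checked with a small sketch. The function name `resize_to_max_short_side` is hypothetical and rounding to the nearest pixel is an assumption; the actual GTSfM resizing code may round differently.

```python
def resize_to_max_short_side(width, height, max_short_side=760):
    """Downscale (width, height) so that the shorter side is at most max_short_side.

    Hypothetical helper for illustration only; rounding mode is an assumption.
    """
    short = min(width, height)
    if short <= max_short_side:
        # Already within the limit: this sketch never upsamples.
        return width, height
    scale = max_short_side / short
    return round(width * scale), round(height * scale)


print(resize_to_max_short_side(1920, 1080))  # (1351, 760)
```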

> A maximum of 5000 keypoints are used for each of the following front-ends

Do you know how many points are effectively extracted by SuperPoint per image? How often is the limit of 5k hit compared to SIFT?

https://github.com/borglab/gtsfm/blob/1b55b76a7421bea912eeb7d9ea05a686883c1373/gtsfm/frontend/detector_descriptor/superpoint.py#L45-L46

Do I understand correctly that you use the default settings? Did you try to tweak them? As is, it cannot return 5k keypoints on these kinds of images, unlike SIFT. I recommend the following:

- decrease the detection threshold: keypoint_threshold=0.001
- decrease the NMS radius: nms_radius=3
- if images are smaller than the limit (760px), upsample them
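As a rough sketch, the three suggestions could be expressed as below. The dict is only an illustration: `keypoint_threshold` and `nms_radius` are the parameter names used above, while `max_keypoints`, the dict layout, and the helper `upsample_size` are assumptions and not taken from the GTSfM code.

```python
# Hypothetical settings, not the GTSfM defaults.
superpoint_config = {
    "keypoint_threshold": 0.001,  # lower detection threshold, as suggested above
    "nms_radius": 3,              # smaller NMS radius, as suggested above
    "max_keypoints": 5000,        # assumed key name for the 5k keypoint cap
}


def upsample_size(width, height, target_short_side=760):
    """Return the size after upsampling so the shorter side reaches target_short_side.

    Images that are already large enough are returned unchanged.
    Rounding to the nearest pixel is an assumption.
    """
    short = min(width, height)
    if short >= target_short_side:
        return width, height
    scale = target_short_side / short
    return round(width * scale), round(height * scale)


print(upsample_size(640, 480))  # (1013, 760)
```

The resulting size would then be passed to whatever image-resizing routine the pipeline already uses before running the detector.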

This should make SuperPoint competitive with SIFT in terms of keypoint detection.

We do know that these deep matchers are more easily tricked by symmetries, as you point out in fig 3. This seems confirmed by table 3: compared to SIFT, the mean of the front-end errors is much higher than their median and they have many more VG outliers, especially on South Building and Crane.

- Did you try tuning the filtering thresholds (minimum number of inliers, cycle consistency) for each front-end? 15 inliers and 7° seem pretty loose for front-ends that have a high recall.
- Did you try running the averaging+BA on edges that are inliers according to the GT poses?
- It seems that the motion averaging does not have any robustness built in. Zhang et al. (ICCV 2023) show that using a robust cost function is critical (Table 5) and that weighting by inlier count or two-view covariance can often help. Did you try this? This paper actually shows that SuperPoint+SuperGlue can work perfectly fine for global SfM.
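To illustrate the last point, here is a toy sketch of a robust cost combined with inlier-count weighting. It averages scalar measurements with iteratively reweighted least squares and a Huber loss. This is not the method of Zhang et al. nor GTSfM's averaging, which operate on rotations and translations; the function names and the numbers in the usage example are made up for illustration.

```python
def huber_weight(residual, delta):
    """IRLS weight for the Huber loss: 1 inside the threshold, delta/|r| outside."""
    a = abs(residual)
    return 1.0 if a <= delta else delta / a


def robust_weighted_mean(values, inlier_counts, delta=1.0, iters=20):
    """Robust 1D average: each measurement is weighted by its inlier count
    times its Huber weight, so gross outliers are downweighted."""
    estimate = sorted(values)[len(values) // 2]  # initialise near the median
    for _ in range(iters):
        weights = [
            n * huber_weight(v - estimate, delta)
            for v, n in zip(values, inlier_counts)
        ]
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate


# Made-up measurements: three consistent edges and one gross outlier
# that is supported by few inliers.
values = [1.0, 1.1, 0.9, 10.0]
inlier_counts = [100, 120, 90, 20]

plain_mean = sum(values) / len(values)
robust = robust_weighted_mean(values, inlier_counts)
print(plain_mean, robust)
```

In this toy case the plain mean is pulled far toward the outlier, while the robust, inlier-weighted estimate stays close to the consistent measurements.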

Thanks! cc @Phil26AT @ducha-aiki

Dec 04 '23 09:12 sarlinpe
