Performance of different front-ends
Hi folks,
I've read your GTSfM paper. Nice work, and thanks for pushing it to arXiv. I enjoyed reading it and appreciate the huge effort that went into building it. I am very surprised by the conclusion that SuperPoint+Super/LightGlue is not as good as SIFT: we have always observed the exact opposite with incremental SfM (COLMAP) on both easy and difficult datasets (ETH3D, IMC 2020/1/2/3). I went through the code but didn't find anything obvious.
- The point clouds of SP+SG/LG look pretty sparse on several datasets, and so do the matches in fig 3.
> the shorter image side is resized to at most 760 pixels in length
So that'd give a 1351×760 px image for a 1920×1080 input - this seems fine.
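For reference, the arithmetic behind that number, as a minimal sketch. The function name and the rounding rule are my own assumptions, not necessarily what GTSfM does internally:

```python
from typing import Tuple


def resize_shorter_side(width: int, height: int, max_short_side: int = 760) -> Tuple[int, int]:
    """Scale (width, height) so the shorter side is at most `max_short_side`.

    Hypothetical helper for illustration only; rounding to the nearest
    integer is an assumption.
    """
    short = min(width, height)
    if short <= max_short_side:
        # Images that are already small enough are left untouched (no upsampling).
        return width, height
    scale = max_short_side / short
    return round(width * scale), round(height * scale)


print(resize_shorter_side(1920, 1080))
```

Note that under this rule an image whose shorter side is below 760 px is never enlarged, which matters for the upsampling suggestion further down.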
> A maximum of 5000 keypoints are used for each of the following front-ends
Do you know how many points are effectively extracted by SuperPoint per image? How often is the limit of 5k hit compared to SIFT?
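Something along these lines would answer it. This is a small sketch that assumes you can dump the number of keypoints per image for each front-end into a list; `cap_hit_stats` is a hypothetical helper, not part of GTSfM:

```python
from statistics import mean, median
from typing import Dict, Sequence


def cap_hit_stats(keypoint_counts: Sequence[int], cap: int = 5000) -> Dict[str, float]:
    """Summarize per-image keypoint counts and how often the cap is reached."""
    if not keypoint_counts:
        raise ValueError("need at least one image")
    hits = sum(1 for c in keypoint_counts if c >= cap)
    return {
        "mean": float(mean(keypoint_counts)),
        "median": float(median(keypoint_counts)),
        "cap_hit_fraction": hits / len(keypoint_counts),
    }
```

Comparing `cap_hit_fraction` between SIFT and SuperPoint on the same images would show directly whether the two detectors operate at a comparable keypoint budget.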
https://github.com/borglab/gtsfm/blob/1b55b76a7421bea912eeb7d9ea05a686883c1373/gtsfm/frontend/detector_descriptor/superpoint.py#L45-L46
Do I understand correctly that you use the default settings? Did you try to tweak them? As is, it cannot return 5k keypoints on these kinds of images, unlike SIFT. I recommend the following:
- decrease the detection threshold: `keypoint_threshold=0.001`
- decrease the NMS radius: `nms_radius=3`
- if images are smaller than the limit (760 px), upsample them
This should make SuperPoint competitive with SIFT in terms of keypoint detection.
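To illustrate why these two settings control the keypoint count, here is a simplified, NumPy-only sketch of score thresholding plus non-maximum suppression on a dense score map. It is not SuperPoint's actual implementation (which, as far as I know, uses an iterative max-pooling NMS), and `detect_keypoints` is a hypothetical name. The "stricter" values 0.005 and 4 are what I believe the reference defaults to be, so treat them as an assumption:

```python
import numpy as np


def detect_keypoints(scores, threshold, nms_radius, max_keypoints=5000):
    """Keep pixels that exceed `threshold` and are the maximum of their
    (2r+1)x(2r+1) neighborhood, then keep the top `max_keypoints` by score.

    Returns an (N, 2) integer array of (x, y) coordinates.
    """
    scores = np.asarray(scores, dtype=float)
    h, w = scores.shape
    r = nms_radius
    padded = np.pad(scores, r, mode="constant", constant_values=-np.inf)
    # Sliding-window maximum computed by shifting the padded map.
    local_max = np.full_like(scores, -np.inf)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            local_max = np.maximum(local_max, padded[dy:dy + h, dx:dx + w])
    keep = (scores >= local_max) & (scores > threshold)
    ys, xs = np.nonzero(keep)
    order = np.argsort(-scores[ys, xs])[:max_keypoints]
    return np.stack([xs[order], ys[order]], axis=1)


# Synthetic score map, for illustration only.
rng = np.random.default_rng(0)
scores = rng.random((120, 160)) * 0.01
strict = detect_keypoints(scores, threshold=0.005, nms_radius=4)
loose = detect_keypoints(scores, threshold=0.001, nms_radius=3)
print(len(strict), len(loose))
```

In this simplified model, lowering the threshold or shrinking the radius can only add keypoints: any pixel that is the maximum of a large window is also the maximum of a smaller one.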
- We do know that these deep matchers are more easily tricked by symmetries, as you point out in fig 3. This seems confirmed by table 3: compared to SIFT, the mean of the front-end errors is much higher than their median and they have many more VG outliers, especially on South Building and Crane.
- Did you try tuning the filtering thresholds (minimum number of inliers, cycle consistency) for each front-end? 15 inliers and 7° seem pretty loose for front-ends that have a high recall.
- Did you try running the averaging+BA on edges that are inliers according to the GT poses?
- It seems that the motion averaging does not have any robustness built-in. Zhang et al. (ICCV 2023) show that using a robust cost function is critical (table 5) and that weighting by inlier count or two-view covariance can often help. Did you try this? This paper actually shows that SuperPoint+SuperGlue can work perfectly fine for global SfM.
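To make the last three points concrete, here is a rough sketch of the quantities I have in mind, restricted to rotations. All function names are hypothetical, the convention that `i_R_j` maps frame j into frame i is an assumption, and the Huber weight is just one common robust choice, not necessarily the formulation used by Zhang et al.:

```python
import numpy as np


def rot_z(deg):
    """Rotation about the z-axis, used only to build toy examples."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def angle_deg(R):
    """Rotation angle of a 3x3 rotation matrix, in degrees."""
    cos = (np.trace(R) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def cycle_error_deg(i_R_j, j_R_k, k_R_i):
    """Deviation from identity when composing relative rotations around a triplet."""
    return angle_deg(i_R_j @ j_R_k @ k_R_i)


def gt_error_deg(i_R_j_est, w_R_i_gt, w_R_j_gt):
    """Angular error of an estimated relative rotation against ground-truth poses."""
    i_R_j_gt = w_R_i_gt.T @ w_R_j_gt
    return angle_deg(i_R_j_gt.T @ i_R_j_est)


def huber_weight(residual, delta):
    """IRLS weight of the Huber loss: 1 inside `delta`, decaying as delta/|r| outside."""
    r = abs(residual)
    return 1.0 if r <= delta else delta / r


def edge_weight(residual, num_inliers, delta):
    """Combine a robust weight with an inlier-count prior for one two-view edge."""
    return num_inliers * huber_weight(residual, delta)


# Toy triplet where the third edge is off by 10 degrees.
print(cycle_error_deg(rot_z(30), rot_z(20), rot_z(-40)))
```

Keeping only edges with a small `gt_error_deg` gives the oracle experiment from the second-to-last point, and re-weighting each edge with `edge_weight` at every iteration of the averaging is the kind of robustness I mean in the last one.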
Thanks! cc @Phil26AT @ducha-aiki