Issues with COLMAP conversion + Splatfacto downstream on large scenes

Open W-OK-E opened this issue 6 months ago • 1 comment

Hi! First of all, amazing work on VGGT — really impressive results.

I had a few questions regarding downstream reconstruction tasks once the poses/3D points are obtained from VGGT:

I tried using demo_colmap.py (both with and without bundle adjustment) to generate COLMAP outputs for training Splatfacto via Nerfstudio. While small scenes work reasonably well, the results for larger scenes are quite poor — lots of noisy poses and inaccurate reconstructions.

Interestingly, enabling bundle adjustment often leads to even worse results (or sometimes fails entirely, depending on max_reproj_error), similar to [linked issue]. The vggt_to_colmap.py script by [user] seems to give better pose stability.

Is this a known issue with BA? Are there parameter configs you would recommend for large-scale scenes when converting VGGT outputs to COLMAP?

I’ve attached visual comparisons with and without BA for reference. Any insights would be greatly appreciated — thanks again for releasing VGGT!

The image above visualises the camera poses that demo_colmap.py generated and saved to .bin files. The sparse point cloud isn't displayed for some reason, and when visualised separately it doesn't make much sense, which is very odd given that it's the exact same pipeline.

This one uses the vggt_to_colmap.py script and seems much more reflective of the scene and more accurate. Any advice or insights would be wonderful!

Jun 18 '25 18:06 W-OK-E

Hi, thanks!

Increasing query_frame_num (e.g., 10, 20, or higher) and max_query_pts (e.g., 4096) should help improve robustness, though with a slight trade-off in speed.

Regarding the comparison to vggt_to_colmap.py, I’m a bit confused. That script should only convert the output format and doesn’t add any functionality, so it should behave identically to demo_colmap.py without --use_ba. Could you verify whether both produce equivalent results? If not enough points are retained, try lowering conf_thres_value.
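
One way to run that equivalence check is to compare the per-image poses stored in the two images.bin files. The following is a minimal sketch, not code from the VGGT repository. It assumes both scripts write the standard COLMAP binary images.bin layout, that image names match between the two runs, and that both outputs are in the same coordinate frame (no similarity alignment is attempted). The function names are hypothetical.

```python
import math
import struct


def read_images_bin(path):
    """Return {image_name: (qvec, tvec)} from a COLMAP images.bin file."""
    poses = {}
    with open(path, "rb") as f:
        (num_images,) = struct.unpack("<Q", f.read(8))
        for _ in range(num_images):
            # image_id (int32), qvec as w,x,y,z (4 doubles),
            # tvec (3 doubles), camera_id (int32): 64 bytes in total.
            rec = struct.unpack("<idddddddi", f.read(64))
            qvec, tvec = rec[1:5], rec[5:8]
            name = bytearray()
            while (c := f.read(1)) not in (b"\x00", b""):
                name += c
            (num_points2d,) = struct.unpack("<Q", f.read(8))
            # Skip the 2D observations: (x, y, point3D_id) = 24 bytes each.
            f.seek(24 * num_points2d, 1)
            poses[name.decode("utf-8")] = (qvec, tvec)
    return poses


def camera_center(qvec, tvec):
    """Camera centre in world coordinates: C = -R^T t (world-to-camera pose)."""
    w, x, y, z = qvec
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    rot = [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
    return tuple(-sum(rot[j][i] * tvec[j] for j in range(3)) for i in range(3))


def rotation_diff_deg(q1, q2):
    """Angle in degrees between two rotations given as quaternions."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    norm = math.sqrt(sum(a * a for a in q1)) * math.sqrt(sum(b * b for b in q2))
    return math.degrees(2.0 * math.acos(min(1.0, dot / norm)))


def compare_poses(poses_a, poses_b):
    """Return {name: (rotation_diff_deg, centre_distance)} for shared images."""
    diffs = {}
    for name in sorted(set(poses_a) & set(poses_b)):
        (qa, ta), (qb, tb) = poses_a[name], poses_b[name]
        centre_gap = math.dist(camera_center(qa, ta), camera_center(qb, tb))
        diffs[name] = (rotation_diff_deg(qa, qb), centre_gap)
    return diffs
```

Calling compare_poses(read_images_bin(a), read_images_bin(b)) on the two outputs gives a per-image rotation and camera-centre gap. Values near zero for every image would suggest the two scripts agree on poses, so any remaining difference would lie in the points or the filtering.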

Overall it would be something like:

# Without BA; lower confidence filtering (should match vggt_to_colmap.py)
python demo_colmap.py --scene_dir=/YOUR/SCENE_DIR/ --conf_thres_value=1.1

# With BA; more keyframes and feature tracks for improved results
# Adjust max_reproj_error as needed
python demo_colmap.py --scene_dir=/YOUR/SCENE_DIR/ --use_ba --query_frame_num=24 --max_query_pts=4096
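
Since the BA outcome reportedly depends on max_reproj_error, it may be worth trying a few values rather than a single one. The sketch below only builds the command lines for such a sweep. The flag spelling --max_reproj_error is assumed from the parameter name mentioned in this thread, the threshold values are arbitrary illustrations rather than recommendations, and build_commands is a hypothetical helper.

```python
import shlex


def build_commands(scene_dir, reproj_errors=(4.0, 8.0, 16.0),
                   query_frame_num=24, max_query_pts=4096):
    """Build one demo_colmap.py BA command per reprojection-error threshold."""
    commands = []
    for err in reproj_errors:
        commands.append([
            "python", "demo_colmap.py",
            f"--scene_dir={scene_dir}",
            "--use_ba",
            f"--query_frame_num={query_frame_num}",
            f"--max_query_pts={max_query_pts}",
            # Assumed flag name; the thresholds themselves are illustrative.
            f"--max_reproj_error={err}",
        ])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for cmd in build_commands("/YOUR/SCENE_DIR/"):
        print(shlex.join(cmd))
```

If each run writes its output into the scene directory, the result of one run would need to be copied elsewhere before starting the next, otherwise later runs may overwrite it.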

Beyond that, iterative BA should help a lot, but it is still on the TODO list. I had planned to implement it but haven’t found the time due to bandwidth constraints; sorry about that.

https://github.com/facebookresearch/vggt/blob/cd02675241e8b6c8c7f09f211fbc33c88580da7a/demo_colmap.py#L36

Jun 18 '25 23:06 jytime