
Using the demo_colmap output with gsplat

Open kk6398 opened this issue 5 months ago • 5 comments

Hi, I took the output of demo_colmap.py and used it directly for 3DGS, and it worked. But I am curious about two things:

  1. The output "images.bin" of demo_colmap.py still keeps the alignment with the first frame's coordinate system (doesn't it?). However, COLMAP's coordinate system is centered on the scene. Can we successfully run gsplat without transforming coordinates? And 3DGS does not seem to perform any conversion from OpenCV to OpenGL, right?

  2. On the other hand, I tried to render it in 4DGS, but it failed.
    Before training the 4DGS, the poses were processed as follows: ① w2c --> c2w; ② OpenCV coordinates --> OpenGL coordinates: poses = np.concatenate([poses[:, 1:2, :], poses[:, 0:1, :], -poses[:, 2:3, :], poses[:, 3:4, :], poses[:, 4:5, :]], 1); ③ merge image height, width, focal, and depth near/far.
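For steps ① and ②, a minimal numpy sketch of one common convention (invert the extrinsic, then flip the camera y and z axes for OpenGL) might look like the following. Note the snippet above uses an LLFF-style column shuffle instead, so this is illustrative, not a drop-in replacement, and the example matrix is made up:

```python
import numpy as np

# Hypothetical 4x4 world-to-camera extrinsic in the OpenCV convention
# (x right, y down, z forward); identity rotation plus a translation.
w2c = np.array([
    [1.0, 0.0, 0.0, 0.1],
    [0.0, 1.0, 0.0, 0.2],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Step 1: w2c -> c2w by inverting the extrinsic.
c2w = np.linalg.inv(w2c)

# Step 2: OpenCV -> OpenGL camera axes. An OpenGL camera looks down -z
# with y up, so negate the y and z basis columns of the c2w matrix.
c2w_gl = c2w @ np.diag([1.0, -1.0, -1.0, 1.0])
```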

Can you provide some advice? Thank you!

kk6398 avatar Jun 13 '25 09:06 kk6398

Hey @kk6398, can you share your outputs here? You used the demo_colmap.py script without any major changes, right? Did you use bundle adjustment as well? The thing is, I tried both demo_colmap.py and the older vggt_to_colmap.py script, and the latter seems to do better for some reason.

Would be great if you could provide the above details!

W-OK-E avatar Jun 17 '25 16:06 W-OK-E

Thanks for your reply, and congratulations on winning the Best Paper Award!

  1. I didn't change the demo_colmap.py script anywhere.

  2. I ran the code from "https://github.com/hustvl/4DGaussians?tab=readme-ov-file": ① demo_colmap.py outputs the COLMAP files; ② "python LLFF/imgs2poses.py ./colmap_tmp/" converts the poses to the format required by 4DGS; ③ run the 4DGS train.py, i.e., 3K iterations of 3DGS on the 0-th frame, then predict the deformation of the subsequent frames.

  3. The 4DGS process and result are as follows:

Image

  1. The 3DGS result of the initial frame is quite different. I also tested it with the original 3DGS, and the results are as follows: Image

  2. I also tested the RRA/RTA/AUC between COLMAP and VGGT on frame 0: Image

In conclusion, the 3DGS result of the initial frame differs greatly, but the difference between VGGT_BA and COLMAP is very small when running the 3DGS source code. Where might the problem be?

kk6398 avatar Jun 18 '25 01:06 kk6398

Hey @kk6398, there seems to be some confusion here: all credit to the authors for their excellent work, but I am not a co-author of the paper, just an undergrad student experimenting with VGGT.

Anyway, that aside, your results are interesting, though I didn't use the particular repo you mentioned to train 4DGS or 3DGS. After obtaining the COLMAP files from VGGT, I tried to train a splatfacto model via nerfstudio for 3DGS and the results were not that great, so I was wondering what you might have done differently.

Again, thanks a lot for sharing the results, I hope the original author @jytime answers your queries soon.

W-OK-E avatar Jun 18 '25 04:06 W-OK-E


Ahh, I just used demo_colmap.py to output the COLMAP files from VGGT, and then directly trained 3DGS. I didn't do anything else.

kk6398 avatar Jun 18 '25 04:06 kk6398

Hi, honestly I haven’t touched the 4D-GS code, so I’m probably not the best person to comment. My guess is that when they ran COLMAP on dynamic videos, they likely used masks to filter out dynamic pixels—this is standard practice for dynamic scenes. So if you’re trying to match their results, you should mask the input images accordingly before passing them to VGGT.
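The masking idea above can be sketched minimally; this assumes you already have a binary static/dynamic mask per frame (e.g. from a segmentation model), which is outside the scope of this snippet:

```python
import numpy as np

def mask_dynamic_pixels(image, mask):
    """Blank out dynamic regions before feeding a frame to VGGT.

    image: HxWx3 uint8 array; mask: HxW bool array where True marks
    static (keep) pixels. How the mask is produced is up to your
    pipeline -- this only applies it.
    """
    masked = image.copy()
    masked[~mask] = 0  # zero out dynamic pixels
    return masked
```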

As for the performance on rigid scenes, it is worth noting that the default hyperparameters in demo_colmap are quite loose to ensure robustness. You may get better results by tightening them—for example, using a smaller max_reproj_error.
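To illustrate what tightening max_reproj_error does, here is a toy sketch of thresholding points by reprojection error; the threshold value is illustrative, not the script's default, and this is not the actual demo_colmap code:

```python
import numpy as np

def filter_by_reproj_error(points, errors, max_reproj_error=4.0):
    """Keep only 3D points whose reprojection error is below a threshold.

    A smaller max_reproj_error keeps fewer but more reliable points,
    mirroring the effect of tightening the hyperparameter.
    """
    keep = errors < max_reproj_error
    return points[keep], keep
```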

jytime avatar Jun 18 '25 16:06 jytime


Hi, I'd like to know how many images are in your dataset. When I run demo_colmap.py on the MipNeRF360 dataset, I seem to run out of memory.

Zerui-Yu avatar Jun 26 '25 08:06 Zerui-Yu

@Zerui-Yu Hey! The maximum number of images I was able to process on a 48 GB machine was around 170. Also notice that in the demo_colmap file they load the images at original size and feed them to the model at 518 x 518; maybe try changing those parameters and see if you can accommodate the dataset.
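To show the memory-saving idea of shrinking inputs before inference, here is a rough nearest-neighbour downscale sketch; demo_colmap.py does its own preprocessing, so this is only illustrative:

```python
import numpy as np

def downscale_to(image, target=518):
    """Nearest-neighbour resize of an HxWx3 array so the longer side
    equals `target`. Smaller inputs mean fewer activations to hold in
    memory during inference."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    return image[rows][:, cols]
```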

W-OK-E avatar Jun 26 '25 09:06 W-OK-E

@W-OK-E Thanks, I'll have a try!

Zerui-Yu avatar Jun 26 '25 09:06 Zerui-Yu


Hello sir, do you have any idea about the ~26 PSNR 3DGS result using VGGT without BA?

zzz5y avatar Jul 22 '25 05:07 zzz5y


Hello sir, have you gotten a good 3DGS result using VGGT instead of COLMAP?

zzz5y avatar Jul 22 '25 05:07 zzz5y