
Strange behaviour in point predictions when using images with little visual variety

Status: Open. vgutierrez2404 opened this issue 7 months ago · 2 comments

Hi all!

I'm using VGGT to obtain fast, reliable reconstructions of stockpiles, and I'm comparing it against a traditional structure-from-motion (SfM) pipeline. For this, I record a video around the stockpile and reconstruct from a subset of its frames. I also created a simple synthetic stockpile, which looks obviously artificial:

[Image: synthetic stockpile]
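The post does not say how the frames are picked from the video. As a minimal sketch, assuming frames are sampled uniformly in time, the hypothetical helper below chooses evenly spaced frame indices; the chosen frames could then be decoded with any video reader and passed to either COLMAP or VGGT.

```python
def evenly_spaced_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return up to `num_samples` frame indices spread uniformly over a video.

    Hypothetical helper: assumes uniform temporal sampling is acceptable.
    The first and last frames are always included when num_samples > 1.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never request more frames than the video contains.
    num_samples = min(num_samples, total_frames)
    if num_samples == 1:
        return [0]
    # Fractional spacing between consecutive samples (always >= 1 here).
    step = (total_frames - 1) / (num_samples - 1)
    # Round each fractional position to the nearest valid frame index.
    return [round(i * step) for i in range(num_samples)]


# Example: pick 10 frames from a 300-frame orbit video.
print(evenly_spaced_frame_indices(300, 10))
```

Uniform spacing is only one choice; if the camera moves at varying speed, spacing frames by viewpoint change rather than by time may give better coverage of the pile.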

I ran the COLMAP pipeline on this synthetic stockpile and obtained a good reconstruction:

[Image: COLMAP reconstruction of the synthetic stockpile]

The problem arises when I try to use VGGT to speed up the reconstruction. Using 10 images taken around the stockpile (I also tried 50, with the same outcome), I get poor results:

[Image: VGGT reconstruction from 10 images]

My first idea was to apply a larger texture to the stockpile so that the model could detect more features on it:

[Image: stockpile with a larger texture]

Using the same 10 images as before, I obtained this:

[Image: VGGT reconstruction with the larger texture]

This is quite similar to the problem I had at the start.

I resized the texture again:

[Image: stockpile with the resized texture]

And I obtained this:

[Image: VGGT reconstruction with the resized texture]

This is now closer to the original synthetic stockpile. Parts of the stockpile are still missing (even though I have images of those sides), but at least it works.

Has anyone else run into this problem in their predictions?
What could cause these wave-like ground predictions that spread outward? And why does the model fail to reconstruct the stockpile when the camera is close to it?

I appreciate any help you can provide. Thanks!

vgutierrez2404 · May 20 '25 16:05

Hey, those noisy 3D points look like the predictions for the black background pixels. Our training data includes images with pure black or white backgrounds, and we do not apply supervision to those background regions. As a result, the model predicts almost random 3D points for the black background but assigns them very low confidence scores.

I think you can simply filter them out by using a higher confidence threshold (conf thres).
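A minimal sketch of the suggested filtering, assuming the predicted points and per-point confidences are available as NumPy arrays (the function name is hypothetical, not part of VGGT's API). It supports both an absolute threshold and a percentile threshold, since whether a given viewer treats `conf_thres` as a percentage of points to drop or as an absolute confidence value is an assumption worth checking.

```python
import numpy as np


def filter_by_confidence(points, conf, conf_thres, as_percentile=True):
    """Keep only 3D points whose confidence passes a threshold.

    points: array reshapeable to (N, 3), e.g. an (H, W, 3) point map.
    conf:   array reshapeable to (N,), e.g. an (H, W) confidence map.
    If as_percentile is True, conf_thres is a percentage in [0, 100] and
    roughly the lowest conf_thres percent of points are dropped.
    Otherwise conf_thres is compared directly against the confidence values.
    """
    points = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    conf = np.asarray(conf, dtype=np.float64).reshape(-1)
    if as_percentile:
        # Threshold adapts to the confidence distribution of this prediction.
        threshold = np.percentile(conf, conf_thres)
    else:
        # Threshold is a fixed confidence value.
        threshold = conf_thres
    mask = conf >= threshold
    return points[mask], mask


# Example with toy data: 10 points with confidences 1..10.
pts = np.arange(30, dtype=float).reshape(10, 3)
conf = np.arange(1, 11, dtype=float)
kept, mask = filter_by_confidence(pts, conf, 70)
print(kept.shape)
```

With the percentile interpretation, raising the threshold always removes the same fraction of points, whatever their absolute confidence; with the absolute interpretation, the useful range depends on the scale of the confidence values.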

jytime · May 20 '25 17:05

Hi, thanks for the quick response. I'm already using a conf_threshold of 70 for the predictions (sorry, I forgot to mention that in my first comment). I've also tried higher conf_threshold values and got similar results.

I will update if I find something new!

Thanks.
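If raising the threshold does not remove the noise, another option that follows from the explanation above (unsupervised, low-confidence predictions on black background pixels) is to mask those pixels by color before applying the confidence filter. A minimal sketch, assuming per-point RGB colors sampled from the input images are available; the function name and the `min_brightness` cutoff of 16 on the summed RGB value are hypothetical choices, not values from VGGT.

```python
import numpy as np


def mask_dark_background(points, colors, conf, conf_thres=70.0, min_brightness=16):
    """Drop points from near-black pixels, then apply a percentile confidence filter.

    points: (N, 3) world points; colors: (N, 3) RGB values in 0..255;
    conf: (N,) confidence. min_brightness is a hypothetical cutoff on R+G+B.
    """
    points = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    # Cast to a wide integer type so summing uint8 channels cannot overflow.
    colors = np.asarray(colors).reshape(-1, 3).astype(np.int64)
    conf = np.asarray(conf, dtype=np.float64).reshape(-1)

    not_background = colors.sum(axis=1) >= min_brightness
    if not np.any(not_background):
        # Everything looks like background: return an empty (0, 3) array.
        return points[:0], not_background

    # Compute the percentile over foreground points only, so the number of
    # background pixels does not shift the threshold.
    threshold = np.percentile(conf[not_background], conf_thres)
    mask = not_background & (conf >= threshold)
    return points[mask], mask
```

A color mask also removes genuinely dark parts of the scene, so the cutoff should stay well below the darkest stockpile texture.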

vgutierrez2404 · May 20 '25 18:05