vggt: Squashing when images are larger than 518

I am seeing the squashing that was also reported in https://github.com/facebookresearch/vggt/issues/32. See the images below. The image size is 588 (height) x 1036 (width). The depth maps appear accurate (as do world_points and world_points_from_depth), but the point cloud is squashed, as you can see below. I am not sure what is at play; here is what I considered:

The network loads at 518 while the image is 588. So is the 588 dimension squashed to 518?
Some compression in the x and y in the positional encoding?
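
One way to check the first hypothesis is to compare the per-axis scale factors of whatever resize is applied before inference. The sketch below is only an illustration: the helper names are hypothetical, the patch size of 14 is an assumption (only the 518 figure comes from this thread), and it does not claim to reproduce the preprocessing actually used by the repository.

```python
def resize_scales(h, w, new_h, new_w):
    """Return per-axis scale factors (sy, sx) for a resize from (h, w) to (new_h, new_w)."""
    return new_h / h, new_w / w


def aspect_preserving_size(h, w, target_w=518, patch=14):
    """Scale so the width equals target_w, then round the height to a multiple of `patch`.

    Assumption: patch=14 is a guess at the ViT patch size, not taken from the thread.
    """
    scale = target_w / w
    new_h = max(patch, round(h * scale / patch) * patch)
    return new_h, target_w


h, w = 588, 1036

# Case 1: forcing the image into a 518x518 square scales the two axes differently.
sy_sq, sx_sq = resize_scales(h, w, 518, 518)

# Case 2: an aspect-preserving resize keeps both scale factors (nearly) equal.
new_h, new_w = aspect_preserving_size(h, w)
sy, sx = resize_scales(h, w, new_h, new_w)

print("square resize scales:", sy_sq, sx_sq)
print("aspect-preserving size:", (new_h, new_w), "scales:", sy, sx)
```

If the two scale factors differ, the image content is being stretched along one axis before the network sees it, which would be one possible source of anisotropy.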

I hope the authors can clarify.

Mar 20 '25 18:03 yaseryacoob

Hi, I haven’t experimented with this exact case yet, so I’ll need to run it myself to investigate further. Could you specify whether your 3D points come directly from the point map head, or from the depth + camera head combination? If it’s the latter (depth + camera head), it’s likely that the depth predictions are correct, but the intrinsic parameters are distorted. A quick way to verify this would be manually setting fx = fy and visually checking if the issue persists. This is just a temporary suggestion until I can look into it more thoroughly.
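
A minimal sketch of the fx = fy check, assuming a standard pinhole model and NumPy. The function name and the `force_equal_focal` flag are hypothetical, and using the mean of fx and fy as the shared focal length is an arbitrary choice made for illustration; none of this is the repository's own unprojection code.

```python
import numpy as np


def unproject_depth(depth, K, force_equal_focal=False):
    """Back-project a depth map (H, W) to camera-frame points (H, W, 3) with a pinhole model."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    if force_equal_focal:
        # Assumption: square pixels, so use one shared focal length (the mean here).
        fx = fy = 0.5 * (fx + fy)
    h, w = depth.shape
    # u is the column index, v is the row index; both arrays have shape (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


# Synthetic check: a flat wall 2 units away, with a deliberately distorted fy.
depth = np.full((4, 6), 2.0)
K = np.array([[500.0, 0.0, 3.0],
              [0.0, 250.0, 2.0],
              [0.0, 0.0, 1.0]])

pts_raw = unproject_depth(depth, K)                            # stretched along y
pts_fixed = unproject_depth(depth, K, force_equal_focal=True)  # equal spacing in x and y
```

If the point cloud looks correct with the flag on and squashed with it off, that points to the predicted intrinsics rather than the depth.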

Mar 21 '25 17:03 jytime

I encountered a similar problem.

I tried a series of 518x518 images, and the VGGT result is perfect. However, when I downsample the images to 336x336, the predicted fx and fy differ significantly, which warps the point clouds reconstructed from the depth head.

Mar 22 '25 07:03 ShunyuanZheng

Hi @ShunyuanZheng, can you try first resizing to 336x336 and then padding with black to 336x518?
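
The padding step could look roughly like the sketch below. It assumes that 336x518 means height 336 and width 518, that the image has already been resized to 336x336 with any image library, and that the black columns are split evenly between left and right; the helper name is hypothetical.

```python
import numpy as np


def pad_width_with_black(image, target_w=518):
    """Pad an (H, W, C) image with black columns, split evenly left and right, up to target_w."""
    h, w, c = image.shape
    if w > target_w:
        raise ValueError(f"image width {w} exceeds target width {target_w}")
    total = target_w - w
    left = total // 2
    right = total - left
    # Pad only the width axis; height and channels are left untouched.
    return np.pad(image, ((0, 0), (left, right), (0, 0)),
                  mode="constant", constant_values=0)


# Stand-in for an image that was already resized to 336x336 (all white here).
resized = np.full((336, 336, 3), 255, dtype=np.uint8)
padded = pad_width_with_black(resized)
print(padded.shape)
```

Note that centered padding shifts the principal point by the width of the left border, so any intrinsics compared against the unpadded image need the same offset.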

Mar 22 '25 14:03 jytime

With this approach, the result looks reasonable and free of distortion, though the quality is slightly degraded compared with using 518x518 images as input.

Mar 23 '25 07:03 ShunyuanZheng