VGGT: Scene normalization details during training

Congrats on the best paper award! 🥇 I am setting up a small training pipeline, starting with only Kubric data. How would you suggest normalizing Kubric scenes, given that they contain far-away background and sky regions? More generally, how did you handle scenes with very large depth values for distant background or sky, both in Kubric and across the other training datasets? Below is an example of a Kubric scene I generated:

[Images: RGB, depth, segmentation masks, and GT vs. predicted points (the latter for a different datapoint)]

Jun 18 '25 14:06 m43

Hi, thanks! In general, we set excessively large depth values to zero using a maximum depth threshold; see this line for reference:

https://github.com/facebookresearch/vggt/blob/3d0427aa51af36680b3bec9aeb30a1b5a812893a/training/data/dataset_util.py#L260

Pixels with depth = 0 are subsequently ignored during both training and normalization.
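To make the "ignored during normalization" step concrete, here is a minimal sketch. It assumes, as a hypothetical choice not confirmed in this thread, that the scene scale is the mean Euclidean distance of valid 3D points to the origin of the reference (first) camera frame, and that the points are already expressed in that frame. The function name `normalize_scene_scale` and the array layouts are hypothetical, not taken from the VGGT repo.

```python
import numpy as np


def normalize_scene_scale(world_points, depth_maps, extrinsics, eps=1e-8):
    """Hypothetical helper: rescale a scene by the mean distance of valid points.

    world_points: (S, H, W, 3) points in the reference (first) camera frame.
    depth_maps:   (S, H, W) depths; 0 marks invalid / thresholded pixels.
    extrinsics:   (S, 3, 4) camera-from-world [R | t] matrices.
    """
    valid = depth_maps > 0  # zero-depth pixels are excluded from the statistic
    if not np.any(valid):
        raise ValueError("no valid pixels to normalize with")

    # Boolean indexing over (S, H, W) yields an (N, 3) array of valid points.
    dists = np.linalg.norm(world_points[valid], axis=-1)
    scale = dists.mean() + eps

    # Rotations are scale-free; only the translation column is rescaled.
    new_extrinsics = extrinsics.copy()
    new_extrinsics[..., :3, 3] /= scale

    # Invalid pixels stay at depth 0 because 0 / scale == 0.
    return world_points / scale, depth_maps / scale, new_extrinsics, scale
```

Because invalid pixels keep depth 0 after the division, the same `depth > 0` mask can be reused downstream to exclude them from the loss.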

Adjust the parameters to suit your case:

threshold_depth_map(depth_map, max_percentile=-1, min_percentile=-1, max_depth=1024)
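The sketch below illustrates the thresholding behavior described above: out-of-range depths are set to 0 so that later stages treat them as invalid. It is not the code at the linked line; the name `threshold_depth_map_sketch`, the treatment of -1 as "bound disabled", and the percentile logic are assumptions that only mirror the call signature.

```python
import numpy as np


def threshold_depth_map_sketch(depth_map, max_percentile=-1, min_percentile=-1,
                               max_depth=-1):
    """Illustrative only; not the VGGT implementation.

    Sets out-of-range depths to 0 so downstream code treats them as invalid.
    A value of -1 disables the corresponding bound (assumed convention).
    """
    depth = np.asarray(depth_map, dtype=np.float64).copy()
    depth[~np.isfinite(depth)] = 0.0  # e.g. inf/NaN sky pixels become invalid

    valid = depth > 0
    if not np.any(valid):
        return depth

    upper, lower = np.inf, 0.0
    if max_percentile > 0:
        upper = min(upper, np.percentile(depth[valid], max_percentile))
    if min_percentile > 0:
        lower = max(lower, np.percentile(depth[valid], min_percentile))
    if max_depth > 0:
        upper = min(upper, max_depth)

    depth[(depth > upper) | (depth < lower)] = 0.0
    return depth
```

For example, with a backdrop at roughly 50 units, `max_depth=24` would zero it out while `max_depth=1024` would keep it.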

Jun 18 '25 16:06 jytime

Hmm, I realized that Kubric has a dome onto which the floor and sky are projected, at a depth of around 50 Kubric units, so max_depth=1024 will not threshold it. When I run VGGT on Kubric images, it actually predicts the dome (see prediction below), so it was probably trained without thresholding out the dome. Should I keep the dome, or lower the threshold to 50 or less to remove it? And in general, for other outdoor datasets, how can one know what a reasonable threshold is? How could the model know what the "reach" of the scene is, unless it is told up front what counts as out of reach (e.g., by masking)?

Jun 18 '25 22:06 m43

I thought we used:

self.depth_max = 24
depth_map = threshold_depth_map(
    depth_map, max_percentile=-1, min_percentile=-1, max_depth=self.depth_max
)

For real-world outdoor datasets, points beyond 80 m are usually unreliable (typical for LiDAR).
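One rough way to pick a per-dataset max_depth is to check how much of the valid depth survives each candidate threshold; a sharp jump between two thresholds hints at a distant cluster such as a backdrop or sky. The helper `valid_fraction_by_threshold` below is a hypothetical diagnostic, not part of VGGT.

```python
import numpy as np


def valid_fraction_by_threshold(depth_maps, thresholds):
    """Hypothetical diagnostic: for each candidate max_depth, return the
    fraction of currently valid pixels (finite and > 0) that would be kept."""
    depth = np.asarray(depth_maps, dtype=np.float64)
    valid = np.isfinite(depth) & (depth > 0)
    values = depth[valid]
    if values.size == 0:
        return {t: 0.0 for t in thresholds}
    # A pixel survives a threshold t if its depth is <= t.
    return {t: float(np.mean(values <= t)) for t in thresholds}
```

Running this over many frames of a dataset (rather than a single image) gives a more stable picture of where the depth distribution has a long tail.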

Jun 18 '25 22:06 jytime

Thank you! I think I will keep max_depth=1024 for my generated Kubric data for now, since this seems to better match the output of the pretrained VGGT (which predicts the dome). I may experiment with this parameter later, once I have a more complete training pipeline.

Jun 20 '25 20:06 m43

Hi @m43! I am delighted to learn that you are also working with the Kubric dataset. May I ask whether you used a specific subset of Kubric or generated the data yourself? I ask because, in my own attempts, I have not found any Kubric data that meets VGGT's format requirements.

I would be extremely grateful if you could provide a response.

Sep 11 '25 12:09 LanesraL

Hi @LanesraL! I was using the (dynamic) Multi-View Kubric dataset from MVTracker. It is dynamic in the sense that each sequence has, for example, V=10 views and T=24 frames. A static dataset could be generated similarly (e.g., with V=30, T=1). What format requirements would need to be met for VGGT?
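If the views are time-synchronized, each time step of a dynamic V-view, T-frame sequence is a frozen snapshot that could serve as a static multi-view sample. The sketch below assumes a hypothetical array layout with views first and frames second, plus camera-from-world extrinsics; the function name is hypothetical, and it should be adapted to however the generated data is actually stored.

```python
import numpy as np


def split_into_static_samples(rgbs, depths, intrinsics, extrinsics):
    """Hypothetical helper. Assumed layouts:
      rgbs:       (V, T, H, W, 3)
      depths:     (V, T, H, W)
      intrinsics: (V, T, 3, 3)
      extrinsics: (V, T, 3, 4)  camera-from-world
    Returns T samples, each holding the V synchronized views of one time step.
    """
    V, T = rgbs.shape[:2]
    for arr in (depths, intrinsics, extrinsics):
        if arr.shape[:2] != (V, T):
            raise ValueError("all inputs must share the leading (V, T) dims")

    samples = []
    for t in range(T):
        samples.append({
            "images": rgbs[:, t],          # (V, H, W, 3)
            "depths": depths[:, t],        # (V, H, W)
            "intrinsics": intrinsics[:, t],  # (V, 3, 3)
            "extrinsics": extrinsics[:, t],  # (V, 3, 4)
        })
    return samples
```

Note that snapshots from the same sequence are highly correlated, so they may be better used as alternatives than as independent training scenes.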

Sep 11 '25 13:09 m43

@m43 Thank you so much for your reply! My training uses the camera head, the depth head, and the point head, so I need the original images along with the ground truth for all three.

Sep 12 '25 02:09 LanesraL

Kubric can provide that data (RGB images, depth maps, intrinsics, extrinsics), but you might need to write your own Kubric script/worker to generate exactly the data you need. For example, the one I shared above generates dynamic data but could be adapted for static data generation as well. I am not sure which Kubric script/worker was used to generate the Kubric data used to train VGGT.
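For the point head, ground-truth 3D points can be derived from depth, intrinsics, and extrinsics by unprojection. This sketch assumes z-depth (distance along the optical axis), a pinhole camera without skew, pixel coordinates at integer indices, and OpenCV-style camera-from-world extrinsics with x_cam = R x_world + t; if your renderer outputs distance along the ray instead of z-depth, convert it first. The function name `depth_to_world_points` is hypothetical.

```python
import numpy as np


def depth_to_world_points(depth, K, extrinsic):
    """Unproject a z-depth map to world coordinates (sketch).

    depth:     (H, W) z-depth along the optical axis; 0 = invalid.
    K:         (3, 3) pinhole intrinsics.
    extrinsic: (3, 4) camera-from-world [R | t] (assumed OpenCV convention).
    Returns (world_points (H, W, 3), valid_mask (H, W)).
    """
    H, W = depth.shape
    # u = column index, v = row index; both arrays have shape (H, W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    cam_points = np.stack([x, y, depth], axis=-1)  # (H, W, 3), camera frame

    R, t = extrinsic[:, :3], extrinsic[:, 3]
    # Invert x_cam = R x_world + t  ->  x_world = R^T (x_cam - t).
    # With row vectors, p @ R equals (R^T p^T)^T.
    world_points = (cam_points - t) @ R

    valid = depth > 0
    world_points[~valid] = 0.0  # keep invalid pixels at a neutral value
    return world_points, valid
```

The returned mask marks the same zero-depth pixels that are ignored elsewhere, so thresholding the depth before unprojection automatically removes those points from the point-head targets.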

Sep 13 '25 17:09 m43

@m43 Thank you so much! That helps a lot! @jytime Could you share some details on how the Kubric training data for VGGT was generated? Thanks!

Sep 16 '25 06:09 LanesraL