vggt: should extri_opencv be camera_to_world or camera_from_world?

Hello

I noticed that the dataloaders read the extrinsic matrix from each dataset and copy it directly into extri_opencv. So I am guessing that extri_opencv is actually the camera-to-world pose. But in the depth_to_coordinate_points function, the code says it's camera_from_world (https://github.com/facebookresearch/vggt/blob/e56963328b7476e615ce8dda9164d381f8dc07a3/training/data/dataset_util.py#L355). So which is it actually?

Thank You

Sep 08 '25 22:09 mmahdavian

Hey, extri_opencv is camera from world.
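For readers unfamiliar with the naming, here is a minimal NumPy sketch of what the two conventions mean. It is not taken from the VGGT code: the helper `make_rigid` and the example pose are hypothetical and only illustrate that a camera_from_world matrix maps world points into the camera frame, so the camera center lands on the origin.

```python
import numpy as np

def make_rigid(R, t):
    """Assemble a 4x4 rigid transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical camera pose: rotated 90 degrees about z, centered at (1, 2, 3).
R_c2w = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
center_world = np.array([1.0, 2.0, 3.0])

# camera_to_world: the translation column is the camera center in world coordinates.
cam_to_world = make_rigid(R_c2w, center_world)

# camera_from_world is its inverse: it takes world points into the camera frame.
cam_from_world = np.linalg.inv(cam_to_world)

# The camera center, expressed in the camera frame, must be the origin.
center_in_cam = cam_from_world @ np.append(center_world, 1.0)

# Conversely, the center can be recovered from camera_from_world as -R^T t.
R = cam_from_world[:3, :3]
t = cam_from_world[:3, 3]
recovered_center = -R.T @ t
```

A quick way to tell the two apart on real data is therefore to look at the translation column: in a camera_to_world matrix it is the camera position, while in a camera_from_world matrix it is not (unless the rotation is the identity and the camera sits at the origin).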

Sep 10 '25 15:09 jytime

@jytime Thanks for your reply. Then why isn't the camera pose inverted in the VKITTI dataloader? The data is read directly from the dataset, which provides the camera extrinsic (the camera pose in the world coordinate frame, so camera to world), and it is used as extri_opencv as-is:

https://github.com/facebookresearch/vggt/blob/e56963328b7476e615ce8dda9164d381f8dc07a3/training/data/datasets/vkitti.py#L170
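To make the concern concrete: if a dataset did store camera-to-world poses (whether VKITTI does is exactly the open question here), the matrix would need to be inverted before being used as camera_from_world. A rough sketch of that inversion, with a hypothetical helper `invert_rigid` and a made-up pose, not code from the repository:

```python
import numpy as np

def invert_rigid(T):
    """Invert a 4x4 rigid transform [R | t] in closed form: [R^T | -R^T t]."""
    R = T[:3, :3]
    t = T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

# Hypothetical pose: camera at (0, 0, -5) in world, axes aligned with the
# world axes, looking along +z (OpenCV-style camera frame).
cam_to_world = np.eye(4)
cam_to_world[:3, 3] = [0.0, 0.0, -5.0]

cam_from_world = invert_rigid(cam_to_world)

# The world origin is 5 units in front of this camera.
origin_h = np.array([0.0, 0.0, 0.0, 1.0])
depth_with_inverse = (cam_from_world @ origin_h)[2]    # correct convention
depth_without_inverse = (cam_to_world @ origin_h)[2]   # pose misused as extrinsic
```

With the inverse, the point has positive depth; using the pose directly puts it behind the camera. Checking the sign of the depth for a few known points is one way to test which convention a dataset file actually uses.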

Sep 10 '25 15:09 mmahdavian

vggt/training/data/datasets/vkitti.py, line 170 in e569633:

extri_opencv = camera_parameters[image_idx][2:].reshape(4, 4)

Hey @mmahdavian, I think this is how the camera pose handling works:

Flattened 16-element extrinsic → reshape to 4×4 matrix.
Slice 4×4 → 3×4 to match dataset API and VGGT expectations.
Convert 3×4 → 4×4 homogeneous later for point transformation.
Inversion is done in the trainer, not in the dataloader.
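The first three steps can be sketched with NumPy as below. This is only an illustration with a made-up flattened matrix, not the repository's code. Note that reshaping, slicing, and padding leave the transform itself unchanged, so none of these steps switches between camera_to_world and camera_from_world; where (or whether) an inversion happens is not shown here.

```python
import numpy as np

# Hypothetical flattened 16-element extrinsic in row-major order.
flat = np.array([1, 0, 0, 0.5,
                 0, 1, 0, -1.0,
                 0, 0, 1, 2.0,
                 0, 0, 0, 1], dtype=np.float64)

# Step 1: reshape the 16 values into a 4x4 matrix.
extri_4x4 = flat.reshape(4, 4)

# Step 2: keep only the top three rows, giving the 3x4 [R | t] form.
extri_3x4 = extri_4x4[:3, :]

# Step 3: pad back to 4x4 homogeneous form for point transformation.
extri_h = np.vstack([extri_3x4, [0.0, 0.0, 0.0, 1.0]])

# Transform a homogeneous 3D point with the padded matrix.
point_h = np.array([1.0, 1.0, 1.0, 1.0])
transformed = extri_h @ point_h
```

Because `extri_h` ends up identical to `extri_4x4`, whatever convention the dataset file uses is the convention the padded matrix still has.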

Sep 12 '25 21:09 SaiPrasanthBL

@SaiPrasanthBL Can you explain what you mean by "inversion is done in the trainer"? I might just be missing some code snippet, but I can't find where the inversion is happening. Thanks!

Oct 08 '25 20:10 cynxcao
