Should extri_opencv be camera_to_world or camera_from_world?
Hello
I noticed that the dataloaders read the extrinsic matrix from each dataset and copy it directly into extri_opencv, so I am guessing that extri_opencv is actually the camera-to-world pose. But in the depth_to_coordinate_points function, the code says it is camera_from_world (https://github.com/facebookresearch/vggt/blob/e56963328b7476e615ce8dda9164d381f8dc07a3/training/data/dataset_util.py#L355). So which is it?
Thank You
Hey, extri_opencv is camera from world.
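For reference, here is a minimal NumPy sketch of what "camera from world" means in practice, assuming the usual OpenCV convention x_cam = R @ x_world + t. The helper names and the sample matrix are hypothetical and not taken from the VGGT codebase.

```python
import numpy as np


def invert_rigid(T):
    """Invert a 4x4 rigid transform [R | t; 0 0 0 1] in closed form."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T        # inverse of a rotation is its transpose
    T_inv[:3, 3] = -R.T @ t    # translation of the inverse transform
    return T_inv


def world_to_camera(cam_from_world, pts_world):
    """Apply a camera_from_world extrinsic to (N, 3) world points."""
    R, t = cam_from_world[:3, :3], cam_from_world[:3, 3]
    return pts_world @ R.T + t


# Hypothetical extrinsic: 90 degree rotation about z plus a translation.
cam_from_world = np.array([
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 2.0],
    [0.0,  0.0, 1.0, 3.0],
    [0.0,  0.0, 0.0, 1.0],
])

# The inverse is the camera_to_world pose; its translation column is the
# camera center expressed in world coordinates.
cam_to_world = invert_rigid(cam_from_world)
camera_center_world = cam_to_world[:3, 3]
```

Under this convention the camera center maps to the origin of the camera frame, which is a quick sanity check for any extrinsic that claims to be camera from world.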
@jytime Thanks for your reply. Then in the VKITTI dataloader, why aren't you inverting the camera pose? You take the data directly from the dataset, which provides the camera extrinsic (camera pose in the world coordinate frame, so camera to world), and use it as extri_opencv:
https://github.com/facebookresearch/vggt/blob/e56963328b7476e615ce8dda9164d381f8dc07a3/training/data/datasets/vkitti.py#L170
vggt/training/data/datasets/vkitti.py
Line 170 in e569633
extri_opencv = camera_parameters[image_idx][2:].reshape(4, 4)
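One way to probe which convention a loaded matrix follows is to compute the camera center under both readings and compare the results with what is known about the scene, such as the direction the car travels across frames. The sketch below is a heuristic only; the function name and sample matrix are made up for illustration, and nothing like it is claimed to exist in the repository.

```python
import numpy as np


def camera_center_candidates(extrinsic):
    """Return the camera center in world coordinates under both readings.

    If the matrix is camera_to_world, the center is its translation column.
    If it is camera_from_world, the center is -R^T t.
    """
    R, t = extrinsic[:3, :3], extrinsic[:3, 3]
    return {
        "camera_to_world": t.copy(),
        "camera_from_world": -R.T @ t,
    }


# Hypothetical matrix: identity rotation, translation (1, 2, 3).
sample = np.eye(4)
sample[:3, 3] = [1.0, 2.0, 3.0]
centers = camera_center_candidates(sample)
```

Plotting both candidate trajectories over a sequence usually makes the plausible one obvious, since only one of them follows the expected path of the vehicle.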
Hey @mmahdavian, I think this is how the camera pose handling works:
- The flattened 16-element extrinsic is reshaped to a 4×4 matrix.
- The 4×4 matrix is sliced to 3×4 to match the dataset API and VGGT expectations.
- The 3×4 matrix is converted back to 4×4 homogeneous form later for point transformation.
- Inversion is done in the trainer, not in the dataloader.
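The shape handling in the steps above can be sketched as follows. This is only an illustration under assumptions: the row layout (two leading index columns followed by 16 matrix values) is inferred from the `[2:]` slice in vkitti.py, the sample values are made up, and the final inversion only shows what converting between conventions would look like. It does not indicate where, or whether, the VGGT trainer performs it.

```python
import numpy as np

# Hypothetical parameter row: two index columns, then a row-major 4x4 matrix.
row = np.concatenate([[0, 0], np.eye(4).ravel()])
row[2 + 3] = 0.5  # element (0, 3) of the matrix, i.e. translation along x

# Step 1: flattened 16 values -> 4x4 matrix.
extri_4x4 = row[2:].reshape(4, 4)

# Step 2: drop the constant bottom row -> 3x4 [R | t].
extri_3x4 = extri_4x4[:3, :]


def to_homogeneous(extri_3x4):
    """Step 3: append the [0, 0, 0, 1] row to get a 4x4 matrix again."""
    bottom = np.array([[0.0, 0.0, 0.0, 1.0]])
    return np.vstack([extri_3x4, bottom])


extri_h = to_homogeneous(extri_3x4)

# Step 4 (illustrative): inverting swaps camera_from_world <-> camera_to_world.
extri_inv = np.linalg.inv(extri_h)
```

Note that reshaping and slicing never change the convention of the matrix; only an explicit inversion does.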
@SaiPrasanthBL Can you explain what you mean by "inversion is done in the trainer"? I might just be missing some code snippet, but I can't find where the inversion is happening. Thanks!