About denormalization in inference
VGGT is impressive! I just have a question: during training, "Ground Truth Coordinate Normalization" was applied. Do we need a denormalization step during inference to recover the original coordinates?
Ground Truth Coordinate Normalization: "We follow [129] and first express all quantities in the coordinate frame of the first camera g1. Then, we compute the average Euclidean distance of all 3D points in the point map P to the origin and use this scale to normalize the camera translations t, the point map P, and the depth map D."
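For what it's worth, the normalization described there can be sketched in a few lines. This is a minimal illustration of the paper's description, not code from the VGGT repository; the function name and argument layout are my own:

```python
import numpy as np

def normalize_scene(points, cam_translations, depths):
    """Sketch of the paper's normalization step.

    Assumes everything has already been expressed in the first
    camera's coordinate frame. The scale is the average Euclidean
    distance of the 3D points to the origin; translations, the
    point map, and the depth map are all divided by it.
    """
    scale = np.linalg.norm(points.reshape(-1, 3), axis=1).mean()
    return points / scale, cam_translations / scale, depths / scale, scale
```

The returned `scale` is exactly what is discarded during training, which is why a single global scale factor is the only thing missing at inference time.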
Hi, I have looked through many issues in this repository and found that the model only predicts a normalized scene with normalized coordinates, not the actual coordinates. So I have another question: could I train the model without "Ground Truth Coordinate Normalization", so that at inference time I can directly use the actual (or otherwise preprocessed) coordinates from my dataloader, also without "Ground Truth Coordinate Normalization"? What would the effect be? For example, could it make training unstable? Thanks!
Hi LeryLee, I had the same question. Were you able to find an answer?
Not yet. I am still trying to figure out how to recover the original coordinates at inference time when the camera extrinsics and intrinsics are already known, and, if that is not possible, whether the model can be trained without "Ground Truth Coordinate Normalization".
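If ground-truth extrinsics really are available, one plausible route is to estimate the single missing scale factor by comparing predicted camera centers against the known ones, then multiply the predicted point map, depth, and translations by it. The sketch below is a generic least-squares scale alignment, not anything from the VGGT codebase:

```python
import numpy as np

def recover_scale(pred_translations, gt_translations):
    """Estimate the global scale lost by the normalization.

    Uses distances between camera centers (relative to the first
    camera) so the estimate is independent of any translation offset.
    Returns s such that s * pred ~= gt in a least-squares sense.
    """
    pred_d = np.linalg.norm(pred_translations[1:] - pred_translations[0], axis=1)
    gt_d = np.linalg.norm(gt_translations[1:] - gt_translations[0], axis=1)
    # Closed-form least-squares ratio between the two distance vectors.
    return float(pred_d @ gt_d) / float(pred_d @ pred_d)
```

Multiplying the normalized predictions by this factor would denormalize them, under the assumption that the prediction differs from the ground truth only by a global scale (which is exactly what the training-time normalization removes).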
How can we use this model to obtain the camera intrinsic and extrinsic parameters?
I noticed that demo_viser.py computes the camera extrinsics and intrinsics from the predicted pose encoding:

```python
extrinsic, intrinsic = pose_encoding_to_extri_intri(predictions["pose_enc"], images.shape[-2:])
predictions["extrinsic"] = extrinsic
predictions["intrinsic"] = intrinsic
```
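Once you have those matrices, a standard pinhole back-projection turns the predicted depth map into a (still normalized) point cloud. The helper below is my own illustrative sketch, assuming the extrinsic is a 3x4 world-to-camera matrix [R|t], which is how such demos typically lay it out:

```python
import numpy as np

def unproject_depth(depth, intrinsic, extrinsic):
    """Back-project a depth map into world coordinates.

    Assumes a pinhole model: intrinsic is 3x3, extrinsic is a 3x4
    world-to-camera matrix [R|t]. The result inherits whatever scale
    the depth map has, i.e. it stays in the normalized scene frame
    unless the depth has been rescaled first.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Rays in the camera frame, scaled by depth.
    cam = (np.linalg.inv(intrinsic) @ pix.T) * depth.reshape(-1)
    R, t = extrinsic[:, :3], extrinsic[:, 3]
    # Invert the world-to-camera transform: X_world = R^T (X_cam - t).
    world = R.T @ (cam - t[:, None])
    return world.T.reshape(h, w, 3)
```

Combined with an estimate of the missing global scale, this is one way to get actual-coordinate geometry out of the normalized predictions.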