on VGGT with RGB-D depth maps (apparent translation in pose)

Open gbenga007 opened this issue 7 months ago • 0 comments

Dear Authors,

Thank you for the excellent work and the great presentation at CVPR. I’m evaluating VGGT for camera pose estimation in a low-overlap setting. With RGB images from my RGB-D sensor, the recovered poses look correct.

RGB result (looks good, see screenshot):

However, when I use the aligned depth maps from the same sensor (registered to the color camera), I observe an apparent translation (shift) in the reconstruction.

Depth-based result (shows the translation effect, see screenshot):

Do you have any recommendations on what might cause this and how to address it? For context, the depth camera is factory-registered to the color camera. I use the color intrinsics for both depth and color, and the extrinsics between the aligned depth and the color camera are identity.
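
To make the setup above concrete, here is a minimal sketch of the back-projection it implies. It assumes a pinhole camera model and raw depth stored in millimetres; the `depth_scale` default, the function names, and the numbers in the usage note are hypothetical illustrations, not taken from VGGT or from the sensor's SDK.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]

# Identity depth-to-color extrinsics, as stated for the registered depth maps.
IDENTITY: List[List[float]] = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def backproject_pixel(u: float, v: float, depth_raw: float,
                      fx: float, fy: float, cx: float, cy: float,
                      depth_scale: float = 0.001) -> Point:
    """Lift pixel (u, v) with a raw depth value to a 3D point in the camera frame.

    depth_scale converts raw depth units to metres (0.001 assumes millimetres,
    which is an assumption about the sensor, not a documented value).
    """
    z = depth_raw * depth_scale
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


def transform_point(T: List[List[float]], p: Point) -> Point:
    """Apply a 4x4 rigid transform (row-major nested lists) to a 3D point."""
    x, y, z = p
    out = [T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3)]
    return (out[0], out[1], out[2])
```

For example, with hypothetical color intrinsics `fx = fy = 600`, `cx = 320`, `cy = 240`, the call `backproject_pixel(320, 240, 1000, 600, 600, 320, 240)` places the principal-point pixel on the optical axis at about 1 m, and `transform_point(IDENTITY, p)` leaves it unchanged. If `depth_scale` or the intrinsics do not match the resolution and units of the depth maps actually used, the back-projected points shift relative to the RGB-based result, which may be relevant to the effect described here.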

I’d be grateful for any pointers. Thanks again for the great work!

Sep 02 '25 19:09 gbenga007