About training supervision for depth values

Open NickHezhuolin opened this issue 11 months ago • 1 comment

Hi,

Excellent work, thanks for the detailed paper and prompt model release! I found the paper very insightful, and the results are remarkable across various datasets and tasks.

I have a question regarding the supervision signals used during training, particularly for depth values.

Did the training process use relative (affine-invariant) depth ground truth or absolute metric depth for supervision?

Dec 31 '24 09:12 NickHezhuolin

Hi, thanks for your interest in our work, and apologies for the late response 🙏 The training data must have scale-invariant depth (i.e., rooted at 0, with no unknown shift) and calibrated camera intrinsics. This is because depth values are unprojected to 3D points for model training, and affine-invariant depth cannot be unprojected: an unknown shift distorts the resulting geometry, whereas an unknown scale only resizes it uniformly. Absolute metric depth (in meters) is not required. I hope this answers your question : )

Feb 22 '25 14:02 EasternJournalist
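
For readers who want to see why the reply distinguishes scale from shift, here is a minimal NumPy sketch. It is not taken from the project's training code: the pinhole model, the helper names `unproject` and `is_planar`, and the toy intrinsics are all assumptions made for illustration. It unprojects the depth map of a slanted plane, then repeats the unprojection with a scaled copy and a shifted copy of that depth.

```python
import numpy as np


def unproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to (H, W, 3) camera-space points (pinhole model)."""
    h, w = depth.shape
    # u indexes columns, v indexes rows; both have shape (H, W).
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def is_planar(points, tol=1e-8):
    """True if the points lie on one plane (smallest singular value is ~0)."""
    flat = points.reshape(-1, 3)
    singular_values = np.linalg.svd(flat - flat.mean(axis=0), compute_uv=False)
    return bool(singular_values[-1] <= tol * singular_values[0])


# Hypothetical toy camera: 6x8 image, focal length 10 px, principal point (4, 3).
H, W = 6, 8
FX = FY = 10.0
CX, CY = 4.0, 3.0

# Depth of the slanted plane z = 2 + 0.5 * x as seen by this camera.
u = np.arange(W, dtype=np.float64)
row = 2.0 / (1.0 - 0.5 * (u - CX) / FX)
depth = np.tile(row, (H, 1))

points = unproject(depth, FX, FY, CX, CY)         # true geometry
scaled = unproject(3.0 * depth, FX, FY, CX, CY)   # unknown scale only
shifted = unproject(depth + 1.0, FX, FY, CX, CY)  # unknown shift

print("original planar:", is_planar(points))
print("scaled planar:  ", is_planar(scaled))
print("shifted planar: ", is_planar(shifted))
```

In this sketch the scaled depth yields the same point cloud multiplied by 3, so the plane stays a plane. The shifted depth bends the plane into a curved surface, which is the distortion that makes affine-invariant depth unsuitable as a 3D supervision target.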