Warping operation formula in MVSplat paper

Open trungphien opened this issue 1 year ago • 1 comments

Hi, could you describe the warping function that you use in the paper in more detail? I noticed that the code you provided and the corresponding formula in the paper "Unifying Flow, Stereo and Depth Estimation" do not match. Thank you very much!

Nov 29 '24 10:11 trungphien

Hi, @trungphien, sorry for the late reply. The two should be mathematically equivalent. The main difference is that the formula is written in homogeneous coordinates, while the implementation works in Cartesian coordinates.

In particular, in UniMatch Eq. (9), $H$ denotes homogeneous coordinates, in which rotation and translation are represented as a single unified transformation.
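For reference, the identity behind this can be written for a generic rigid transform. This is not a reproduction of Eq. (9); $R$, $t$, and $x$ are illustrative symbols for a rotation matrix, a translation vector, and a 3D point:

$$
\begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix}
\begin{bmatrix} x \\ 1 \end{bmatrix}
=
\begin{bmatrix} R x + t \\ 1 \end{bmatrix}
$$

The left-hand side is the homogeneous form (one matrix product); the first three rows of the right-hand side are the Cartesian form (rotate, then translate).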

In the implementation, by contrast, we apply the rotation (`pose[:, :3, :3]`) and then the translation (`pose[:, :3, -1:]`) as two separate steps, as one does in Cartesian coordinates, as shown in

https://github.com/donydchen/mvsplat/blob/ef976a166da31e8392f2e5bee49bf66785e381e8/src/model/encoder/costvolume/depth_predictor_multiview.py#L39-L44
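To make the equivalence concrete, here is a minimal sketch that warps a single pixel both ways and compares the results. It is not the repository code: the function names, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the example pose are all assumptions chosen for illustration, and plain Python lists stand in for the batched tensors used in practice.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def warp_cartesian(uv, depth, fx, fy, cx, cy, pose):
    """Cartesian form: rotate with pose[:3, :3], then add pose[:3, 3]."""
    u, v = uv
    # Back-project the pixel to a 3D point at the given depth (K^{-1} by hand).
    point = [depth * (u - cx) / fx, depth * (v - cy) / fy, depth]
    rot = [row[:3] for row in pose[:3]]
    trans = [row[3] for row in pose[:3]]
    rotated = matvec(rot, point)
    moved = [r + t for r, t in zip(rotated, trans)]
    # Project with K and divide by the new depth.
    return (fx * moved[0] / moved[2] + cx, fy * moved[1] / moved[2] + cy)


def warp_homogeneous(uv, depth, fx, fy, cx, cy, pose):
    """Homogeneous form: one 4x4 product on the point [x, y, z, 1]."""
    u, v = uv
    point_h = [depth * (u - cx) / fx, depth * (v - cy) / fy, depth, 1.0]
    moved_h = matvec(pose, point_h)
    x, y, z = (c / moved_h[3] for c in moved_h[:3])
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical relative pose: 10 degrees about the y-axis plus a small shift.
angle = math.radians(10.0)
c, s = math.cos(angle), math.sin(angle)
pose = [
    [c, 0.0, s, 0.2],
    [0.0, 1.0, 0.0, -0.1],
    [-s, 0.0, c, 0.05],
    [0.0, 0.0, 0.0, 1.0],
]

args = ((120.0, 80.0), 2.5, 300.0, 300.0, 128.0, 128.0, pose)
uv_cart = warp_cartesian(*args)
uv_homo = warp_homogeneous(*args)
```

Both functions should return the same warped pixel up to floating-point error, since the 4x4 product simply bundles the rotation and the translation into one step.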

Note that the pose here is the pose relative to the first view, i.e., $E_2E_1^{-1}$, rather than the original extrinsic $E_2$; see https://github.com/donydchen/mvsplat/blob/ef976a166da31e8392f2e5bee49bf66785e381e8/src/model/encoder/costvolume/depth_predictor_multiview.py#L103-L105
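The sketch below illustrates why the relative pose is the right one to use. It assumes $E_1$ and $E_2$ map world coordinates to camera coordinates (the convention under which $E_2E_1^{-1}$ maps view-1 camera coordinates to view-2 camera coordinates). The helper names and example matrices are hypothetical and not taken from the repository.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rigid_inverse(e):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    rot_t = [[e[j][i] for j in range(3)] for i in range(3)]
    trans = [e[i][3] for i in range(3)]
    new_t = [-x for x in matvec(rot_t, trans)]
    return [rot_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def extrinsic_about_y(deg, t):
    """World-to-camera matrix: rotation about y plus translation t."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s, t[0]], [0.0, 1.0, 0.0, t[1]],
            [-s, 0.0, c, t[2]], [0.0, 0.0, 0.0, 1.0]]


def extrinsic_about_x(deg, t):
    """World-to-camera matrix: rotation about x plus translation t."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0, t[0]], [0.0, c, -s, t[1]],
            [0.0, s, c, t[2]], [0.0, 0.0, 0.0, 1.0]]


e1 = extrinsic_about_y(15.0, [0.1, 0.0, 0.3])    # hypothetical E_1
e2 = extrinsic_about_x(-20.0, [-0.2, 0.05, 0.1])  # hypothetical E_2

point_world = [0.3, -0.2, 4.0, 1.0]
point_cam1 = matvec(e1, point_world)  # the point as seen from view 1

relative = matmul(e2, rigid_inverse(e1))   # E_2 E_1^{-1}
via_relative = matvec(relative, point_cam1)
direct = matvec(e2, point_world)           # ground truth in view 2
wrong = matvec(e2, point_cam1)             # applying E_2 to view-1 coordinates
```

Here `via_relative` should match `direct`, whereas `wrong` does not: a point already expressed in the first camera's frame has to be taken back through $E_1^{-1}$ before $E_2$ is applied.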

Let me know if it is unclear. Cheers.

Dec 19 '24 00:12 donydchen