Warping operation formula in MVSplat paper
Hi, could you describe the warping function that you use in the paper in more detail? I noticed that the code you provided and the corresponding formula in the paper "Unifying Flow, Stereo and Depth Estimation" do not match. Thank you very much!
Hi @trungphien, sorry for the late reply. The two should be mathematically equivalent. The main difference is that the formula in the paper is written in homogeneous coordinates, while the implementation works in Cartesian coordinates.
In particular, in UniMatch Eq. (9), $H$ denotes homogeneous coordinates, in which rotation and translation are represented as a single unified transformation.
In the implementation, by contrast, we apply the rotation (pose[:, :3, :3]) and then the translation (pose[:, :3, -1:]) as two separate steps, as in Cartesian coordinates, as shown in
https://github.com/donydchen/mvsplat/blob/ef976a166da31e8392f2e5bee49bf66785e381e8/src/model/encoder/costvolume/depth_predictor_multiview.py#L39-L44
Note that the pose here is the pose relative to the first view, i.e., $E_2E_1^{-1}$, rather than the original extrinsic $E_2$; see https://github.com/donydchen/mvsplat/blob/ef976a166da31e8392f2e5bee49bf66785e381e8/src/model/encoder/costvolume/depth_predictor_multiview.py#L103-L105
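As a quick sanity check of the equivalence, here is a minimal NumPy sketch for a single pixel. It is not the repository code (which is batched PyTorch); the function names, the example intrinsics and poses, and the convention that $E_2E_1^{-1}$ maps view-1 camera coordinates into view 2 are all assumptions made for illustration.

```python
import numpy as np


def relative_pose(E1, E2):
    # Relative pose E2 @ E1^{-1} (4x4), as in the note above.
    # Assumption: extrinsics are such that this maps view-1 camera
    # coordinates to view-2 camera coordinates.
    return E2 @ np.linalg.inv(E1)


def backproject(pixel, depth, K):
    # Lift a pixel (u, v) with a depth value to a 3D point in camera space.
    uv1 = np.array([pixel[0], pixel[1], 1.0])
    return depth * (np.linalg.inv(K) @ uv1)


def warp_homogeneous(pixel, depth, K1, K2, pose):
    # Homogeneous form: one 4x4 product applies rotation and translation.
    point_h = np.append(backproject(pixel, depth, K1), 1.0)
    moved = (pose @ point_h)[:3]
    proj = K2 @ moved
    return proj[:2] / proj[2]


def warp_cartesian(pixel, depth, K1, K2, pose):
    # Cartesian form: rotate first, then translate, as two separate steps.
    point = backproject(pixel, depth, K1)
    R = pose[:3, :3]
    t = pose[:3, -1]
    moved = R @ point + t
    proj = K2 @ moved
    return proj[:2] / proj[2]


# Hypothetical example values, chosen only to exercise the two functions.
K = np.array([[200.0, 0.0, 128.0],
              [0.0, 200.0, 128.0],
              [0.0, 0.0, 1.0]])
angle = 0.1  # small rotation about the y axis, in radians
E1 = np.eye(4)
E2 = np.eye(4)
E2[:3, :3] = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                       [0.0, 1.0, 0.0],
                       [-np.sin(angle), 0.0, np.cos(angle)]])
E2[:3, -1] = np.array([0.2, 0.0, 0.05])

pose = relative_pose(E1, E2)
uv_h = warp_homogeneous((100.0, 80.0), 2.0, K, K, pose)
uv_c = warp_cartesian((100.0, 80.0), 2.0, K, K, pose)
print(np.allclose(uv_h, uv_c))
```

Both functions back-project the pixel with the same depth and differ only in how the rigid transform is applied, so they agree up to floating-point error.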
Let me know if anything is unclear. Cheers.