
Question about Camera Conventions

Open · YeZhang0621 opened this issue 1 year ago · 1 comment

Hi! Thanks for the great work!

Recently, I've been trying to run MVSplat360 on the DNA-Rendering dataset. Some questions below:

  1. Should the camera extrinsics stored in the preprocessed torch files be OpenCV-style world-to-camera or OpenCV-style camera-to-world? The project README says camera-to-world, but in convert_dl3dv.py I saw this code:
```python
for frame in meta_data["frames"]:
    timestamps.append(
        int(os.path.basename(frame["file_path"]).split(".")[0].split("_")[-1])
    )
    camera = [saved_fx, saved_fy, saved_cx, saved_cy, 0.0, 0.0]
    # transform_matrix is in blender c2w, while we need to store opencv w2c matrix here
    opencv_c2w = np.array(frame["transform_matrix"]) @ blender2opencv
    opencv_c2ws.append(opencv_c2w)
    camera.extend(np.linalg.inv(opencv_c2w)[:3].flatten().tolist())
    cameras.append(np.array(camera))
```

The code inverts the c2w matrix and stores the result in the torch file, which confused me.
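For reference, blender2opencv in converters like this typically just flips the y and z camera axes (Blender/OpenGL is y-up/z-backward, OpenCV is y-down/z-forward), so each stored row is the intrinsics plus the flattened OpenCV w2c. A minimal restatement of what the loop computes (`encode_camera` is an illustrative name, not from the repo):

```python
import numpy as np

# Blender/OpenGL camera axes: x right, y up,   z backward.
# OpenCV camera axes:         x right, y down, z forward.
# Right-multiplying a Blender c2w by this flips the y and z camera axes.
blender2opencv = np.diag([1.0, -1.0, -1.0, 1.0])

def encode_camera(fx, fy, cx, cy, blender_c2w):
    """Build the 18-float camera row: intrinsics, two zeros of padding,
    then the OpenCV w2c (inverse of c2w), flattened from its top 3x4 block."""
    opencv_c2w = np.asarray(blender_c2w) @ blender2opencv
    w2c = np.linalg.inv(opencv_c2w)  # this is what actually gets stored
    return np.concatenate([[fx, fy, cx, cy, 0.0, 0.0], w2c[:3].flatten()])
```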

  2. How can I verify whether the camera parameters are correctly aligned with the codebase? My approach was to apply different transformations to the camera parameters provided by DNA-Rendering, such as flipping the y and z axes, converting c2w to w2c, etc., as well as combinations of them. However, NONE of them worked. Here are some results of running on the DNA-Rendering dataset (I used the camera parameters from DNA-Rendering, no axis flipping, converted to w2c): [attached images: 2, 13, 26]

Left is the ground truth, middle is the GSplat rendering, right is the refined image.

And here are the visualizations of the epipolar lines: [attached images: 0008_01_new_00, 0008_01_new_01, 0008_01_new_02]

Could you help me check whether these results are normal? Am I using the right camera conventions? If not, any idea what could be wrong? Thanks!

YeZhang0621 · Jan 06 '25

Hi, @YeZhang0621, thanks for your appreciation.

Sorry for the confusion. The reason we invert it is that the dataloader assumes w2c matrices are stored in the torch file (mainly because we directly adapted the dataloader from re10k), as shown at https://github.com/donydchen/mvsplat360/blob/986a9917e6ddb103f5c6370890d42da65bf89d9e/src/dataset/dataset_dl3dv.py#L347 In other words, the torch file stores w2c, while the code operates on c2w.

Ideally, an epipolar line in the target image should pass through the corresponding point in the source view. The problem is that your examples have only a small overlapping region, which makes it difficult to find points visible in both views. Try debugging with the nearest views, which have a larger overlap. Cheers.
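If it helps, a quick way to sanity-check the conventions is to build the fundamental matrix from the intrinsics and OpenCV w2c extrinsics of two overlapping views and draw the epipolar lines yourself; a minimal sketch (`fundamental_matrix` is an illustrative helper, not part of the codebase):

```python
import numpy as np

def fundamental_matrix(K1, w2c1, K2, w2c2):
    """Fundamental matrix mapping pixels in view 1 to epipolar lines in view 2.

    K1, K2: 3x3 intrinsics; w2c1, w2c2: 4x4 OpenCV world-to-camera extrinsics.
    """
    # Relative pose taking camera-1 coordinates to camera-2 coordinates.
    rel = w2c2 @ np.linalg.inv(w2c1)
    R, t = rel[:3, :3], rel[:3, 3]
    # Skew-symmetric matrix so that t_x @ v == np.cross(t, v).
    t_x = np.array([[0.0, -t[2], t[1]],
                    [t[2], 0.0, -t[0]],
                    [-t[1], t[0], 0.0]])
    # Essential matrix E = [t]_x R, then F = K2^-T E K1^-1.
    E = t_x @ R
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

# A pixel (u, v) in view 1 lies on the line l = F @ [u, v, 1] in view 2;
# if the conventions are right, l should pass through the matching pixel.
```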

donydchen · Jan 07 '25